bigcode-project / starcoder

Home of StarCoder: fine-tuning & inference!
Apache License 2.0

Generating Embeddings of Code Tokens using StarCoder #141

Open code2graph opened 1 year ago

code2graph commented 1 year ago

I am exploring the possibility of using StarCoder to generate embeddings for code tokens and would like to know if this is feasible with the current implementation.

Questions:

  1. Is it possible to use StarCoder to generate embeddings of code tokens?
  2. If yes, how should StarCoder be configured and used to generate embeddings of code tokens?

loubnabnl commented 1 year ago

Hi, you can take the last hidden layer of the model as embeddings. However, it might be better to use an encoder for embeddings: we have trained a BERT-like code model called StarEncoder, which you can try: https://huggingface.co/bigcode/starencoder
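A minimal sketch of the first suggestion (taking the last hidden layer as token embeddings), assuming the Hugging Face `transformers` library and the `bigcode/starcoder` checkpoint on the Hub. The helper names (`load_starcoder`, `token_embeddings`, `mean_pool`) are hypothetical, and the mean-pooling step, which collapses per-token vectors into one vector per snippet, is an added assumption, not something suggested in this thread.

```python
import torch


def load_starcoder(model_id: str = "bigcode/starcoder"):
    """Load the tokenizer and the base model (no LM head).

    The checkpoint is gated on the Hub, so you may need to accept its license
    and log in first. Hypothetical helper, for illustration only.
    """
    # Imported here so the pooling helper below works without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()  # disable dropout for deterministic embeddings
    return tokenizer, model


def token_embeddings(tokenizer, model, code: str):
    """Return (tokens, embeddings): one last-hidden-layer vector per code token."""
    inputs = tokenizer(code, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # last_hidden_state has shape (batch=1, seq_len, hidden_size); drop the batch dim.
    embeddings = outputs.last_hidden_state[0]
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return tokens, embeddings


def mean_pool(hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors, ignoring padded positions (assumed pooling choice)."""
    mask = attention_mask.unsqueeze(-1).to(hidden.dtype)  # (batch, seq_len, 1)
    summed = (hidden * mask).sum(dim=1)                   # (batch, hidden_size)
    counts = mask.sum(dim=1).clamp(min=1e-9)              # avoid division by zero
    return summed / counts
```

For example, `tokenizer, model = load_starcoder()` followed by `tokens, emb = token_embeddings(tokenizer, model, "def add(a, b): return a + b")` gives one row of `emb` per entry in `tokens`. Note that the full model has about 15.5B parameters, so loading it needs substantial memory. Because StarCoder is a decoder-only model with causal attention, each token's vector only reflects the tokens before it, which may be part of why an encoder such as StarEncoder could give better embeddings; whether StarEncoder loads through the same `AutoModel` call is not confirmed here, so check its model card.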