bigcode-project / starcoder

Home of StarCoder: fine-tuning & inference!
Apache License 2.0

Generating Embeddings of Code Tokens using StarCoder #141

Open · code2graph opened this issue 11 months ago

code2graph commented 11 months ago

I am exploring the possibility of using StarCoder to generate embeddings for code tokens and would like to know if this is feasible with the current implementation.

Questions:

  1. Is it possible to use StarCoder to generate embeddings of code tokens?
  2. If so, how should StarCoder be configured and used to generate embeddings of code tokens?

loubnabnl commented 9 months ago

Hi, you can take the last hidden layer of the model as embeddings. However, it might be better to use an encoder for embeddings: we have trained a BERT-like code model called StarEncoder, which you can try: https://huggingface.co/bigcode/starencoder
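A minimal sketch of the first suggestion (last hidden layer as embeddings), assuming the Hugging Face `transformers` library and PyTorch are installed. `AutoModel` loads the base model without the language-modeling head, and its `last_hidden_state` output holds one vector per input token. The helper names `token_embeddings` and `embed_code` are hypothetical, chosen for this example. Running `embed_code` with `bigcode/starcoder` assumes you can access that checkpoint on the Hub (it may require accepting its license) and have enough memory for a roughly 15B-parameter model.

```python
import torch
from transformers import AutoModel, AutoTokenizer


def token_embeddings(model, input_ids, attention_mask=None):
    """Return the model's last hidden layer: one vector per input token.

    Output shape is (batch, seq_len, hidden_size).
    """
    model.eval()  # disable dropout so the same input always gives the same vectors
    with torch.no_grad():  # inference only; no gradients needed
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
    return outputs.last_hidden_state


def embed_code(code, checkpoint="bigcode/starcoder"):
    """Hypothetical convenience wrapper: tokenize `code` and embed each token.

    Assumes access to the checkpoint on the Hugging Face Hub.
    Loads the model on every call; load it once if embedding many snippets.
    """
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)  # base model, no LM head
    inputs = tokenizer(code, return_tensors="pt")
    hidden = token_embeddings(model, inputs["input_ids"], inputs["attention_mask"])
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return tokens, hidden[0]  # drop the batch dimension: (seq_len, hidden_size)


# Example (downloads the full model):
# tokens, vectors = embed_code("def add(a, b):\n    return a + b")
# for tok, vec in zip(tokens, vectors):
#     print(repr(tok), tuple(vec.shape))
```

Each row corresponds to one tokenizer (BPE) token rather than a lexical token of the programming language, so an identifier may span several rows. Because StarCoder is a decoder with causal attention, each token's vector only reflects the tokens before it, which is one plausible reason an encoder such as StarEncoder may give better embeddings. Whether `embed_code` works unchanged with the StarEncoder checkpoint depends on how that model is packaged, so check its model card first.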