CodedotAl / gpt-code-clippy

Full description can be found here: https://discuss.huggingface.co/t/pretrain-gpt-neo-for-open-source-github-copilot-model/7678?u=ncoop57
Apache License 2.0

Creating embeddings instead of output prediction #86

Open JorritWillaert opened 2 years ago

JorritWillaert commented 2 years ago

Hi! I was wondering whether a GPT Code Clippy model could generate embeddings instead of output predictions? The purpose is to embed code in a semantic space so the embedding can be used as a feature for another neural network. I have done the same with BERT (more as a baseline, since that model is not trained on code) and with the OpenAI Codex model (via a paid API), and would therefore love to use one of your models as well.
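To make it concrete, something like the following is what I have in mind: load the bare model (no LM head) and pool its hidden states. This is just a sketch, and the model ID is a placeholder for whichever Code Clippy checkpoint one would actually use:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder model ID -- substitute the actual Code Clippy checkpoint.
# (Flax-trained checkpoints may need from_flax=True.)
model_name = "flax-community/gpt-neo-125M-code-clippy"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)  # bare transformer, no LM head
model.eval()

code = "def add(a, b):\n    return a + b"
inputs = tokenizer(code, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (1, seq_len, hidden_size)

# Causal models have no [CLS] token, so mean-pool over the sequence
# (the last token's state is another option, since it attends to everything).
embedding = hidden.mean(dim=1)  # (1, hidden_size)
```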

Thank you!

ncoop57 commented 2 years ago

Hi @JorritWillaert !

So you could use our models for embedding, but I would not recommend it. I'd suggest checking out GraphCodeBERT from Microsoft instead; it gets good performance across a ton of code-related tasks: https://huggingface.co/microsoft/graphcodebert-base
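For reference, here is a minimal sketch of pulling embeddings out of graphcodebert-base with the transformers library. The pooling strategies shown are common conventions, not an official recipe from the model authors:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/graphcodebert-base")
model = AutoModel.from_pretrained("microsoft/graphcodebert-base")
model.eval()

code = "def add(a, b):\n    return a + b"
inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)

# Option 1: take the [CLS] (first) token of the last hidden layer.
cls_embedding = outputs.last_hidden_state[:, 0, :]  # (1, 768)

# Option 2: mean-pool over tokens, weighting by the attention mask
# so padding (if any) does not dilute the average.
mask = inputs["attention_mask"].unsqueeze(-1)       # (1, seq_len, 1)
mean_embedding = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)
```

Either vector can then be fed directly as a feature into a downstream network.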