CodedotAl / gpt-code-clippy

Full description can be found here: https://discuss.huggingface.co/t/pretrain-gpt-neo-for-open-source-github-copilot-model/7678?u=ncoop57
Apache License 2.0

Low Pass@k #68

Closed Naman-ntc closed 2 years ago

Naman-ntc commented 2 years ago

Hi, thanks for the great work! Firstly, I wanted to ask about the performance of the code-clippy models. It seems that the 125M parameter models are quite weak and perform quite poorly on the HumanEval dataset (even lower than GPT-Neo-1.3B?). Any idea why this is happening?

Also, is there any update on the evaluation of the GPT-Neo-1.3B code-clippy model?

Finally, I would love to contribute to upcoming iterations of code-clippy. Should I join the discord channel?

reshinthadithyan commented 2 years ago

Hello, can you point out exactly which model you've been using? Please read https://github.com/CodedotAl/gpt-code-clippy/issues/61#issuecomment-899005138 for more information on the right model to use. If anything is unclear, please feel free to mention it here. Thanks for your interest in contributing. Please join the Discord server; we'll post the onboarding document soon. Thanks, good day!

Naman-ntc commented 2 years ago

Hi, sorry, I was just looking at the models wiki page, where the evaluation on the HumanEval dataset is provided:

[Screenshot: HumanEval evaluation table from the models wiki page]

Why is pass@k 0 for all code-clippy models?
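For context on what a pass@k of 0 means: pass@k is commonly computed with the unbiased estimator from the Codex paper (Chen et al., 2021), where for each problem you draw n completions, count the c that pass the unit tests, and estimate the probability that at least one of k sampled completions passes. A score of exactly 0 means no sampled completion passed the tests for any problem. A minimal sketch of that estimator (function name is mine, not from this repo's evaluation code):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator for one problem.

    n: total completions sampled
    c: completions that passed the unit tests
    k: evaluation budget (k <= n)
    """
    if n - c < k:
        # Fewer than k failures exist, so any k-subset contains a pass.
        return 1.0
    # 1 - P(all k sampled completions fail)
    return 1.0 - comb(n - c, k) / comb(n, k)

# If none of 10 samples pass, pass@1 is 0; if 1 of 2 passes, pass@1 is 0.5.
print(pass_at_k(10, 0, 1))  # 0.0
print(pass_at_k(2, 1, 1))   # 0.5
```

The final pass@k reported on HumanEval is the mean of this per-problem estimate over all 164 problems, so an all-zero table implies zero passing completions across the whole benchmark.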

ncoop57 commented 2 years ago

We aren't exactly sure, to be honest. My initial thought is that our training script was not optimized (the model would learn a little and then immediately stop learning). We are actively working to improve our model training, both by scaling up models and by cleaning our data more. Closing this issue for now. If you'd like to help us improve, we are going to be creating a CONTRIBUTION.md guide for new contributors so that we can onboard new volunteers more easily. Join our Discord if you want to stay up to date with that news!