CodedotAl / gpt-code-clippy

Full description can be found here: https://discuss.huggingface.co/t/pretrain-gpt-neo-for-open-source-github-copilot-model/7678?u=ncoop57
Apache License 2.0
3.29k stars · 220 forks

EleutherAI/gpt-neo-1.3B Model works better than this. #61

Closed bubundas17 closed 2 years ago

bubundas17 commented 3 years ago

Hi, you guys are doing a great job with it.

I have tried your flax-community/gpt-neo-1.3B-apps-all model, and the generated code is kinda hit or miss.

This is generated using flax-community/gpt-neo-1.3B-apps-all: [screenshot]

and this is generated using EleutherAI/gpt-neo-1.3B: [screenshot]

As far as I know, EleutherAI/gpt-neo-1.3B is trained on more general text, not necessarily code.

So why is flax-community/gpt-neo-1.3B-apps-all performing so much worse than EleutherAI/gpt-neo-1.3B?

reshinthadithyan commented 3 years ago

Hello. Thanks for the interest in our work. flax-community/gpt-neo-1.3B-apps-all is not the model you are looking for. It is a version of GPT-Neo fine-tuned on the APPS dataset, a competitive-programming-style code dataset meant for evaluation; we trained it to build the demo for the event. The majority of our work, however, revolved around code scraped from GitHub. Please try gpt-code-clippy-125M-1024-f, which was trained with a causal language-modeling objective on the dataset we scraped from GitHub. Thanks again for pointing this out. We'll update our README.md to properly redirect readers to the appropriate models.

Trial

[screenshot]

Feel free to reproduce the experiment over here - https://colab.research.google.com/drive/1SEvl7xR48FdDdn75cbS9FiRF6Gd0QgXg?usp=sharing
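
For those who want to try it outside the notebook, a minimal sketch of loading the model with the transformers library follows; the Hub repo id under the flax-community organization and the generation settings here are assumptions, not taken from the notebook:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub id; the model may be published under a different organization.
model_id = "flax-community/gpt-code-clippy-125M-1024-f"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = 'def add(a, b):\n    """Return the sum of two numbers."""\n'
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,    # keep the completion short for a quick check
    do_sample=True,
    temperature=0.2,      # low temperature favors conventional code
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))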

bubundas17 commented 3 years ago

Yes, you are right. I got it from the demo app: flax-community/gpt-neo-1.3B-apps-all was the model commented out in the demo app's source code.

Is there a 1.3B version of gpt-code-clippy-125M-1024-f? I'd like to try that out too.

And what are the future plans for this project? Will it just stay a research paper, or are you planning to publish a competitive product like GitHub Copilot?

Some thoughts: running the 1.3B version at acceptable speed requires a good amount of processing power. Will it be viable to create a GitHub Copilot-like service using this dataset?

ncoop57 commented 3 years ago

@bubundas17 another thing you could try if you want to use the fine-tuned 1.3B model is to modify your prompt to be more in line with the model's training data. For your example, you could try this helper function, which we use in our demo to format the code correctly:

def format_input(question, starter_code=""):
    answer_type = (
        "\nUse Call-Based format\n" if starter_code else "\nUse Standard Input format\n"
    )
    return f"\nQUESTION:\n{question}\n{starter_code}\n{answer_type}\nANSWER:\n"

where the question parameter is your docstring and starter_code is the start of your method definition.
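
For a concrete, made-up example, the helper would produce a prompt like this (the question text and function name are invented for illustration):

# Hypothetical usage of format_input from above:
prompt = format_input(
    question="Write a function that returns the sum of two numbers.",
    starter_code="def add(a, b):",
)
# prompt is now:
# "\nQUESTION:\nWrite a function that returns the sum of two numbers.\n"
# "def add(a, b):\n\nUse Call-Based format\n\nANSWER:\n"
# Feed this string to the model as the generation prompt.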

bubundas17 commented 3 years ago

> @bubundas17 another thing you could try if you want to use the fine-tuned 1.3B model is to modify your prompt to be more in line with the model's training data. For your example, you could try this helper function, which we use in our demo to format the code correctly:
>
> def format_input(question, starter_code=""):
>     answer_type = (
>         "\nUse Call-Based format\n" if starter_code else "\nUse Standard Input format\n"
>     )
>     return f"\nQUESTION:\n{question}\n{starter_code}\n{answer_type}\nANSWER:\n"
>
> where the question parameter is your docstring and starter_code is the start of your method definition.

Yes, I was already using this function. I picked up the code from the web demo.

ncoop57 commented 3 years ago

Ah okay, I think I understand now why it was generating such nonsense for you. The APPS model was trained purely on Python data, so feeding it the JavaScript code you have will cause it to behave strangely. In that case, definitely try our 125M model, or stick with EleutherAI's 1.3B for now until we get around to fine-tuning a model of that size on pure GitHub data. You can also try EleutherAI's GPT-J, which has 6B parameters: https://github.com/kingoflolz/mesh-transformer-jax#gpt-j-6b. It does an even better job at code generation.

bubundas17 commented 3 years ago

> Ah okay, I think I understand now why it was generating such nonsense for you. The APPS model was trained purely on Python data, so feeding it the JavaScript code you have will cause it to behave strangely. In that case, definitely try our 125M model, or stick with EleutherAI's 1.3B for now until we get around to fine-tuning a model of that size on pure GitHub data. You can also try EleutherAI's GPT-J, which has 6B parameters: https://github.com/kingoflolz/mesh-transformer-jax#gpt-j-6b. It does an even better job at code generation.

Ahh 😅😅

Guys, another thing I am curious about: if you train the model purely on GitHub data, I guess it won't have much understanding of the English language.

Then how will it understand the context (i.e., the comment text above a function)?

reshinthadithyan commented 3 years ago

> Guys, another thing I am curious about: if you train the model purely on GitHub data, I guess it won't have much understanding of the English language. Then how will it understand the context (i.e., the comment text above a function)?

I guess the answer lies in the nature of the modeling:

1) The base model is GPT-Neo, which has seen a lot of natural-language text alongside some code; see the Pile dataset.

2) While the dataset is GitHub code, the model has also seen the comments alongside the code. That's why a clean prompt design that is coherent with the training data gives a better outcome.

3) Given our filtering criteria, the probability of getting good comments from GitHub is high. One such choice was filtering for repositories with many stars, the hypothesis being that "popular repositories have high-quality comments".

I hope this answers your question. Thanks.

ncoop57 commented 2 years ago

Closing this issue for now. If you'd like to discuss this more, feel free to reopen, but a better forum for in-depth discussion would be our Discord!