bigcode-project / octopack

🐙 OctoPack: Instruction Tuning Code Large Language Models
https://arxiv.org/abs/2308.07124
MIT License

About code explanation #15

Open enmengyi opened 1 year ago

enmengyi commented 1 year ago

I found that the original data format of CommitPackFT looks like this: [image]

I don't really understand how to use it to finetune my model on a code-explanation task, because there seems to be no information about what each piece of code is doing.

Muennighoff commented 1 year ago

You can finetune it to predict the commit subject, which usually explains what the change is doing, but not what the entire code is doing.

To get data that explains what the entire code is doing, you could filter for commits where old_contents is empty. The commit subjects you get may then explain the entire new_contents. We haven't tried this though, but I'd love to know how well it works.
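
A minimal sketch of that filtering idea, in case it helps make it concrete. It assumes each record is a dict with `old_contents`, `new_contents`, and a `subject` field holding the commit subject; the `subject` key, the helper names, the output keys `code` and `explanation`, and the sample records are all illustrative assumptions, not something confirmed in this thread. As noted above, the approach itself is untested.

```python
from typing import Dict, Iterable, List


def is_new_file_commit(record: Dict[str, str]) -> bool:
    """Return True when old_contents is empty, i.e. the commit created the file."""
    return not (record.get("old_contents") or "").strip()


def to_explanation_pairs(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep new-file commits and pair the full code with its commit subject.

    Assumption: the commit subject lives under the "subject" key.
    """
    pairs = []
    for record in records:
        if not is_new_file_commit(record):
            continue
        code = record.get("new_contents") or ""
        subject = (record.get("subject") or "").strip()
        # Skip records with no code or no subject to learn from.
        if code.strip() and subject:
            pairs.append({"code": code, "explanation": subject})
    return pairs


# Hypothetical sample records, made up for illustration only.
sample = [
    {
        "old_contents": "",
        "new_contents": "def total(xs):\n    return sum(xs)\n",
        "subject": "Add helper that sums a list",
    },
    {
        "old_contents": "x = 1\n",
        "new_contents": "x = 2\n",
        "subject": "Bump x to 2",
    },
]

pairs = to_explanation_pairs(sample)
```

Only the first sample record survives, since the second one edits an existing file. If the data is loaded with the Hugging Face `datasets` library, the same `is_new_file_commit` predicate could be passed to `Dataset.filter`. Subjects such as "Initial commit" describe the event rather than the code, so some extra filtering on the subject text is probably worth trying.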

enmengyi commented 1 year ago

Thank you so much!