Closed · ywen666 closed this issue 3 years ago
Hi Yeming,
Thank you for your interest in the Code Transformer.
Which experiment are you trying to reproduce? The code_transformer/experiments/code_transformer/code_summarization.yaml file is just a sample file to show the hyperparameters. If you didn't change anything there, I think it just trains on the Python subset (filter_language: python) of the multi-language dataset (language: 'python,javascript,ruby,go').
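For reference, the relevant part of the sample config looks roughly like the sketch below. Only the two options themselves (filter_language and language) are taken from this thread; the surrounding data_setup section name is an assumption about how the file is nested, so check the actual YAML for the exact layout.

```yaml
# Sketch of the dataset-related options in code_summarization.yaml.
# The data_setup section name is an assumption; only the two keys below
# are quoted from the config in this thread.
data_setup:
  language: 'python,javascript,ruby,go'  # the multi-language dataset that gets loaded
  filter_language: python                # train only on the Python subset of it
```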
If you get 0.33 micro-F1, that is actually pretty close to the 34.97 we reported in Table 2 (Python / Ours / F1), so that makes sense to me.
You should be able to reproduce the numbers using the hyperparameter files in code_transformer/experiments/paper. We put the hyperparameters for all experiments reported in the paper there; you can find an overview of them in the README under Section 5.
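In case it helps, pointing the run script at one of those configs would look something like the following. The invocation mirrors the command mentioned later in this thread; <experiment>.yaml is just a placeholder for whichever paper config you want to reproduce, not an actual filename.

```bash
# Run one of the paper configs instead of the sample file.
# <experiment>.yaml is a placeholder; pick the config for the dataset/model
# you want to reproduce from code_transformer/experiments/paper/.
python -m scripts.run-experiment code_transformer/experiments/paper/<experiment>.yaml
```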
Regarding the number of training steps: this of course depends on the dataset you are using. Usually, though, the models reached their best validation performance after they had seen around 1.5-2 million samples (for the smaller datasets, Ruby and JavaScript) or up to 4 million samples (for the bigger datasets, multi-language and java-small). As we were using gradient accumulation with 128 samples, this would correspond to 150k or 300k gradient updates.
Hope this helps.
Oh I see, there is a filter_language option in the YAML file. Thanks for the detailed explanation!
Thanks for releasing this amazing repo! The documentation is thorough and extremely helpful!
I didn't find the number of training steps or epochs needed in Appendix A.6 of the paper. I am running python -m scripts.run-experiment code_transformer/experiments/code_transformer/code_summarization.yaml (I changed the number of layers in the YAML file from 1 to 3 according to the appendix) for over 2 days on a single GPU.
I have run it for 600k steps, and the F1 score in TensorBoard (I guess this is the average F1 score over the 4 programming languages?) is around 0.27 (the micro F1 is 0.33). The number is still a bit off from Table 2. I wonder whether I should just train longer or whether something is wrong with my training.
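For reference, the layer change described above would look roughly like the fragment below in the sample config. The key name num_layers is a stand-in for whatever the layer count is actually called in code_summarization.yaml; only the 1-to-3 change itself comes from this thread.

```yaml
# Hypothetical fragment of code_summarization.yaml: num_layers is a stand-in
# for the real key name, and only the 1 -> 3 change is taken from the text above.
model:
  num_layers: 3  # sample file ships with 1; Appendix A.6 uses 3 layers
```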