One parameter that I couldn't find in the paper is the number of layers in the GGNN. What values did you use?
Apologies for the delay. The performance of the model on code datasets is highly dependent on the quality of the dataset. As reported in the paper (and in the technical report by one of the co-authors), duplication in the dataset will hurt performance, so without knowing the dataset it is hard to say. However, I can tell you that the GGNN step configuration used for code was [2, 2, 2, 2], since structure matters more in code than in natural language.
I'm fairly confident there is no duplication in the dataset, by construction. I'll keep you posted with what I find. Should I understand that there were 4 layers with 2 steps each? I'm not sure what [2, 2, 2, 2] means.
It means 4 layers with two timesteps each.
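In config terms, that corresponds to something like the sketch below (the key names are made up for illustration and are not necessarily the ones used in this repository):

```python
# Illustrative only: the key name below is hypothetical, not this repo's actual config.
ggnn_hypers = {
    # One entry per GGNN layer; each value is the number of propagation timesteps
    # that layer is unrolled for. [2, 2, 2, 2] = 4 layers, 2 timesteps each.
    "layer_timesteps": [2, 2, 2, 2],
}

num_layers = len(ggnn_hypers["layer_timesteps"])   # 4 layers
total_steps = sum(ggnn_hypers["layer_timesteps"])  # 8 message-passing steps in total
print(num_layers, total_steps)
```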
Makes sense. (Your Portuguese notifications look cool :) )
Unfortunately the paper doesn't talk at all about hyperparameter tuning. I was hoping you could share some of your experience. In particular, I'm interested to hear which parameters were worth tuning.
I'm in the process of applying the code to a Python code dataset that is an order of magnitude smaller than the datasets reported in the paper. The performance I get so far is about half of what you reported (though I'm not done tuning yet). It seems that smaller learning rates work better.
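For concreteness, the kind of sweep I have in mind looks like this sketch (the values and key names are placeholders for illustration, not tuned results):

```python
# Placeholder sweep: values and key names are illustrative, not tuned recommendations.
base_config = {"layer_timesteps": [2, 2, 2, 2], "learning_rate": 1e-3}

for lr in [1e-3, 5e-4, 1e-4]:              # trying progressively smaller learning rates
    config = dict(base_config, learning_rate=lr)
    print("would train with", config)      # stand-in for the actual training entry point
```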
Anything else you played with that proved beneficial? Thank you!