CoderPat / structured-neural-summarization

A repository with the code for the paper with the same title
MIT License

Hyperparameter tuning #17

Closed ioana-blue closed 5 years ago

ioana-blue commented 5 years ago

Unfortunately, the paper doesn't discuss hyperparameter tuning at all. I was hoping you could share some of your experience. In particular, I'm interested in hearing which parameters were worth tuning.

I'm in the process of applying the code to a Python code dataset that is an order of magnitude smaller than the datasets reported in the paper. The performance I'm getting so far is about half of what you reported (but I'm not done tuning yet). So far it seems that smaller learning rates work better.

Is there anything else you played with that proved beneficial? Thank you!

ioana-blue commented 5 years ago

One parameter that I couldn't find in the paper is the number of layers in the GGNN. What values did you use?

CoderPat commented 5 years ago

Apologies for the delay. The performance of the model on code datasets is highly dependent on the quality of the dataset. As reported in the paper (and in the technical report by one of the co-authors), duplication in the dataset will hurt performance, so without knowing the dataset it is hard to say. However, I can tell you that the GGNN timesteps used for code were [2, 2, 2, 2], since structure is more relevant in code than in natural language.

ioana-blue commented 5 years ago

I'm fairly confident there is no duplication in the dataset, by construction. I'll keep you posted on what I find. Should I understand that there were 4 layers with 2 steps each? I'm not sure what [2, 2, 2, 2] means.

CoderPat commented 5 years ago

It means 4 layers with two timesteps each.
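In other words, here is a minimal sketch of how a per-layer timestep list such as [2, 2, 2, 2] is commonly interpreted in GGNN implementations; the names `layer_timesteps`, `run_ggnn`, and `propagate` below are illustrative and not the ones used in this repository:

```python
# Each entry in the list is one GGNN layer, unrolled for that many
# message-passing timesteps. Names are illustrative, not taken from this repo.
layer_timesteps = [2, 2, 2, 2]  # 4 layers, 2 timesteps each

def run_ggnn(node_states, propagate):
    """Run the GGNN: `propagate` performs one message-passing update
    using the weights of the given layer (illustrative signature)."""
    for layer_idx, num_timesteps in enumerate(layer_timesteps):
        for _ in range(num_timesteps):
            node_states = propagate(node_states, layer_idx)
    return node_states
```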

ioana-blue commented 5 years ago

Makes sense. (Your Portuguese notifications look cool :) )