brunnurs / valuenet

ValueNet: A Neural Text-to-SQL Architecture Incorporating Values
Apache License 2.0

Question on how to reproduce the result #2

Open QuchenFu opened 4 years ago

QuchenFu commented 4 years ago

Hello,

I am trying to reproduce the results using the parameters specified in your paper. However, my GPU does not have as much memory, so I set the batch size to 10 instead of 20. After about 40 epochs I only reach around 45% accuracy, and I have tried multiple times. I also tried halving two of the learning rates, with no effect. Can you give me some advice on how to reproduce your results (fine-tune the model) with a batch size of 10? Thank you so much.

Best

QuchenFu commented 4 years ago
(Screenshot of the training run, 2020-06-10)

Also, I noticed that the accuracy drops to close to 0 after about 10 epochs when I use a batch size of 1.

brunnurs commented 4 years ago

@QuchenFu Which metric are you referring to with the 45%? The official Spider metric is the one I report as "all" (i.e. all difficulty levels). In general I don't think the batch size has such a large impact, as I also often trained with a batch size of 8 or 16. See the attached screenshots for the exact configuration of one of my best runs.

(Screenshots of the run configuration, 2020-06-22)
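As an aside, if GPU memory is the only blocker and you still want to match the paper's effective batch size of 20, a generic option is gradient accumulation. The sketch below is plain PyTorch with hypothetical stand-in model and data, not ValueNet's actual training loop: two micro-batches of 10 are accumulated before each optimizer step, which approximates one update over an effective batch of 20.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins; ValueNet's real model, data, and loop live in src/.
model = nn.Linear(16, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
data = TensorDataset(torch.randn(100, 16), torch.randn(100, 1))
train_loader = DataLoader(data, batch_size=10)  # micro-batch that fits in GPU memory

accumulation_steps = 2  # 2 micro-batches of 10 ~= effective batch size 20

optimizer.zero_grad()
for step, (x, y) in enumerate(train_loader):
    loss = loss_fn(model(x), y)
    # Scale the loss so the accumulated gradient averages over the effective batch.
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()       # one parameter update per effective batch of 20
        optimizer.zero_grad()
```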

QuchenFu commented 4 years ago

Thank you so much for clarifying this! I thought "accuracy" in the plot referred to the accuracy across all levels. If "all" means "all difficulty levels", what does the "accuracy" metric refer to (down in the middle of the plot)? @brunnurs

brunnurs commented 4 years ago

"accuracy" is referring to the accuracy on SemQL level. We compare the predicted SemQL with the expected SemQL ground truth. Only afterwards we transform the predicted SemQL to sql and execute it with the spider evaluation scripts. The difference between "accuracy" and "all" stems from the fact that the SemQL-to-SQL transformation is not perfect, but then sometimes a query also works if the SemQL is not exactly a 1:1 match to the ground truth.

For more information about the "accuracy" metric have a look at the code starting here: https://github.com/brunnurs/valuenet/blob/28c8a65d0359147bd21415026fe5392170fd3a57/src/evaluation.py#L50
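To make the distinction concrete, here is a toy sketch of the two levels. All names below are hypothetical; the actual logic lives in src/evaluation.py linked above.

```python
# Toy sketch of the difference between the two metrics (hypothetical names).

def semql_accuracy(predicted_semql, gold_semql):
    """The "accuracy" curve: exact match of predicted vs. gold SemQL."""
    hits = sum(pred == gold for pred, gold in zip(predicted_semql, gold_semql))
    return hits / len(gold_semql)

# The "all" curve is computed later: each predicted SemQL tree is first
# transformed to SQL and then scored with the official Spider evaluation
# scripts. The two numbers diverge because the SemQL-to-SQL transformation
# can fail (lowering "all"), while an inexact SemQL match can still yield
# a SQL query that Spider counts as correct (raising "all").

predicted = ["SELECT A(name) FROM t", "SELECT A(age) FROM t"]
gold      = ["SELECT A(name) FROM t", "SELECT A(city) FROM t"]
print(semql_accuracy(predicted, gold))  # 0.5
```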