Closed: dharakyu closed this issue 3 years ago
Hi Dhara! Apologies for the late response. Could you share a sample of the data that was generated using prepare_dataset.py?
Hi Mihir,
Thanks for your response! I've attached a sample of the training data. We reformatted the data for our model pipeline, so that the SYSTEM entry was the input and "utterance" was the output. I think our BLEU scores were about 3 points higher than you reported, so we were wondering whether that could have arisen from how the BLEU scores are actually calculated?
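In case it's useful, here is a rough sketch of the reformatting we did (the record layout shown is a simplification of the actual prepare_dataset.py output, and everything beyond the SYSTEM and "utterance" fields is a placeholder):

```python
import json

def reformat(path):
    """Turn the prepare_dataset.py output into (input, output) pairs.

    Assumes one JSON record per line, where the SYSTEM entry holds the
    structured input and "utterance" holds the reference text; any other
    fields in the record are dropped.
    """
    pairs = []
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            pairs.append((record["SYSTEM"], record["utterance"]))
    return pairs
```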
Thanks, Dhara
Thanks for sharing the data, Dhara! The data itself looks okay to me. We reported the BLEU scores generated by the T5 framework itself when an experiment is run, as specified in t5_tasks.py. The code for the same can be found here. The script you are using also uses corpus_bleu from sacrebleu, but it seems like T5 is passing some specific flags that your script is not. Maybe that accounts for the difference?
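To illustrate, here is a minimal sketch of how those flags can change the score (the example strings below are made up, and the keyword values mirror the ones the T5 bleu metric passes, tokenize="intl" being the notable departure from the sacrebleu default):

```python
import sacrebleu

# Made-up predictions and references, purely for illustration.
predictions = ["the restaurant is located in the centre of town ."]
references = [["The restaurant is located in the centre of town."]]

# sacrebleu defaults: "13a" tokenization, no lowercasing.
default_bleu = sacrebleu.corpus_bleu(predictions, references)

# Flags matching what the T5 framework passes for its bleu metric;
# note tokenize="intl" in particular.
t5_style_bleu = sacrebleu.corpus_bleu(
    predictions,
    references,
    smooth_method="exp",
    lowercase=False,
    tokenize="intl",
    use_effective_order=False,
)

print(f"default flags:  {default_bleu.score:.2f}")
print(f"T5-style flags: {t5_style_bleu.score:.2f}")
```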
That was it. Thanks Mihir!
Hello,
My group is attempting to replicate your experimental results, but we have been getting significantly different BLEU scores from those reported in your paper. Our steps were as follows:

1. When we evaluated on the T2G2 test dataset, we recorded a higher BLEU score than the one reported in your paper.
2. To diagnose why we were getting such a high score, we ran the copy experiment described in Table 4 of the paper (computing the BLEU score between the trivial input and the gold standard) with the exact same parameters as described in the T5 repository.
3. Here is the script we used to evaluate on the T2G2 test dataset (a stripped-down sketch of it is below). We recorded a BLEU score of 23.1 (compared to 18.8 in the paper).
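For concreteness, the copy experiment boils down to something like the following (the loading of the inputs and references is elided here; the linked script has the full details):

```python
from sacrebleu import corpus_bleu

def copy_experiment_bleu(trivial_inputs, gold_references):
    """Score the trivial (copied) inputs against the gold references.

    trivial_inputs and gold_references are parallel lists of strings;
    corpus_bleu is called with sacrebleu's default flags, which is what
    our evaluation script does.
    """
    return corpus_bleu(trivial_inputs, [gold_references]).score
```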
We are wondering if you have any idea why there is a discrepancy between the numbers we are getting and what you reported. Is there a particular way you are formatting the data that could account for the difference? I have been examining the T5 codebase, but so far I have been unable to find anything in the implementation significant enough to account for this delta.
Thank you, and I look forward to hearing your thoughts.
Best, Dhara