amritasaha1812 / MMD_Code

Evaluation BLEU script #3

Open shubhamagarwal92 opened 6 years ago

shubhamagarwal92 commented 6 years ago

Thanks for providing the code.

Could you provide the BLEU script that you used for the benchmark results in your paper? I am not able to reproduce the BLEU score of 56.67 (for Multimodal HRED with context size 2, as quoted in Table 6) with your code and dataset.

I tried multi-bleu.perl as provided by OpenNMT and got a score of 37.67, compared to the reported 56.67.
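For reference when comparing scripts, here is a minimal sketch of corpus-level BLEU in the style that multi-bleu.perl computes it, to my understanding: clipped n-gram precisions up to 4-grams, pooled over the whole corpus, with a brevity penalty and no smoothing. This is not the authors' evaluation script. The function names `ngrams` and `corpus_bleu`, the whitespace tokenization, and the single-reference setup are assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale.

    Assumes one reference per hypothesis and whitespace tokenization.
    Matching is case-sensitive and no smoothing is applied.
    """
    matches = [0] * max_n  # clipped n-gram matches, pooled over the corpus
    totals = [0] * max_n   # hypothesis n-gram counts, pooled over the corpus
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())

    # Without smoothing, a zero precision at any order gives BLEU = 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    refs = ["the cat sat on the mat ."]
    print(corpus_bleu(["the cat sat on the mat ."], refs))
    # Same sentence with different casing on one token scores lower.
    print(corpus_bleu(["The cat sat on the mat ."], refs))
```

Scores from different BLEU scripts can differ with tokenization, casing, smoothing, and whether scores are pooled over the corpus or averaged per sentence, so it helps to know exactly which of these the reported number used.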

khoaipx commented 6 years ago

@shubhamagarwal92 Can you tell me which version of TensorFlow you used to run the code in this repository? I tried asking the author, but she hasn't replied to me.

shubhamagarwal92 commented 6 years ago

Hi @khoaipx

TensorFlow 0.12.0 will work for running this code.

You can try this requirements file which I made:

requirements_mmd.txt

Also, I have a PyTorch version of the code, which I will be releasing soon.

khoaipx commented 6 years ago

@shubhamagarwal92 Thank you so much. I was able to run this code, and I look forward to your PyTorch version. I'm doing research in this domain. Could you share your contact details so that we can discuss it more conveniently?

khoaipx commented 5 years ago

@shubhamagarwal92 I think the reason for the difference is the context size. For you, a context size of 2 means using the 2 previous utterances as context (according to your paper). But in this code (and their paper), a context size of 2 means using the 2 previous turns as context, where each turn includes 2 utterances: one from the user and one from the system. Also, I am still waiting for your PyTorch version.
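To make the two readings of "context size 2" concrete, here is a small sketch. It only illustrates the distinction described above and does not show what either codebase actually does. The function names and the flat, alternating user/system utterance list are assumptions made for illustration.

```python
def context_by_utterances(dialogue, target_idx, k=2):
    """Context = the k utterances immediately before the target utterance."""
    return dialogue[max(0, target_idx - k):target_idx]


def context_by_turns(dialogue, target_idx, k=2):
    """Context = the k turns before the target, taking one turn to be
    2 utterances (one user, one system), so 2 * k utterances in total."""
    return dialogue[max(0, target_idx - 2 * k):target_idx]


if __name__ == "__main__":
    # Hypothetical dialogue: alternating user (u) and system (s) utterances.
    dialogue = ["u1", "s1", "u2", "s2", "u3", "s3"]
    target_idx = 5  # predict the system response "s3"

    print(context_by_utterances(dialogue, target_idx))  # ['s2', 'u3']
    print(context_by_turns(dialogue, target_idx))       # ['s1', 'u2', 's2', 'u3']
```

Under the second reading the model sees twice as many utterances for the same nominal context size, so scores obtained under the two readings are not directly comparable.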

shubhamagarwal92 commented 5 years ago

@khoaipx Sorry for the delay; I have been involved with a lot of different things. Please find attached the PyTorch version of our code. You can raise an issue on the repo to stay in contact (or email me at sa201@hw.ac.uk).

As for your previous comment, we do not differentiate between the terms "utterance" and "turn", and IMO neither do they. See the previous 2 issues.