ZurichNLP / ContraPro

Contrastive evaluation of pronoun translation in neural machine translation
MIT License

About contrapro.json #4

Closed: sinhngn closed this issue 3 years ago

sinhngn commented 3 years ago

Could you help me understand contrapro.json and how to generate this file? I would like to use your tool for evaluation on another language pair (e.g., en-vi). Thank you!

a-rios commented 3 years ago

Hi, you can just download the .json file; it's part of the repo. Or do you mean how to create one for another language pair?

sinhngn commented 3 years ago

I mean I want to create one for another language pair. The contrapro.json in the repo is only for en-de. Can I create a contrapro.json for en-vi? Thank you!

bricksdont commented 3 years ago

The code we used to extract those examples is currently not available because our GitLab servers were discontinued. As soon as I manage to upload the code again, I will notify you in this thread.

In principle, you can then adapt this code for EN-VI, but note: our extraction process relies on many NLP tools such as tagging, parsing and coreference resolution. If such tools are not available for Vietnamese, our approach does not work.

a-rios commented 3 years ago

If you have a parallel corpus, you need to parse both sides and do co-reference resolution. Then, via word alignment, you can extract sentence pairs with corresponding pronouns, which you can replace to create the contrastive sentences. The tricky part is probably getting good co-reference resolution; it does not seem to be available for many languages.
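To make the alignment-and-replacement step above more concrete, here is a minimal sketch. It is not the ContraPro extraction code; all function names are hypothetical. It assumes the corpus is already tokenized and that word alignments are in the Pharaoh "i-j" format (as written by aligners such as fast_align). It also assumes English "it" is translated as a nominative German pronoun (er/sie/es). Parsing and co-reference resolution are skipped entirely. In a real pipeline, that step is what would decide whether the pronoun's antecedent actually has the right gender, and it would filter out cases like plural "sie".

```python
# Hypothetical sketch, NOT the ContraPro extraction code.
# Assumptions: tokenized input, Pharaoh-format alignments ("src-tgt" index
# pairs), and only nominative German pronouns (no ihn/ihm/ihr case forms).
from typing import List, Sequence, Tuple

GERMAN_PRONOUNS: Tuple[str, ...] = ("er", "sie", "es")


def parse_alignment(line: str) -> List[Tuple[int, int]]:
    """Parse a Pharaoh-format line such as '0-0 1-2' into (src, tgt) index pairs."""
    pairs = []
    for item in line.split():
        src, tgt = item.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


def find_pronoun_pairs(
    src_tokens: Sequence[str],
    tgt_tokens: Sequence[str],
    alignment: Sequence[Tuple[int, int]],
    src_pronoun: str = "it",
    tgt_pronouns: Sequence[str] = GERMAN_PRONOUNS,
) -> List[Tuple[int, int]]:
    """Return aligned (src, tgt) positions where source 'it' maps to a German pronoun."""
    found = []
    for s, t in alignment:
        if src_tokens[s].lower() == src_pronoun and tgt_tokens[t].lower() in tgt_pronouns:
            found.append((s, t))
    return found


def make_contrastive(
    tgt_tokens: Sequence[str],
    tgt_idx: int,
    tgt_pronouns: Sequence[str] = GERMAN_PRONOUNS,
) -> List[List[str]]:
    """Create one contrastive variant per alternative pronoun at position tgt_idx."""
    original = tgt_tokens[tgt_idx]
    variants = []
    for pronoun in tgt_pronouns:
        if pronoun == original.lower():
            continue  # skip the reference pronoun itself
        # Preserve capitalization, e.g. at the start of a sentence.
        replacement = pronoun.capitalize() if original[0].isupper() else pronoun
        variants.append(list(tgt_tokens[:tgt_idx]) + [replacement] + list(tgt_tokens[tgt_idx + 1:]))
    return variants
```

For example, with the source "It is broken ." and the reference "Sie ist kaputt ." aligned one-to-one, `find_pronoun_pairs` locates position 0 on both sides. `make_contrastive` then produces the two wrong variants "Er ist kaputt ." and "Es ist kaputt .". Working on token lists keeps the index arithmetic simple. Detokenization and German case forms are left out of this sketch.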

sinhngn commented 3 years ago

Thank you, let me try. The VnCoreNLP tool can do POS tagging, but it cannot do co-reference resolution :(

bricksdont commented 3 years ago

Hi @sinhngn,

The code to extract contrapro is up again, here:

https://gitlab.ifi.uzh.ch/mmueller/pronoun-sets

Unfortunately, there is not much documentation. One file that shows most of the extraction logic is:

https://gitlab.ifi.uzh.ch/mmueller/pronoun-sets/-/blob/master/extract.py

Let me know if this is useful.

Regards,
Mathias

bricksdont commented 3 years ago

@sinhngn Does the above answer your question? Can this issue be closed?

sinhngn commented 3 years ago

Yes, thank you very much!