sinhngn closed this issue 3 years ago
Hi, you can just download the .json file; it's part of the repo. Or do you mean how to create one for another language pair?
I mean that I want to create one for another language pair. The contrapro.json in the repo is only for en-de. Can I create a contrapro.json for en-vi? Thank you!
The code we used to extract those examples is currently not available because our Gitlab servers were discontinued. As soon as I manage to upload the code again I will notify you in this thread.
In principle you can then adapt this code for EN-VI, but there is an important caveat: our extraction process relies on many NLP tools for tagging, parsing and coreference resolution. If such tools are not available for Vietnamese, our approach will not work.
If you have a parallel corpus, you need to parse both sides and run coreference resolution. Then, via word alignment, you can extract sentence pairs that contain corresponding pronouns, and replace those pronouns to create the contrastive sentences. The tricky part is probably getting good coreference resolution, which does not seem to be available for many languages.
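To make the replacement step concrete, here is a minimal Python sketch. It is not the code from the repository. It assumes tokenization and word alignment are already done, it skips parsing and coreference resolution entirely, and the names `SOURCE_PRONOUNS`, `TARGET_ALTERNATIVES` and `make_contrastive`, as well as the output keys, are hypothetical and do not reflect the contrapro.json schema.

```python
from typing import Dict, List, Tuple

# Hypothetical EN-DE pronoun tables; an EN-VI version would need its own.
SOURCE_PRONOUNS = {"it"}
TARGET_ALTERNATIVES: Dict[str, List[str]] = {
    "er": ["sie", "es"],
    "sie": ["er", "es"],
    "es": ["er", "sie"],
}


def make_contrastive(
    src_tokens: List[str],
    tgt_tokens: List[str],
    alignment: List[Tuple[int, int]],
) -> List[dict]:
    """Build contrastive variants for every aligned source/target pronoun pair.

    `alignment` is a list of (source index, target index) pairs, assumed to
    come from an external word aligner.
    """
    examples = []
    for src_idx, tgt_idx in alignment:
        src_word = src_tokens[src_idx].lower()
        tgt_word = tgt_tokens[tgt_idx].lower()
        if src_word not in SOURCE_PRONOUNS or tgt_word not in TARGET_ALTERNATIVES:
            continue  # not a pronoun pair we know how to handle

        contrastive = []
        for wrong in TARGET_ALTERNATIVES[tgt_word]:
            variant = list(tgt_tokens)
            # Keep the capitalization of the pronoun that is being replaced.
            if tgt_tokens[tgt_idx][0].isupper():
                wrong = wrong.capitalize()
            variant[tgt_idx] = wrong
            contrastive.append(" ".join(variant))

        examples.append(
            {
                "source": " ".join(src_tokens),
                "reference": " ".join(tgt_tokens),
                "contrastive": contrastive,
            }
        )
    return examples


if __name__ == "__main__":
    src = ["It", "was", "sleeping", "."]
    tgt = ["Sie", "schlief", "."]
    align = [(0, 0), (1, 1), (2, 1), (3, 2)]
    for example in make_contrastive(src, tgt, align):
        print(example)
```

A real pipeline would additionally use the coreference chains to check that the pronoun's antecedent is known, which is what makes the reference pronoun verifiably correct and the replacements verifiably wrong.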
Thank you! Let me try. The VNCoreNLP tool can do POS tagging, but it cannot do coreference resolution :(
Hi @sinhngn,
The code to extract contrapro is up again, here:
https://gitlab.ifi.uzh.ch/mmueller/pronoun-sets
Unfortunately, there is not much documentation. One file you may be interested in, which shows most of the extraction logic, is:
https://gitlab.ifi.uzh.ch/mmueller/pronoun-sets/-/blob/master/extract.py
Let me know if this is useful. Regards, Mathias
@sinhngn Does the above answer your question? Can this issue be closed?
Yes, thank you very much!
Please help me understand contrapro.json. How do I generate this file? I would like to use your tool for evaluation on another language pair (e.g. en-vi). Thank you!