Open MariamNakhle opened 9 months ago
Hey! Sorry for the delay. It is normal to have some coreference errors (for example, due to differences in tokenization between Stanza and the coref model). How big is your file (how many lines does it have)? It seems there were only two coreference errors, so if your file is big I wouldn't worry.
Hi, the file was just a test with fewer than 20 lines, but thanks for answering anyway!
I am having some problems when running the script. I created my environment using the muda_env.yml file. When I test it on a small test document, I run into a "coref error". It seems to me that it is related to multi-word token handling (for example, the Spanish word "al" is tokenized as "a" and "el"). See below for an example.
I'd be grateful for your thoughts on this!
This is the command I used:
PYTHONPATH=/home/getalp/nakhlem/MuDA python muda/main.py --src my_data/text.en --tgt my_data/text.es --docids my_data/text.docids --dump-tags my_data/test_enes_muda-env-yaml.tags --tgt-lang "es"
And this is the full message:
Originally posted by @MariamNakhle in https://github.com/CoderPat/MuDA/issues/16#issuecomment-1885178159
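For readers following the thread, here is a minimal sketch of the kind of index mismatch the report suspects, where a contraction such as "al" is split into "a" and "el". It is not MuDA's or Stanza's code: the `MWT_EXPANSIONS` table and the `expand_mwt` helper are hypothetical names invented for this illustration, and a real tokenizer learns such expansions from data. Whether this is the actual cause of the "coref error" is not confirmed in the thread.

```python
from typing import Dict, List, Tuple

# Hypothetical expansion table, for illustration only (an assumption,
# not taken from MuDA or Stanza).
MWT_EXPANSIONS: Dict[str, List[str]] = {"al": ["a", "el"], "del": ["de", "el"]}


def expand_mwt(tokens: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Expand multi-word tokens and record, for each surface token,
    the indices of the words it produced."""
    words: List[str] = []
    alignment: List[List[int]] = []
    for token in tokens:
        # Tokens without an entry in the table are kept unchanged.
        parts = MWT_EXPANSIONS.get(token.lower(), [token])
        start = len(words)
        words.extend(parts)
        alignment.append(list(range(start, start + len(parts))))
    return words, alignment


surface = "Voy al mercado del pueblo".split()
words, alignment = expand_mwt(surface)

print(words)      # expanded word sequence
print(alignment)  # surface token index -> list of word indices

# The same index refers to different words in the two tokenizations.
print(surface[2], "vs", words[2])
```

If two components index the same sentence with different tokenizations, a span computed on one sequence can point at the wrong word in the other. Keeping an explicit alignment like the one returned here is one way to translate indices between them.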