danlou closed this issue 5 years ago
Hi,
Thank you for your interest in our work! You're right, there is a bug when converting a corpus without "id=" tags on words. I will try to fix it this afternoon by generating an id for documents, sentences, and sense-annotated words during the conversion process :)
I'll keep you informed as soon as it's ready!
@danlou The bug is now fixed! I added a "target_X" id to every sense-annotated word during the conversion process (in the end I did not generate ids for documents and sentences, unless there is a real need).
Please run "git pull" and "./java/compile.sh", and tell me if everything works for you!
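For readers who want a feel for the fix described above, here is a minimal Python sketch of the idea, not the actual converter code (which is Java). The element name `word`, the attribute names `wn30_key` and `id`, the counter starting at 0, and the sample sense keys are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def add_target_ids(root, sense_attr="wn30_key", id_attr="id"):
    """Give every sense-annotated <word> that lacks an id a generated one.

    The attribute names and the "target_N" numbering scheme are
    assumptions for this sketch, not the converter's real behaviour.
    Returns the number of ids that were generated.
    """
    counter = 0
    for word in root.iter("word"):
        # Only words carrying a sense annotation need a key reference.
        if word.get(sense_attr) and not word.get(id_attr):
            word.set(id_attr, f"target_{counter}")
            counter += 1
    return counter


# Tiny illustrative corpus: two annotated words, one unannotated word.
sample = ET.fromstring(
    "<corpus><document><sentence>"
    '<word surface_form="bank" wn30_key="bank%1:14:00::"/>'
    '<word surface_form="of"/>'
    '<word surface_form="river" wn30_key="river%1:17:00::"/>'
    "</sentence></document></corpus>"
)
generated = add_target_ids(sample)
ids = [w.get("id") for w in sample.iter("word")]
```

With ids in place, each annotated word can be referenced from the key file, which is what was missing when the input had no "id=" tags.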
Thanks for solving this issue so fast! It worked.
I've now found a couple of escaping errors in the XML for some characters (e.g. &, <, >), but managed to fix those manually in a couple of seconds (with find/replace all).
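In case it helps anyone hitting the same escaping problem, here is a small Python sketch of escaping a value before writing it into an XML attribute, using the standard library rather than find/replace. The `word` element and `surface_form` attribute are assumed names for illustration only, and `word_element` is a hypothetical helper, not part of the conversion scripts.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def word_element(surface_form):
    """Build a <word/> element string with a safely escaped attribute.

    quoteattr() escapes &, < and > and wraps the value in quotes,
    so the result can be embedded directly after the '='.
    """
    return f"<word surface_form={quoteattr(surface_form)}/>"


line = word_element("AT&T <x>")
# Parsing the line back confirms it is well-formed and round-trips.
parsed = ET.fromstring(line)
round_tripped = parsed.get("surface_form")
```

Note that a blind find/replace of "&" on a file that is already partly escaped would turn "&amp;" into "&amp;amp;", so escaping at write time is usually the safer route.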
Hi,
I'm interested in using your scripts to convert MASC to the format used in Raganato's framework, but there seems to be an issue that needs resolving.
I'm running the command:
sh UFSAC/scripts/convert_to_raganato.sh --input masc.xml --output masc_converted.xml
This generates two files, as expected:
But the key file is empty, and it doesn't look like the data file contains any key references.
Do you think this can be solved?
Your work in converting all these corpora into the same format, all mapped to WN3.0, is much appreciated, btw!
Thanks, Daniel