Closed — namednil closed this issue 5 years ago
Possibly. But yes, you're right: after yesterday's modification we don't really need the detokenisation. I see in another issue that you already have output. Did you modify the script so it doesn't import the detokeniser?
I just installed the moses tokenization package from the source that was referenced in the issue I linked.
Okay. I'll pull and then modify the code to remove the detokeniser and push that.
Hi, can someone elaborate on this? We should be very cautious about using NLTK in our code because it is not on the whitelist.
Hi, yes. I imported an NLTK detokenizer in one of the scripts. However, we're not currently using it or the code that depends on it given that we are carrying the input string. I'll remove the code that uses it to avoid any confusion.
Yes, please do and then close this issue.
Is this what causes #50?
I just wanted to mention that using nltk should be fine as long as we don't use anything that is pretrained or any nltk corpus, because the whitelist only talks about "constraints on which third-party data or pre-trained models can be used in addition to the resources distributed by the task organizers."
Is this what causes #50? No, we're not using the detokenizer any more. We pull the input from the mrp files as-is, take the tokenization from the companion data, and use both in the alto corpus.
I can't run the UCCA postprocessing script because it cannot find the module `nltk.tokenize.moses`, and I'm using the latest nltk version. Does it have to do with this: https://github.com/lyeoni/nlp-tutorial/issues/2? Also, do we really need "detokenization"? We carry around the input string.