Closed alexanderkoller closed 4 years ago
I think I just forgot to remove it.
Ok, I'm taking it out.
@mariomgmn This still seems to happen in get_companion_tokenization.py. Is it necessary there?
Oh yes, you're right. No, it isn't necessary now that we've handled the Unicode reading issue. I'll comment it out.
No, it is necessary for token matching.
Could you explain in what way?
Without the normalisation, we get this error at decomposition time:
de.saar.coli.amrtagging.MRInstance$UnalignedNode: 201111 with label Non-Terminal
    at de.saar.coli.amrtagging.MRInstance.checkEverythingAligned(MRInstance.java:84)
    at de.saar.coli.amrtagging.formalisms.ucca.tools.CreateCorpusParallel.lambda$main$4(CreateCorpusParallel.java:203)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
    at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
I'm not entirely sure how to interpret this. Could it be a problem with the token spans and the tokens themselves matching up?
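To make the span/token question concrete, here is a minimal sketch of how such a mismatch could arise. It assumes the normalisation under discussion is Unicode normalisation (the thread does not say which form; NFC is used here purely for illustration), and `find_spans` is a hypothetical helper, not a function from this repository.

```python
import unicodedata


def find_spans(text, tokens):
    """Hypothetical helper: greedily locate each token in text, left to right.

    Returns a list of (start, end) character spans, with None for any token
    that cannot be found at or after the current position.
    """
    spans = []
    pos = 0
    for tok in tokens:
        idx = text.find(tok, pos)
        if idx == -1:
            spans.append(None)  # token string does not occur in the text
        else:
            spans.append((idx, idx + len(tok)))
            pos = idx + len(tok)
    return spans


# The raw text stores "é" as "e" + combining acute accent (decomposed),
# while the tokenizer output stores it as one precomposed code point.
raw_text = "the cafe\u0301 is open"
tokens = ["the", "caf\u00e9", "is", "open"]

# Without normalisation, the second token has no matching span.
unnormalised = find_spans(raw_text, tokens)

# Normalising both sides to the same form (NFC here, as an assumption)
# lets every token line up with a span.
nfc_text = unicodedata.normalize("NFC", raw_text)
nfc_tokens = [unicodedata.normalize("NFC", t) for t in tokens]
normalised = find_spans(nfc_text, nfc_tokens)
```

If the pipeline behaves anything like this, a token left without a span would leave its node unaligned, which might be what checkEverythingAligned is reporting. That link is a guess and would need checking against the actual code.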
... is still performed as part of UCCA#refine. Why?