coli-saar / am-parser

Modular implementation of an AM dependency parser in AllenNLP.
Apache License 2.0

UCCA normalization #67

Closed alexanderkoller closed 4 years ago

alexanderkoller commented 4 years ago

... is still performed as part of UCCA#refine. Why?

namednil commented 4 years ago

I think I just forgot to remove it.

alexanderkoller commented 4 years ago

Ok, I'm taking it out.

alexanderkoller commented 4 years ago

@mariomgmn This still seems to happen in get_companion_tokenization.py - is it necessary there?

mariomgmn commented 4 years ago

Oh yes, you're right. No, it isn't necessary given that we handled the Unicode reading issue. I'll comment it out.

mariomgmn commented 4 years ago

No, it is necessary for token matching.

alexanderkoller commented 4 years ago

Could you explain in what way?

mariomgmn commented 4 years ago

Without the normalisation, we get this error at decomposition time:

de.saar.coli.amrtagging.MRInstance$UnalignedNode: 201111 with label Non-Terminal
    at de.saar.coli.amrtagging.MRInstance.checkEverythingAligned(MRInstance.java:84)
    at de.saar.coli.amrtagging.formalisms.ucca.tools.CreateCorpusParallel.lambda$main$4(CreateCorpusParallel.java:203)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
    at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

I'm not entirely sure how to interpret this. Could it be a problem with the token spans and the tokens themselves matching up?
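
One way to test that hypothesis would be to compare each token against the slice of raw text its span points to, once with and once without normalisation. The following is only a minimal sketch: the thread does not say which normalisation is applied, so NFKC from Python's standard `unicodedata` module stands in for it, and `find_mismatches`, the sample sentence, tokens and spans are all hypothetical and not taken from get_companion_tokenization.py.

```python
import unicodedata


def normalise_text(text):
    # Assumption: NFKC stands in for whatever normalisation the pipeline uses.
    return unicodedata.normalize("NFKC", text)


def find_mismatches(raw_text, tokens, spans, normalise=False):
    """Return (index, token, span_text) for every token that differs
    from the raw text at its (start, end) character span."""
    mismatches = []
    for i, (token, (start, end)) in enumerate(zip(tokens, spans)):
        span_text = raw_text[start:end]
        if normalise:
            token = normalise_text(token)
            span_text = normalise_text(span_text)
        if token != span_text:
            mismatches.append((i, token, span_text))
    return mismatches


# Hypothetical data: the raw text contains the ligature U+FB01 and a
# precomposed U+00E9, while the tokens use "fi" and "e" + combining accent.
raw = "a \ufb01ne caf\u00e9"
tokens = ["a", "fine", "cafe\u0301"]
spans = [(0, 1), (2, 5), (6, 10)]

without_norm = find_mismatches(raw, tokens, spans)
with_norm = find_mismatches(raw, tokens, spans, normalise=True)
print(len(without_norm), len(with_norm))
```

If mismatches of this kind disappear once both sides are normalised, that would be consistent with tokens failing to align to their spans, which could plausibly leave a node unaligned as in the error above.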