rudinger closed this 7 years ago
@rudinger, similar to #20, I think this may be caused by using a different FactBank version than the one we were using. We had a lot of trouble aligning spaCy to external tokenization, so it makes sense that something like this may occur if we have different versions.
Is it possible for you to attach the sentences which fail with this error?
'SJMN91-06338157.tml'|||11|||'"He was told . . . the handwriting was on the wall," the source said Monday.'
It seems that this error occurs when using a slightly different version of FactBank, which replaces some `Uu` labels with `NA`; the two seem to be semantically identical. Reverting to `Uu` labels solves the problem.
Fails with this exception:
I think what is happening is that, in the previous step (of the `align` method), `cur_tok` is `'.'` and `cur_word` is `'. . .'`, so a couple of things are going wrong: (1) `cur_tok + str(toks[toks_ind + 1])` returns `'..'` instead of `'. .'`, so it is not recognized as a substring of `'. . .'` when it should be (because the space is missing). (2) It seems like this is a case where `toks[toks_ind : toks_ind + 2].merge()` would be the selected action, but in fact it should be something like `toks[toks_ind : toks_ind + 3].merge()`, because the word actually corresponds to three separate tokens.
This is the temporary/hacky solution I put in the method:
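Separately from the patch mentioned above, and only as an illustration of the two problems described in this comment: the sketch below uses plain Python lists instead of spaCy objects. The helper name `tokens_spanning_word` and the shortened token list are hypothetical and do not come from the repository. It shows that concatenating without a space loses the substring match, and that comparing with whitespace ignored finds that the word spans three tokens.

```python
def tokens_spanning_word(toks, start, word):
    """Return how many tokens, beginning at `start`, spell out `word`.

    Hypothetical helper (not from the repository). Whitespace is ignored
    in the comparison, so '.', '.', '.' matches '. . .'.
    Returns 0 when no run of tokens starting at `start` matches.
    """
    target = "".join(word.split())
    built = ""
    for n, tok in enumerate(toks[start:], start=1):
        built += "".join(str(tok).split())
        if built == target:
            return n
        if not target.startswith(built):
            return 0
    return 0


# Shortened, hand-written tokenization of the failing sentence (assumption).
toks = ['"', "He", "was", "told", ".", ".", ".", "the"]
word = ". . ."

# Problem (1): plain concatenation drops the space, so the check fails.
print(("." + ".") in word)            # False
print(" ".join([".", "."]) in word)   # True

# Problem (2): the word covers three tokens, not two.
print(tokens_spanning_word(toks, 4, word))  # 3
```

Under these assumptions, the returned count could stand in for the hard-coded `+ 2` when choosing the slice to merge, e.g. `toks[toks_ind : toks_ind + n]`.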