keien closed this issue 10 years ago.
Does this happen for any input?
And on which branch?
It happens on this input: "Friends describe me as fun, loyal, excellent sense of humour with a hint of sarcasm, full of life and energy, up for great laughs and enjoy most things in life [usually sensibly LOL!]."
The `]` breaks something.
Hm, well, the problem is that the output from the Java library looks like this:
```
[Text=My CharacterOffsetBegin=0 CharacterOffsetEnd=2 PartOfSpeech=PRP$ Lemma=my] [Text=name CharacterOffsetBegin=3 CharacterOffsetEnd=7 PartOfSpeech=NN Lemma=name] [Text=is CharacterOffsetBegin=8 CharacterOffsetEnd=10 PartOfSpeech=VBZ Lemma=be] [Text=Frank CharacterOffsetBegin=11 CharacterOffsetEnd=16 PartOfSpeech=NNP Lemma=Frank] [Text=[ CharacterOffsetBegin=17 CharacterOffsetEnd=18 PartOfSpeech=NNP Lemma=[] [Text=yes CharacterOffsetBegin=18 CharacterOffsetEnd=21 PartOfSpeech=RB Lemma=yes] [Text=it CharacterOffsetBegin=22 CharacterOffsetEnd=24 PartOfSpeech=PRP Lemma=it] [Text=is CharacterOffsetBegin=25 CharacterOffsetEnd=27 PartOfSpeech=VBZ Lemma=be] [Text=] CharacterOffsetBegin=27 CharacterOffsetEnd=28 PartOfSpeech=CD Lemma=]] [Text=. CharacterOffsetBegin=28 CharacterOffsetEnd=29 PartOfSpeech=. Lemma=.]
```
So I'm not sure what regular expression could separate all the different `[...]` blocks when some of them contain a `]` character.
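One possible regex approach, as a minimal Python sketch: instead of splitting on every `]`, split only on a `]` that is followed by a space and the start of the next `[Text=` block, then read each `Key=Value` field up to the next ` Key=`. It assumes the output looks exactly like the sample above (blocks separated by a single space, no spaces inside values); `parse_blocks`, `BLOCK_BOUNDARY`, and `FIELD` are hypothetical names, not part of the wrapper. It would still break if a token's text itself contained `] [Text=`.

```python
import re

# A block boundary is a "]" followed by one space and the start of the next block.
# A "]" inside a token (e.g. Text=] or Lemma=]) is not followed by " [Text=", so it is kept.
BLOCK_BOUNDARY = re.compile(r"(?<=\]) (?=\[Text=)")

# Inside a block, a value runs (lazily) until the next " Key=" or the end of the block.
FIELD = re.compile(r"(\w+)=(.*?)(?= \w+=|$)")


def parse_blocks(raw: str) -> list[dict[str, str]]:
    """Split the raw annotator output into one dict of fields per token."""
    tokens = []
    for block in BLOCK_BOUNDARY.split(raw.strip()):
        inner = block[1:-1]  # drop the block's own outer "[" and "]"
        tokens.append(dict(FIELD.findall(inner)))
    return tokens


if __name__ == "__main__":
    sample = (
        "[Text=] CharacterOffsetBegin=27 CharacterOffsetEnd=28 PartOfSpeech=CD Lemma=]] "
        "[Text=. CharacterOffsetBegin=28 CharacterOffsetEnd=29 PartOfSpeech=. Lemma=.]"
    )
    for token in parse_blocks(sample):
        print(token["Text"], token["Lemma"])  # prints "] ]" then ". ."
```

The lookbehind/lookahead pair keeps the split zero-width on both sides, so each block keeps its own brackets and only the separating space is consumed.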
We might have to un-escape the words in the preprocessor.
Yeah, I'm fine with that. Revert the parser, then add a catcher for the brackets in `tokenize_from_raw`, where the words are read in, to translate them back to the right characters.
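If the escaping route is taken, the read-in step could look roughly like this minimal Python sketch. It assumes the tokenizer emits Penn Treebank-style escapes (`-LRB-`/`-RRB-`, `-LSB-`/`-RSB-`, `-LCB-`/`-RCB-`) instead of raw brackets; `read_tokens`, `unescape_token`, and `PTB_UNESCAPE` are hypothetical names and not the actual code of `tokenize_from_raw`. Because no field then contains a raw `]`, a plain `[^\]]*` pattern is enough to separate the blocks, and the brackets are translated back only afterwards.

```python
import re

# Penn Treebank-style bracket escapes. ASSUMPTION: the tokenizer runs with
# bracket escaping enabled, so these placeholders appear instead of raw brackets.
PTB_UNESCAPE = {
    "-LRB-": "(", "-RRB-": ")",
    "-LSB-": "[", "-RSB-": "]",
    "-LCB-": "{", "-RCB-": "}",
}

# With escaping on, no field contains a raw "]", so each block is simply "[...]".
SIMPLE_BLOCK = re.compile(r"\[([^\]]*)\]")


def unescape_token(word: str) -> str:
    """Translate an escaped bracket back to its character (hypothetical helper).

    The lookup is upper-cased so lower-cased lemmas such as "-lsb-" are also
    mapped; any other word is returned unchanged.
    """
    return PTB_UNESCAPE.get(word.upper(), word)


def read_tokens(raw: str) -> list[dict[str, str]]:
    """Split escaped output into per-token field dicts, then restore brackets."""
    tokens = []
    for body in SIMPLE_BLOCK.findall(raw):
        # ASSUMPTION: fields are space-separated Key=Value pairs with no spaces in values.
        fields = dict(pair.split("=", 1) for pair in body.split(" "))
        # Only the word and its lemma are restored; POS tags such as -LRB- stay as-is.
        for key in ("Text", "Lemma"):
            if key in fields:
                fields[key] = unescape_token(fields[key])
        tokens.append(fields)
    return tokens
```

Keeping the escapes through the split and undoing them only on `Text` and `Lemma` means the parser never has to reason about bracket characters inside a block.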
We can close this now, right?
This just happened and I'm not sure why. Did the recent CoreNLP update break something?