Closed danyaljj closed 7 years ago
Does this happen for all inputs, or just for some? Is this a change in behavior for data that you processed successfully before?
It happens for many inputs. And I think it started 2-3 weeks ago.
I'll dig deeper into the issue and post more details.
Updated the error msg.
It went away after I started using tokenized text as input to curator.
You mean whitespace-tokenized? Which curator server are you using? -- I set up a second curator instance on a different host/port that accepts tokenized text; are you referring to that?
-- this new curator went online just two or three days ago though
I think the old curator also supports it? I'm using the old one. I turned this on and the issue went away.
This worries me. This flag is meant to be used if you send curator whitespace-tokenized text, to force it not to re-tokenize already tokenized text. Since char offsets can't be easily preserved, this was not used much (as I recall, anyway). Technically we could use this with the StringTransformation object to track/restore original token char offsets. But I don't see why you would have problems with this flag set to 'false', unless the tokenizer is disabled. If you call the austen/9011 curator (the modified one), you will get an error if you don't call it with a TextAnnotation with token and sentence views, b/c I had to disable the local tokenization. But trollope/9010 should work just fine -- nothing should be changed there.
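To illustrate the char-offset point above: a rough sketch of how offsets into the original text could be recovered for whitespace-tokenized input. This is plain Python, not the StringTransformation API or anything curator does internally; `align_tokens` is a hypothetical helper, and it assumes every token appears verbatim, in order, in the original string.

```python
from typing import List, Tuple


def align_tokens(original: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Map each token to its (start, end) char span in the original text.

    Hypothetical helper for illustration only. Assumes the tokens occur
    verbatim and in order in `original` (no normalization of quotes,
    brackets, etc.).
    """
    offsets = []
    cursor = 0  # never search behind the end of the previous token
    for tok in tokens:
        start = original.find(tok, cursor)
        if start < 0:
            raise ValueError(f"token {tok!r} not found after offset {cursor}")
        end = start + len(tok)
        offsets.append((start, end))
        cursor = end
    return offsets


original = "Hello, world!  It works."
tokenized = "Hello , world ! It works ."  # what would be sent with the flag on
offsets = align_tokens(original, tokenized.split())
```

If the tokenizer rewrites characters (for example, converting quotes or brackets), the verbatim search fails and an explicit edit log is needed, which is the job an object like StringTransformation would do.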
I keep getting this exception when trying to copy a view from a TextAnnotation created by curator to a TextAnnotation created by pipeline.