michaelcapizzi closed 8 years ago
Note: The maximum sentence length (according to scala processors) was 77.
@michaelcapizzi, do you happen to know which line py-processors barfs on?
@myedibleenso The line below failed to annotate.
After I got hold of the GI Bill and we bought a house in Rockville , Maryland , at the fantastic rate of 4 1/4 percent interest , which I thought was very , very high in those days .
Hmm, seems to work under Python 3. Are you using 2.x?
I'll look into it. py-processors was developed using 3.x, so there may be some encoding/decoding issues to sort out.
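If the difference between 2.x and 3.x does turn out to be an encoding problem (this is only a guess at this point), one possible workaround is to normalize the input to a Unicode string before annotating. A minimal sketch; `to_text` is a hypothetical helper, not part of py-processors, and UTF-8 is an assumed encoding:

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as a Unicode string.

    Hypothetical helper, not part of py-processors.
    Byte strings (the default `str` type under Python 2) are decoded
    using the assumed encoding; anything else is returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

The result could then be passed to the annotate call in place of the raw line.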
Certain lines of the COCA dataset fail to annotate (proc.annotate(text)) with this error:
Here is one such line (note: each line is a full transcript of a radio episode):
The error occurs whether this is fed in as one line or multiple lines.
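To narrow down which inputs fail, one option is to annotate the data piece by piece and collect the failures instead of stopping at the first one. A minimal sketch; `find_failing_lines` is a hypothetical helper, and `annotate` is assumed to be any callable that raises on failure, such as the `proc.annotate` call mentioned above:

```python
def find_failing_lines(lines, annotate):
    """Return (index, line, error) for every line that `annotate` rejects.

    Hypothetical helper: `annotate` is any callable that raises an
    exception when annotation fails.
    """
    failures = []
    for index, line in enumerate(lines):
        try:
            annotate(line)
        except Exception as err:
            # Record the failure and keep going so every bad line is reported.
            failures.append((index, line, repr(err)))
    return failures
```

Calling it as `find_failing_lines(transcript_lines, proc.annotate)` would give the indices of the offending lines, which can then be split further into sentences if needed.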
When run directly in scala using processors (p.mkDocument(text)), it does succeed in processing the document, but if a processor has not yet been loaded, it takes a long time to make the document. (If a processor has already been loaded, it's very fast, as expected.)