danyaljj closed 7 years ago
Ya, I’ll fix this first thing tomorrow AM.
On May 4, 2017, at 5:16 PM, Daniel Khashabi notifications@github.com wrote:
Tom could you look at this tokenization issue?
String text = "You see always, oh we're going to do this, we're going to--. ";
TextAnnotation basicTextAnnotation = null;
try {
    basicTextAnnotation = processor.createBasicTextAnnotation("test", "test", text);
} catch (AnnotatorException e) {
    e.printStackTrace();
    fail(e.getMessage());
}
output:
java.lang.StringIndexOutOfBoundsException: String index out of range: 0
    at java.lang.String.charAt(String.java:658)
    at edu.illinois.cs.cogcomp.nlp.tokenizer.TokenizerStateMachine$State.isAbbr(TokenizerStateMachine.java:676)
    at edu.illinois.cs.cogcomp.nlp.tokenizer.TokenizerStateMachine$5.process(TokenizerStateMachine.java:314)
    at edu.illinois.cs.cogcomp.nlp.tokenizer.TokenizerStateMachine.parseText(TokenizerStateMachine.java:610)
    at edu.illinois.cs.cogcomp.nlp.tokenizer.StatefulTokenizer.tokenizeTextSpan(StatefulTokenizer.java:79)
    at edu.illinois.cs.cogcomp.nlp.utility.TokenizerTextAnnotationBuilder.createTextAnnotation(TokenizerTextAnnotationBuilder.java:83)
    at edu.illinois.cs.cogcomp.annotation.BasicAnnotatorService.createBasicTextAnnotation(BasicAnnotatorService.java:165)
    at edu.illinois.cs.cogcomp.pipeline.main.CachingPipelineTest.weirdSentences(CachingPipelineTest.java:218)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.internal.runners.TestMethod.invoke(TestMethod.java:59)
    at org.junit.internal.runners.MethodRoadie.runTestMethod(MethodRoadie.java:98)
    at org.junit.internal.runners.MethodRoadie$2.run(MethodRoadie.java:79)
    at org.junit.internal.runners.MethodRoadie.runBeforesThenTestThenAfters(MethodRoadie.java:87)
    at org.junit.internal.runners.MethodRoadie.runTest(MethodRoadie.java:77)
    at org.junit.internal.runners.MethodRoadie.run(MethodRoadie.java:42)
    at org.junit.internal.runners.JUnit4ClassRunner.invokeTestMethod(JUnit4ClassRunner.java:88)
    at org.junit.internal.runners.JUnit4ClassRunner.runMethods(JUnit4ClassRunner.java:51)
    at org.junit.internal.runners.JUnit4ClassRunner$1.run(JUnit4ClassRunner.java:44)
    at org.junit.internal.runners.ClassRoadie.runUnprotected(ClassRoadie.java:27)
    at org.junit.internal.runners.ClassRoadie.runProtected(ClassRoadie.java:37)
    at org.junit.internal.runners.JUnit4ClassRunner.run(JUnit4ClassRunner.java:42)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:130)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
    at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
FYI @mssammon
View it on GitHub: https://github.com/CogComp/cogcomp-nlp/issues/452
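For anyone reading the trace above: the top frame is `String.charAt` with index 0, which is what Java throws when `charAt(0)` is called on an empty string. The sketch below only illustrates that failure mode and one possible guard. `AbbrGuardSketch`, `startsUpperUnsafe`, and `startsUpperSafe` are hypothetical names invented for this example; this is not the actual `isAbbr` code in `TokenizerStateMachine`, and the real fix may look quite different.

```java
public class AbbrGuardSketch {

    // Hypothetical stand-in for a check that inspects the first character
    // of a token. Calling charAt(0) on "" throws
    // StringIndexOutOfBoundsException.
    static boolean startsUpperUnsafe(String token) {
        return Character.isUpperCase(token.charAt(0));
    }

    // Same check with a guard: null or empty tokens are simply reported
    // as "not upper case" instead of throwing.
    static boolean startsUpperSafe(String token) {
        if (token == null || token.isEmpty()) {
            return false;
        }
        return Character.isUpperCase(token.charAt(0));
    }

    public static void main(String[] args) {
        try {
            startsUpperUnsafe("");
        } catch (StringIndexOutOfBoundsException e) {
            // The exact message text differs between JDK versions.
            System.out.println("unsafe check threw " + e.getClass().getSimpleName());
        }
        System.out.println(startsUpperSafe(""));   // false
        System.out.println(startsUpperSafe("Dr")); // true
    }
}
```

Whether an empty token should be treated as "not an abbreviation" or should never reach this check at all is a design decision for the tokenizer; the guard here just shows the simplest way to avoid the exception.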
@mssammon @danyaljj I have fixed this. However, I have ongoing development in my fork on the OntoNotes 5.0 parser. What do we do in situations like this? This is a very minor fix; should we be creating branches for these one-offs? In this case, can I wait until we are ready to merge my fork?
@cowchipkid If the OntoNotes parser is going to take longer than a few hours to complete, please create a separate branch and PR for just the tokenizer fix.