dice-group / FOX

Federated Knowledge Extraction Framework
GNU Affero General Public License v3.0

Unhandled exception from a blind Integer.valueOf #2

Closed: nimblemachine closed this issue 10 years ago

nimblemachine commented 10 years ago

Exception in thread "Thread-91" java.lang.NumberFormatException: For input string: "FOX105"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.valueOf(Integer.java:766)
    at org.aksw.fox.nerlearner.PostProcessing.instancesToEntities(PostProcessing.java:196)
    at org.aksw.fox.nerlearner.FoxClassifier.classify(FoxClassifier.java:152)
    at org.aksw.fox.nertools.FoxNERTools.getEntities(FoxNERTools.java:134)
    at org.aksw.fox.Fox.run(Fox.java:236)
    at org.jetlang.core.BatchExecutorImpl.execute(BatchExecutorImpl.java:11)
    at org.jetlang.core.RunnableExecutorImpl.run(RunnableExecutorImpl.java:39)
    at org.jetlang.fibers.ThreadFiber.runThread(ThreadFiber.java:51)
    at org.jetlang.fibers.ThreadFiber.access$000(ThreadFiber.java:10)
    at org.jetlang.fibers.ThreadFiber$1.run(ThreadFiber.java:27)
    at java.lang.Thread.run(Thread.java:745)

renespeck commented 10 years ago

Thank you for your report. FOX handles this Exception now.
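
For readers who hit the same trace: the sketch below shows one way a blind `Integer.valueOf` call can be guarded so that a non-numeric token such as "FOX105" no longer kills the worker thread. It is only an illustration and not the actual change made in FOX. The class `SafeIndexParser` and the method `parseIndex` are hypothetical names that do not appear in the FOX code base.

```java
import java.util.OptionalInt;

// Hypothetical helper for illustration only; not part of FOX.
public class SafeIndexParser {

    // Returns the parsed integer, or an empty OptionalInt when the
    // input is null or not a valid decimal integer (e.g. "FOX105").
    static OptionalInt parseIndex(String raw) {
        if (raw == null) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            // The caller decides whether to skip, log, or substitute a default.
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseIndex("105"));    // prints OptionalInt[105]
        System.out.println(parseIndex("FOX105")); // prints OptionalInt.empty
    }
}
```

Returning an `OptionalInt` rather than a sentinel value makes the caller handle the unparsable case explicitly, for example by skipping that token.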

Do you remember what input data you used?

nimblemachine commented 10 years ago

Sorry for the delayed response, and thanks very much for fixing the exception so promptly.

I'm not sure what exactly in the input data caused the problem. I'm calling FOX from Python through a REST call, sending JSON parameters and getting JSON-LD back. The text consists of news stories, parsed and sanitized into plain text with the Python Goose package, then passed to FOX. During the run where I reported that error, I had successfully processed at least 300 articles in this way before seeing this exception.

I want to thank you for this software. I have used most NER packages at one time or another, and I feel that the current state of NLP demands tools like FOX that allow other components to be combined. My next step is to get deeper into the code, as I need access to temporal tagging and I want to understand how to use other models, such as the OntoNotes model from Illinois NER.