Open rraub opened 8 years ago
I put in the code snippet from the rraub-tagger-error-catching branch, and it seems to work in the constructor where the tagger is initialized. However, it seems to treat every usage of the tagger as reason enough to give an error message, even though there seems to be no actual error.
commit: https://github.com/gios-asu/search-api/commit/388a0756c02832f55f7a8d896a676731f5ea62b8
$tagger->getErrors() gets the stderr output of the command that the PHP wrapper we're using invokes:
java -mx300m -cp "/usr/local/bin/stanford-ner-2015-04-20/stanford-ner.jar:" edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier /usr/local/bin/stanford-ner-2015-04-20/classifiers/english.all.3class.distsim.crf.ser.gz -textFile /tmp/phpnlptagtk2StK -encoding utf8
phpnlptagtk2StK is a temp file in which the plaintext to be tagged is stored.
If I force this command to only output stderr in the terminal, I get this:
batman@epicac2:~/workspace/search-api$ java -mx300m -cp "/usr/local/bin/stanford-ner-2015-04-20/stanford-ner.jar:" edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier /usr/local/bin/stanford-ner-2015-04-20/classifiers/english.all.3class.distsim.crf.ser.gz -textFile /tmp/phpnlptagtk2StK -encoding utf8 2>&1 /dev/null
CRFClassifier invoked on Tue Mar 15 15:42:02 MST 2016 with arguments:
-loadClassifier /usr/local/bin/stanford-ner-2015-04-20/classifiers/english.all.3class.distsim.crf.ser.gz -textFile /tmp/phpnlptagtk2StK -encoding utf8 /dev/null
loadClassifier=/usr/local/bin/stanford-ner-2015-04-20/classifiers/english.all.3class.distsim.crf.ser.gz
encoding=utf8
textFile=/tmp/phpnlptagtk2StK
=/dev/null
Loading classifier from /usr/local/bin/stanford-ner-2015-04-20/classifiers/english.all.3class.distsim.crf.ser.gz ... done [2.9 sec].
term/O term/O term/O
CRFClassifier tagged 3 words in 1 documents at 43.48 words per second.
So my hunch seems to be correct. stderr is being used to output more than just errors. :P
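One caveat about the redirection used here: `2>&1 /dev/null` hands /dev/null to the program as an extra argument (it shows up as `=/dev/null` in the output above) instead of discarding stdout, which is why the tagged `term/O` line still appears. Below is a minimal sketch of a redirection that shows only stderr. `fake_tagger` is a hypothetical stand-in for the java command so the snippet runs without the NER jar, and its stderr line is shortened.

```shell
# Hypothetical stand-in for the java command: one line to stdout, one to stderr.
fake_tagger() {
  echo "term/O term/O term/O"               # tagged text goes to stdout
  echo "CRFClassifier tagged 3 words" >&2   # progress info goes to stderr
}

# Order matters: 2>&1 first points stderr at wherever stdout currently goes,
# then >/dev/null discards stdout only. What is left is stderr alone.
stderr_only=$(fake_tagger 2>&1 >/dev/null)
echo "$stderr_only"
```

With the real command, the same `2>&1 >/dev/null` suffix should leave only the classifier's progress lines. That still supports the conclusion above, since those lines are written to stderr even on a successful run.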
Here's a potential workaround if you want to detect an error in the subprocess using $tagger->getErrors().
Here's a command, in the same form that the PHP wrapper (the PHP Stanford NLP lib we're using) builds, that causes an error:
batman@epicac2:~/workspace/search-api$ java -mx300m -cp "/usr/local/bin/stanford-ner-2015-04-20/stanford-ner.jar:" edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier /usr/local/bin/stanford-ner-2015-04-20/classifiers/english.all.3class.distsim.crf.ser.gz -textFile garbage-in-garbage-out -encoding utf8 2>&1 /dev/null
CRFClassifier invoked on Tue Mar 15 15:51:03 MST 2016 with arguments:
-loadClassifier /usr/local/bin/stanford-ner-2015-04-20/classifiers/english.all.3class.distsim.crf.ser.gz -textFile garbage-in-garbage-out -encoding utf8 /dev/null
loadClassifier=/usr/local/bin/stanford-ner-2015-04-20/classifiers/english.all.3class.distsim.crf.ser.gz
encoding=utf8
textFile=garbage-in-garbage-out
=/dev/null
Loading classifier from /usr/local/bin/stanford-ner-2015-04-20/classifiers/english.all.3class.distsim.crf.ser.gz ... done [3.0 sec].
Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: java.io.FileNotFoundException: garbage-in-garbage-out (No such file or directory)
at edu.stanford.nlp.io.IOUtils.inputStreamFromFile(IOUtils.java:509)
at edu.stanford.nlp.io.IOUtils.readerFromFile(IOUtils.java:550)
at edu.stanford.nlp.objectbank.ReaderIteratorFactory$ReaderIterator.setNextObject(ReaderIteratorFactory.java:189)
at edu.stanford.nlp.objectbank.ReaderIteratorFactory$ReaderIterator.<init>(ReaderIteratorFactory.java:161)
at edu.stanford.nlp.objectbank.ResettableReaderIteratorFactory.iterator(ResettableReaderIteratorFactory.java:98)
at edu.stanford.nlp.objectbank.ObjectBank$OBIterator.<init>(ObjectBank.java:414)
at edu.stanford.nlp.objectbank.ObjectBank.iterator(ObjectBank.java:253)
at edu.stanford.nlp.sequences.ObjectBankWrapper.iterator(ObjectBankWrapper.java:52)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1160)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1111)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1071)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1052)
at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:3056)
Caused by: java.io.FileNotFoundException: garbage-in-garbage-out (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at edu.stanford.nlp.io.IOUtils.inputStreamFromFile(IOUtils.java:503)
... 12 more
Sorry for the wall of text. The above error is unlikely for us, but this could happen:
batman@epicac2:~/workspace/search-api$ java -mx300m -cp "/usr/local/bin/stanford-ner-2015-04-20/stanford-ner.jar:" edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier /usr/local/bin/stanford-ner-2015-04-20/classifiers/this-classifier-does-not-exist -textFile /tmp/zphpnlptagtk2StK -encoding utf8 2>&1 /dev/null
CRFClassifier invoked on Tue Mar 15 15:51:50 MST 2016 with arguments:
-loadClassifier /usr/local/bin/stanford-ner-2015-04-20/classifiers/this-classifier-does-not-exist -textFile /tmp/zphpnlptagtk2StK -encoding utf8 /dev/null
loadClassifier=/usr/local/bin/stanford-ner-2015-04-20/classifiers/this-classifier-does-not-exist
encoding=utf8
textFile=/tmp/zphpnlptagtk2StK
=/dev/null
Loading classifier from /usr/local/bin/stanford-ner-2015-04-20/classifiers/this-classifier-does-not-exist ... Error deserializing /usr/local/bin/stanford-ner-2015-04-20/classifiers/this-classifier-does-not-exist
Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: /usr/local/bin/stanford-ner-2015-04-20/classifiers/this-classifier-does-not-exist (No such file or directory)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifierNoExceptions(AbstractSequenceClassifier.java:1572)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifierNoExceptions(AbstractSequenceClassifier.java:1523)
at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:2987)
Caused by: java.io.FileNotFoundException: /usr/local/bin/stanford-ner-2015-04-20/classifiers/this-classifier-does-not-exist (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1556)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifierNoExceptions(AbstractSequenceClassifier.java:1569)
... 2 more
The error above could be triggered by a misconfiguration of our app. For example, config.conf could point to a bad directory for our NER jar files/classifiers.
Anyways, I was thinking we just look for the presence of the string "Exception" in $tagger->getErrors(). This should be sufficient to detect an unrecoverable error. You can go further and extract the rest of the text so we get something like:
Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: /usr/local/bin/stanford-ner-2015-04-20/classifiers/this-classifier-does-not-exist (No such file or directory)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifierNoExceptions(AbstractSequenceClassifier.java:1572)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifierNoExceptions(AbstractSequenceClassifier.java:1523)
at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:2987)
Caused by: java.io.FileNotFoundException: /usr/local/bin/stanford-ner-2015-04-20/classifiers/this-classifier-does-not-exist (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1556)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifierNoExceptions(AbstractSequenceClassifier.java:1569)
... 2 more
It would be pretty simple to do.
(Edge case warning! Would not work in Chinese)
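Here is a rough sketch of that check, written in shell rather than PHP so it can be run without the wrapper. The same substring test would apply to whatever stderr text the tagger hands back. `extract_exception` is a hypothetical helper name, and the two sample strings are abbreviated versions of the outputs above, not real captures.

```shell
# Hypothetical helper: if the captured stderr contains "Exception", print
# everything from the first matching line onward and return 0; else return 1.
extract_exception() {
  case "$1" in
    *Exception*) printf '%s\n' "$1" | sed -n '/Exception/,$p' ;;
    *) return 1 ;;
  esac
}

# Abbreviated sample stderr from a successful run (no exception).
ok_stderr='Loading classifier from english.all.3class.distsim.crf.ser.gz ... done [2.9 sec].
CRFClassifier tagged 3 words in 1 documents at 43.48 words per second.'

# Abbreviated sample stderr from a run with a missing classifier.
bad_stderr='Loading classifier from this-classifier-does-not-exist ... Error deserializing this-classifier-does-not-exist
Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: this-classifier-does-not-exist (No such file or directory)
    at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:2987)'

# Only the failing run is reported; the normal progress lines are ignored.
if trace=$(extract_exception "$bad_stderr"); then
  echo "tagger failed:"
  echo "$trace"
fi
```

Matching on the literal word "Exception" keeps the check trivial, at the cost of the edge case already mentioned: it relies on the text containing that English word.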
Edit: Pinging @iajohns1
On travis the NER tagger is erroring out, but because we are not looking at the errors via:
Check out the rraub-ner-tagger-error-catching branch for explanation.