ICIJ / datashare

A self-hosted search engine for documents.
https://datashare.icij.org
GNU Affero General Public License v3.0

NER doesn't work on Windows #931

Closed: mvanzalu closed this issue 2 years ago

mvanzalu commented 2 years ago

Describe the bug: When a Named Entity Recognition (NER) task finishes, no results are shown, although results are expected.

To Reproduce Steps to reproduce the behavior:

  1. Go to 'Analyze your Documents' and launch an NER task
  2. When it is done, go back to 'Search' to see the results
  3. No results are available

Expected behavior: Results should appear in the People/Organization/Location filters and in the document's "Named Entities" tab.

Additional context

2022-08-18 12:16:23,093 [CORENLP-0] INFO  CoreNlpNerModels - downloading models for language FRENCH under dist/models/corenlp/4-0-0/fr
2022-08-18 12:18:58,844 [CORENLP-0] INFO  CoreNlpNerModels - models successfully downloaded for language FRENCH
2022-08-18 12:26:17,747 [CORENLP-0] INFO  CoreNlpNerModels - downloading models for language ENGLISH under dist/models/corenlp/4-0-0/en
2022-08-18 12:32:00,759 [CORENLP-0] INFO  CoreNlpNerModels - models successfully downloaded for language ENGLISH
2022-08-18 12:32:00,766 [CORENLP-0] INFO  CoreNlpNerModels - adding /C:/Users/IEUser/AppData/Roaming/Datashare/dist/models%5ccorenlp%5c4-0-0%5cen%5cstanford-corenlp-4.0.0-models-en.jar to system classloader
2022-08-18 12:32:01,046 [CORENLP-0] ERROR CoreNlpNerModels - failed loading NER
java.io.IOException: Unable to open "edu\stanford\nlp\models\ner\english.all.3class.caseless.distsim.crf.ser.gz" as class path, filename or URL
        at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:481)
        at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1505)
        at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1497)
        at edu.stanford.nlp.ie.crf.CRFClassifier.getClassifier(CRFClassifier.java:2888)
        at org.icij.datashare.text.nlp.corenlp.models.CoreNlpNerModels.loadModelFile(CoreNlpNerModels.java:75)
        at org.icij.datashare.text.nlp.corenlp.models.CoreNlpNerModels.loadModelFile(CoreNlpNerModels.java:39)
        at org.icij.datashare.text.nlp.AbstractModels.load(AbstractModels.java:59)
        at org.icij.datashare.text.nlp.AbstractModels.get(AbstractModels.java:46)
        at org.icij.datashare.text.nlp.corenlp.models.CoreNlpNerModels.loadModelFile(CoreNlpNerModels.java:68)
        at org.icij.datashare.text.nlp.corenlp.models.CoreNlpNerModels.loadModelFile(CoreNlpNerModels.java:39)
        at org.icij.datashare.text.nlp.AbstractModels.load(AbstractModels.java:59)
        at org.icij.datashare.text.nlp.AbstractModels.get(AbstractModels.java:46)
        at org.icij.datashare.text.nlp.corenlp.CorenlpPipeline.initializeNerAnnotator(CorenlpPipeline.java:183)
        at org.icij.datashare.text.nlp.corenlp.CorenlpPipeline.initialize(CorenlpPipeline.java:78)
        at org.icij.datashare.nlp.NlpConsumer.findNamedEntities(NlpConsumer.java:82)
        at org.icij.datashare.nlp.NlpConsumer.call(NlpConsumer.java:53)
        at org.icij.datashare.nlp.NlpConsumer.call(NlpConsumer.java:20)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2022-08-18 12:32:01,057 [CORENLP-0] INFO  CoreNlpNerModels - adding /C:/Users/IEUser/AppData/Roaming/Datashare/dist/models%5ccorenlp%5c4-0-0%5cfr%5cstanford-corenlp-4.0.0-models-fr.jar to system classloader
2022-08-18 12:32:01,061 [CORENLP-0] ERROR CoreNlpNerModels - failed loading NER
java.io.IOException: Unable to open "edu\stanford\nlp\models\ner\french-wikiner-4class.crf.ser.gz" as class path, filename or URL
        at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:481)
        at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1505)
        at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1497)
        at edu.stanford.nlp.ie.crf.CRFClassifier.getClassifier(CRFClassifier.java:2888)
        at org.icij.datashare.text.nlp.corenlp.models.CoreNlpNerModels.loadModelFile(CoreNlpNerModels.java:75)
        at org.icij.datashare.text.nlp.corenlp.models.CoreNlpNerModels.loadModelFile(CoreNlpNerModels.java:39)
        at org.icij.datashare.text.nlp.AbstractModels.load(AbstractModels.java:59)
        at org.icij.datashare.text.nlp.AbstractModels.get(AbstractModels.java:46)
        at org.icij.datashare.text.nlp.corenlp.CorenlpPipeline.initializeNerAnnotator(CorenlpPipeline.java:183)
        at org.icij.datashare.text.nlp.corenlp.CorenlpPipeline.initialize(CorenlpPipeline.java:78)
        at org.icij.datashare.nlp.NlpConsumer.findNamedEntities(NlpConsumer.java:82)
        at org.icij.datashare.nlp.NlpConsumer.call(NlpConsumer.java:53)
        at org.icij.datashare.nlp.NlpConsumer.call(NlpConsumer.java:20)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2022-08-18 12:32:01,089 [CORENLP-0] INFO  CorenlpPipeline - name-finding for FRENCH in document d1155a3d2d29072ccf43b55c7e2a5b11d85820e35b47f5663a63ffd0f10177c4024a8cadf7c51248d82ce85d5ae1910d (offset 0)
2022-08-18 12:32:15,922 [CORENLP-0] INFO  CoreNlpNerModels - adding /C:/Users/IEUser/AppData/Roaming/Datashare/dist/models%5ccorenlp%5c4-0-0%5cen%5cstanford-corenlp-4.0.0-models-en.jar to system classloader
2022-08-18 12:32:15,930 [CORENLP-0] ERROR CoreNlpNerModels - failed loading NER
java.io.IOException: Unable to open "edu\stanford\nlp\models\ner\english.all.3class.caseless.distsim.crf.ser.gz" as class path, filename or URL
        at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:481)
        at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1505)
        at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1497)
        at edu.stanford.nlp.ie.crf.CRFClassifier.getClassifier(CRFClassifier.java:2888)
        at org.icij.datashare.text.nlp.corenlp.models.CoreNlpNerModels.loadModelFile(CoreNlpNerModels.java:75)
        at org.icij.datashare.text.nlp.corenlp.models.CoreNlpNerModels.loadModelFile(CoreNlpNerModels.java:39)
        at org.icij.datashare.text.nlp.AbstractModels.load(AbstractModels.java:59)
        at org.icij.datashare.text.nlp.AbstractModels.get(AbstractModels.java:46)
        at org.icij.datashare.text.nlp.corenlp.models.CoreNlpNerModels.loadModelFile(CoreNlpNerModels.java:68)
        at org.icij.datashare.text.nlp.corenlp.models.CoreNlpNerModels.loadModelFile(CoreNlpNerModels.java:39)
        at org.icij.datashare.text.nlp.AbstractModels.load(AbstractModels.java:59)
        at org.icij.datashare.text.nlp.AbstractModels.get(AbstractModels.java:46)
        at org.icij.datashare.text.nlp.corenlp.CorenlpPipeline.processNerClassifier(CorenlpPipeline.java:198)
        at org.icij.datashare.text.nlp.corenlp.CorenlpPipeline.process(CorenlpPipeline.java:95)
        at org.icij.datashare.text.nlp.corenlp.CorenlpPipeline.process(CorenlpPipeline.java:88)
        at org.icij.datashare.nlp.NlpConsumer.findNamedEntities(NlpConsumer.java:85)
        at org.icij.datashare.nlp.NlpConsumer.call(NlpConsumer.java:53)
        at org.icij.datashare.nlp.NlpConsumer.call(NlpConsumer.java:20)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2022-08-18 12:32:15,933 [CORENLP-0] INFO  CoreNlpNerModels - adding /C:/Users/IEUser/AppData/Roaming/Datashare/dist/models%5ccorenlp%5c4-0-0%5cfr%5cstanford-corenlp-4.0.0-models-fr.jar to system classloader
2022-08-18 12:32:15,937 [CORENLP-0] ERROR CoreNlpNerModels - failed loading NER
java.io.IOException: Unable to open "edu\stanford\nlp\models\ner\french-wikiner-4class.crf.ser.gz" as class path, filename or URL
        at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:481)
        at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1505)
        at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1497)
        at edu.stanford.nlp.ie.crf.CRFClassifier.getClassifier(CRFClassifier.java:2888)
        at org.icij.datashare.text.nlp.corenlp.models.CoreNlpNerModels.loadModelFile(CoreNlpNerModels.java:75)
        at org.icij.datashare.text.nlp.corenlp.models.CoreNlpNerModels.loadModelFile(CoreNlpNerModels.java:39)
        at org.icij.datashare.text.nlp.AbstractModels.load(AbstractModels.java:59)
        at org.icij.datashare.text.nlp.AbstractModels.get(AbstractModels.java:46)
        at org.icij.datashare.text.nlp.corenlp.CorenlpPipeline.processNerClassifier(CorenlpPipeline.java:198)
        at org.icij.datashare.text.nlp.corenlp.CorenlpPipeline.process(CorenlpPipeline.java:95)
        at org.icij.datashare.text.nlp.corenlp.CorenlpPipeline.process(CorenlpPipeline.java:88)
        at org.icij.datashare.nlp.NlpConsumer.findNamedEntities(NlpConsumer.java:85)
        at org.icij.datashare.nlp.NlpConsumer.call(NlpConsumer.java:53)
        at org.icij.datashare.nlp.NlpConsumer.call(NlpConsumer.java:20)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2022-08-18 12:32:15,947 [CORENLP-0] WARN  NlpConsumer - error in consumer main loop
java.lang.NullPointerException: null
        at org.icij.datashare.text.nlp.corenlp.CorenlpPipeline.processNerClassifier(CorenlpPipeline.java:200)
        at org.icij.datashare.text.nlp.corenlp.CorenlpPipeline.process(CorenlpPipeline.java:95)
        at org.icij.datashare.text.nlp.corenlp.CorenlpPipeline.process(CorenlpPipeline.java:88)
        at org.icij.datashare.nlp.NlpConsumer.findNamedEntities(NlpConsumer.java:85)
        at org.icij.datashare.nlp.NlpConsumer.call(NlpConsumer.java:53)
        at org.icij.datashare.nlp.NlpConsumer.call(NlpConsumer.java:20)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2022-08-18 12:32:15,950 [CORENLP-0] INFO  NlpConsumer - exiting main loop
2022-08-18 12:32:15,960 [pool-4-thread-1] INFO  NlpApp - exiting run

bamthomas commented 2 years ago

Somewhat related to https://github.com/ICIJ/datashare/issues/626

Either the jar file is not loaded, or the path to the model inside the jar file is not given correctly.

Also could be linked to ICIJ/datashare-api@d3c31910ace6d4bb28f3cc21b9cc2435d6a83b99
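To illustrate the second possibility: the log above asks for the model as "edu\stanford\nlp\models\ner\..." and adds the jar from a URL containing "%5c" (an encoded backslash). Classpath resource names and URLs use "/" on every platform, so a name built with the Windows separator may not resolve. The sketch below is only an assumption about the cause, not the actual Datashare code; `ResourcePaths`, `toClasspathName` and `jarUri` are hypothetical names used for illustration.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Datashare: shows separator-safe
// handling of classpath resource names and jar locations.
final class ResourcePaths {

    // Classpath resource names always use '/', whatever the host OS is,
    // so backslashes introduced by OS-specific path building are replaced.
    public static String toClasspathName(String osStylePath) {
        return osStylePath.replace('\\', '/');
    }

    // Path.toUri() produces a file: URI with '/' separators, avoiding
    // "%5c" sequences that come from concatenating raw path strings.
    public static URI jarUri(Path jar) {
        return jar.toAbsolutePath().toUri();
    }

    public static void main(String[] args) {
        // Resource name as it appears in the error message of the log.
        String fromLog =
            "edu\\stanford\\nlp\\models\\ner\\english.all.3class.caseless.distsim.crf.ser.gz";
        // prints edu/stanford/nlp/models/ner/english.all.3class.caseless.distsim.crf.ser.gz
        System.out.println(toClasspathName(fromLog));

        // Building the jar location from path segments lets the JDK pick separators.
        Path jar = Paths.get("dist", "models", "corenlp", "4-0-0", "en",
                "stanford-corenlp-4.0.0-models-en.jar");
        System.out.println(jarUri(jar));
    }
}
```

If this assumption holds, normalizing the name before handing it to the classifier loader, and deriving the jar URL from a `Path` rather than a concatenated string, would avoid both symptoms seen in the log.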

bamthomas commented 2 years ago

Tested with 10.8.4; it should be fixed.