CogComp / cogcomp-nlp

CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, transliteration, verb-sense, and more.
http://nlp.cogcomp.org/

NER gazetteers - Trouble downloading #741

Closed LucasPages closed 3 years ago

LucasPages commented 3 years ago

I'm encountering an issue similar to the one described in https://github.com/CogComp/cogcomp-nlp/issues/714 while trying to train a NER model using the demo data: ./train.sh test/Test/0224.txt test/Test/0228.txt config/ner.properties

The NER gazetteers can't be downloaded and a java.net.SocketTimeoutException is raised. I wasn't able to fix the problem using the suggestions in the issue linked above.

These are the exception traces I get:

Downloading the folder from datastore . . . 
        GroupId: readonly.org.cogcomp.gazetteers
        ArtifactId: 1.5/gazetteers.zip
augmentedGroupId: readonly.org.cogcomp.gazetteers
versionedFileName: 1.5/gazetteers.zip
zippedFileName: /home/lucas/.cogcomp-datastore-tmp/gazetteers.zip
java.net.SocketTimeoutException: Connect timed out
    at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:546)
    at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597)
    at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:333)
    at java.base/java.net.Socket.connect(Socket.java:648)
    at com.squareup.okhttp.internal.Platform.connectSocket(Platform.java:101)
    at com.squareup.okhttp.internal.io.RealConnection.connectSocket(RealConnection.java:137)
    at com.squareup.okhttp.internal.io.RealConnection.connect(RealConnection.java:108)
    at com.squareup.okhttp.internal.http.StreamAllocation.findConnection(StreamAllocation.java:184)
    at com.squareup.okhttp.internal.http.StreamAllocation.findHealthyConnection(StreamAllocation.java:126)
    at com.squareup.okhttp.internal.http.StreamAllocation.newStream(StreamAllocation.java:95)
    at com.squareup.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:281)
    at com.squareup.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:224)
    at com.squareup.okhttp.Call.getResponse(Call.java:286)
    at com.squareup.okhttp.Call$ApplicationInterceptorChain.proceed(Call.java:243)
    at com.squareup.okhttp.Call.getResponseWithInterceptorChain(Call.java:205)
    at com.squareup.okhttp.Call.execute(Call.java:80)
    at io.minio.MinioClient.execute(MinioClient.java:826)
    at io.minio.MinioClient.executeHead(MinioClient.java:1018)
    at io.minio.MinioClient.statObject(MinioClient.java:1154)
    at io.minio.MinioClient.getObject(MinioClient.java:1343)
    at org.cogcomp.Datastore.getDirectory(Datastore.java:401)
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.TreeGazetteers.init(TreeGazetteers.java:64)
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.TreeGazetteers.<init>(TreeGazetteers.java:50)
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.GazetteersFactory.init(GazetteersFactory.java:54)
    at edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters.readAndLoadConfig(Parameters.java:312)
    at edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters.readConfigAndLoadExternalData(Parameters.java:96)
    at edu.illinois.cs.cogcomp.ner.NerTagger.main(NerTagger.java:36)
java.io.FileNotFoundException: /home/lucas/.cogcomp-datastore-tmp/gazetteers.zip (No such file or directory)
    at java.base/java.io.FileInputStream.open0(Native Method)
    at java.base/java.io.FileInputStream.open(FileInputStream.java:212)
    at java.base/java.io.FileInputStream.<init>(FileInputStream.java:154)
    at java.base/java.io.FileInputStream.<init>(FileInputStream.java:109)
    at org.cogcomp.ZipHelper.unZipIt(ZipHelper.java:71)
    at org.cogcomp.Datastore.getDirectory(Datastore.java:432)
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.TreeGazetteers.init(TreeGazetteers.java:64)
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.TreeGazetteers.<init>(TreeGazetteers.java:50)
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.GazetteersFactory.init(GazetteersFactory.java:54)
    at edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters.readAndLoadConfig(Parameters.java:312)
    at edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters.readConfigAndLoadExternalData(Parameters.java:96)
    at edu.illinois.cs.cogcomp.ner.NerTagger.main(NerTagger.java:36)
zippedFileName: /home/lucas/.cogcomp-datastore-tmp/gazetteers.zip
path: /home/lucas/.cogcomp-datastore/readonly.org.cogcomp.gazetteers/1.5/gazetteers
artifactId: gazetteers
java.io.FileNotFoundException: /home/lucas/.cogcomp-datastore/readonly.org.cogcomp.gazetteers/1.5/gazetteers/gazetteers/gazetteers-list.txt (No such file or directory)
    at java.base/java.io.FileInputStream.open0(Native Method)
    at java.base/java.io.FileInputStream.open(FileInputStream.java:212)
    at java.base/java.io.FileInputStream.<init>(FileInputStream.java:154)
    at java.base/java.io.FileInputStream.<init>(FileInputStream.java:109)
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.TreeGazetteers.init(TreeGazetteers.java:67)
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.TreeGazetteers.<init>(TreeGazetteers.java:50)
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.GazetteersFactory.init(GazetteersFactory.java:54)
    at edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters.readAndLoadConfig(Parameters.java:312)
    at edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters.readConfigAndLoadExternalData(Parameters.java:96)
    at edu.illinois.cs.cogcomp.ner.NerTagger.main(NerTagger.java:36)
13:33:16 ERROR NerTagger:78 - Exception caught: 
java.lang.NullPointerException
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.ExpressiveFeaturesAnnotator.annotate(ExpressiveFeaturesAnnotator.java:73)
    at edu.illinois.cs.cogcomp.ner.LbjTagger.LearningCurveMultiDataset.getLearningCurve(LearningCurveMultiDataset.java:72)
    at edu.illinois.cs.cogcomp.ner.NerTagger.main(NerTagger.java:73)
13:33:16 ERROR NerTagger:80 - 

Is there something I'm missing somewhere? Is it still a server issue, or a problem in the code? Thank you for any help.

LucasPages commented 3 years ago

I just fixed the issue. For some reason, the ResourceConfigurator class in my local repository didn't match the one on GitHub and still used the former server address. Switching it to the new one fixed the problem.
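
For anyone hitting the same "Connect timed out" trace, a quick way to tell a stale server address from a local network problem is to probe the endpoint directly before digging into the code. The following is only a minimal sketch: the class name `EndpointProbe` and the method `isReachable` are hypothetical helpers, not part of cogcomp-nlp, and the host and port are whatever your local ResourceConfigurator is configured with (this thread does not state them, so they are passed in as arguments).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of cogcomp-nlp: checks whether a TCP
// connection to a given host and port can be opened within a timeout.
public class EndpointProbe {

    // Returns true if the connection succeeds within timeoutMs milliseconds.
    // Any I/O failure (timeout, refused connection, unknown host) yields false.
    public static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: java EndpointProbe <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        // 5000 ms is an arbitrary timeout chosen for this sketch.
        boolean ok = isReachable(host, port, 5000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

If the address taken from your local ResourceConfigurator is not reachable but the one in the current GitHub version is, the local copy is probably out of date, which is what happened here.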