CogComp / cogcomp-nlp

CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, transliteration, verb-sense, and more.
http://nlp.cogcomp.org/

NER Gazetteer not downloading #714

Open himanshumangla opened 5 years ago

himanshumangla commented 5 years ago

When trying to run the training with demo data for the first time as:

java -Xmx8g -cp target/classes:target/dependency/* edu.illinois.cs.cogcomp.ner.NerTagger -train test/Test/0224.txt test/Test/0228.txt config/ner.properties

I get the following error:

log4j:WARN No appenders could be found for logger (edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Downloading the folder from datastore . . .
GroupId: readonly.org.cogcomp.gazetteers
ArtifactId: 1.6/gazetteers.zip
augmentedGroupId: readonly.org.cogcomp.gazetteers
versionedFileName: 1.6/gazetteers.zip
zippedFileName: /home/himanshu/.cogcomp-datastore/tmp/1.6/gazetteers.zip
java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at com.squareup.okhttp.internal.Platform.connectSocket(Platform.java:101)
    at com.squareup.okhttp.internal.io.RealConnection.connectSocket(RealConnection.java:137)
    at com.squareup.okhttp.internal.io.RealConnection.connect(RealConnection.java:108)
    at com.squareup.okhttp.internal.http.StreamAllocation.findConnection(StreamAllocation.java:184)
    at com.squareup.okhttp.internal.http.StreamAllocation.findHealthyConnection(StreamAllocation.java:126)
    at com.squareup.okhttp.internal.http.StreamAllocation.newStream(StreamAllocation.java:95)
    at com.squareup.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:281)
    at com.squareup.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:224)
    at com.squareup.okhttp.Call.getResponse(Call.java:286)
    at com.squareup.okhttp.Call$ApplicationInterceptorChain.proceed(Call.java:243)
    at com.squareup.okhttp.Call.getResponseWithInterceptorChain(Call.java:205)
    at com.squareup.okhttp.Call.execute(Call.java:80)
    at io.minio.MinioClient.execute(MinioClient.java:826)
    at io.minio.MinioClient.executeHead(MinioClient.java:1018)
    at io.minio.MinioClient.statObject(MinioClient.java:1154)
    at io.minio.MinioClient.getObject(MinioClient.java:1343)
    at org.cogcomp.Datastore.getDirectory(Datastore.java:556)
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.TreeGazetteers.init(TreeGazetteers.java:71)
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.TreeGazetteers.<init>(TreeGazetteers.java:50)
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.GazetteersFactory.get(GazetteersFactory.java:50)
    at edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters.readAndLoadConfig(Parameters.java:265)
    at edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters.readConfigAndLoadExternalData(Parameters.java:91)
    at edu.illinois.cs.cogcomp.ner.NerTagger.main(NerTagger.java:38)
java.io.FileNotFoundException: /home/himanshu/.cogcomp-datastore/tmp/1.6/gazetteers.zip (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at java.io.FileInputStream.<init>(FileInputStream.java:93)
    at org.cogcomp.ZipHelper.unZipIt(ZipHelper.java:71)
    at org.cogcomp.Datastore.getDirectory(Datastore.java:585)
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.TreeGazetteers.init(TreeGazetteers.java:71)
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.TreeGazetteers.<init>(TreeGazetteers.java:50)
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.GazetteersFactory.get(GazetteersFactory.java:50)
    at edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters.readAndLoadConfig(Parameters.java:265)
    at edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters.readConfigAndLoadExternalData(Parameters.java:91)
    at edu.illinois.cs.cogcomp.ner.NerTagger.main(NerTagger.java:38)
zippedFileName: /home/himanshu/.cogcomp-datastore/tmp/1.6/gazetteers.zip
path: /home/himanshu/.cogcomp-datastore/readonly.org.cogcomp.gazetteers/1.6/gazetteers
artifactId: gazetteers
java.io.FileNotFoundException: /home/himanshu/.cogcomp-datastore/readonly.org.cogcomp.gazetteers/1.6/gazetteers/gazetteers/gazetteers-list.txt (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at java.io.FileInputStream.<init>(FileInputStream.java:93)
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.TreeGazetteers.init(TreeGazetteers.java:72)
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.TreeGazetteers.<init>(TreeGazetteers.java:50)
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.GazetteersFactory.get(GazetteersFactory.java:50)
    at edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters.readAndLoadConfig(Parameters.java:265)
    at edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters.readConfigAndLoadExternalData(Parameters.java:91)
    at edu.illinois.cs.cogcomp.ner.NerTagger.main(NerTagger.java:38)
java.lang.NullPointerException
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.ExpressiveFeaturesAnnotator.annotate(ExpressiveFeaturesAnnotator.java:40)
    at edu.illinois.cs.cogcomp.ner.LbjTagger.LearningCurveMultiDataset.getLearningCurve(LearningCurveMultiDataset.java:102)
    at edu.illinois.cs.cogcomp.ner.NerTagger.main(NerTagger.java:47)

I have tried deleting the folder and retrying several times, but the same error appears every time. Has the Gazetteer-1.6.zip file been moved elsewhere? Is there an updated URL for it?
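A quick way to confirm whether a retry actually populated the local cache is to look for the file the last FileNotFoundException complains about. The sketch below only rebuilds the path shown in the log (under `~/.cogcomp-datastore`) and checks that it exists. The class and method names (`GazetteerCacheCheck`, `gazetteerList`, `isCached`) are hypothetical and not part of cogcomp-nlp, and the layout is assumed from the log output rather than from the Datastore source.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class GazetteerCacheCheck {
    // Rebuilds the gazetteers-list.txt location reported in the log,
    // relative to the given home directory (assumed layout, taken from the log).
    static Path gazetteerList(Path home) {
        return home.resolve(".cogcomp-datastore")
                .resolve("readonly.org.cogcomp.gazetteers")
                .resolve("1.6")
                .resolve("gazetteers")
                .resolve("gazetteers")
                .resolve("gazetteers-list.txt");
    }

    // True only if the list file exists as a regular file.
    static boolean isCached(Path home) {
        return Files.isRegularFile(gazetteerList(home));
    }

    public static void main(String[] args) {
        Path home = Paths.get(System.getProperty("user.home"));
        System.out.println(gazetteerList(home) + " present: " + isCached(home));
    }
}
```

If this reports the file as missing after a run, the download never completed, which matches the connection timeout at the top of the trace.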

Rhovan-cloud commented 5 years ago

I am currently encountering a similar problem: the gazetteers cannot be downloaded because a connection to the server cannot be established (Failed to connect to smaug.cs.illinois.edu/192.17.58.151:8080). This problem has persisted for at least a week. Is there any other way I can get the gazetteers?

danyaljj commented 5 years ago

Sorry about these issues.

We were having issues

Since we transitioned to UPenn, some of our infrastructure machines have moved here as well. As a result, the server smaug.cs.illinois.edu no longer exists; its replacement is http://macniece.seas.upenn.edu:4008. This should already be fixed if you clone the source code from the main repository.

FYI @HeglerTissot

Rhovan-cloud commented 5 years ago

Thank you, the download from the UPenn server worked. For other users who encounter this problem while using the Maven plugin from http://cogcomp.org/m2repo/: you can simply change the ResourceConfigurator endpoint to the UPenn server before calling the annotator:

ResourceConfigurator.ENDPOINT.value = "http://macniece.seas.upenn.edu:4008";

LucasPages commented 3 years ago

I'm getting a similar problem with the new address:

java.net.ConnectException: Failed to connect to macniece.seas.upenn.edu/158.130.57.77:4008
    at com.squareup.okhttp.internal.io.RealConnection.connectSocket(RealConnection.java:139)
    at com.squareup.okhttp.internal.io.RealConnection.connect(RealConnection.java:108)
    at com.squareup.okhttp.internal.http.StreamAllocation.findConnection(StreamAllocation.java:184)
    at com.squareup.okhttp.internal.http.StreamAllocation.findHealthyConnection(StreamAllocation.java:126)
    at com.squareup.okhttp.internal.http.StreamAllocation.newStream(StreamAllocation.java:95)
    at com.squareup.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:281)
    at com.squareup.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:224)
    at com.squareup.okhttp.Call.getResponse(Call.java:286)
    at com.squareup.okhttp.Call$ApplicationInterceptorChain.proceed(Call.java:243)
    at com.squareup.okhttp.Call.getResponseWithInterceptorChain(Call.java:205)
    at com.squareup.okhttp.Call.execute(Call.java:80)
    at io.minio.MinioClient.execute(MinioClient.java:826)
    at io.minio.MinioClient.executeHead(MinioClient.java:1018)
    at io.minio.MinioClient.statObject(MinioClient.java:1154)
    at io.minio.MinioClient.getObject(MinioClient.java:1343)
    at org.cogcomp.Datastore.getDirectory(Datastore.java:556)
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.TreeGazetteers.init(TreeGazetteers.java:71)
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.TreeGazetteers.<init>(TreeGazetteers.java:50)
    at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.GazetteersFactory.get(GazetteersFactory.java:50)
    at edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters.readAndLoadConfig(Parameters.java:265)
    at edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters.readConfigAndLoadExternalData(Parameters.java:91)
    at edu.illinois.cs.cogcomp.ner.NerTagger.main(NerTagger.java:38)

Is there something wrong with the server? Or with my build?
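One way to separate a server outage from a build problem is to try a plain TCP connection to the endpoint, outside of cogcomp-nlp entirely. The following is a minimal sketch using only the JDK; the class name `EndpointProbe` and the 5-second timeout are arbitrary choices for illustration, and the host and port are the ones quoted in this thread.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class EndpointProbe {
    // Attempts a TCP connection and reports whether it succeeded within the timeout.
    // Any I/O failure (refused, timed out, unknown host) is treated as "not reachable".
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String host = "macniece.seas.upenn.edu"; // endpoint host quoted in this thread
        int port = 4008;                          // endpoint port quoted in this thread
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 5000));
    }
}
```

If the probe fails from several networks, the build is unlikely to be the cause. A successful connection only shows that something is listening on the port, not that the datastore service behind it is working.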

HeglerTissot commented 3 years ago

I don't think we currently have any service running on macniece port 4008.

What should it be?


chrisoutwright commented 3 years ago

What should the endpoint be? Or how can I find these:

readonly.org.cogcomp.mention
1.0\ACE_HEAD_TYPE.zip
\1.0\ACE_HEAD_TYPE

And

ner-model-enron-conll-all-data.zip

108598057 commented 3 years ago

@danyaljj I have the same problem. (http://macniece.seas.upenn.edu:4008) is unusable.

chouisgiser commented 2 years ago

@108598057 did you resolve the problem? I have also encountered it.