dkpro / dkpro-similarity

Word and text similarity measures
https://dkpro.github.io/dkpro-similarity

Loading Resources from JAR fails in experiments.sts2013baseline.util.StopwordFilter #13

Open nicolaierbs opened 9 years ago

nicolaierbs commented 9 years ago

Original issue 13 created by dkpro on 2013-08-19T14:46:06.000Z:

What steps will reproduce the problem?

  1. Download the 1.0.1 stable version of the sts2013baseline (gpl package)
  2. Run Pipeline -D

Run terminates with Exception:

Caused by: java.io.FileNotFoundException: class path resource [stopwords/stopwords_english_punctuation.txt] cannot be resolved to absolute file path because it does not reside in the file system: jar:file:/home/someuser/.m2/repository/de/tudarmstadt/ukp/similarity/dkpro/de.tudarmstadt.ukp.similarity.dkpro.data-asl/1.0.1/de.tudarmstadt.ukp.similarity.dkpro.data-asl-1.0.1.jar!/stopwords/stopwords_english_punctuation.txt
    at org.springframework.util.ResourceUtils.getFile(ResourceUtils.java:204)
    at org.springframework.core.io.AbstractFileResolvingResource.getFile(AbstractFileResolvingResource.java:52)
    at de.tudarmstadt.ukp.similarity.experiments.sts2013baseline.util.StopwordFilter.initialize(StopwordFilter.java:75)
    ... 15 more

Suggested Bugfix:

Replace res.getFile(); with res.getInputStream(); in de.tudarmstadt.ukp.similarity.experiments.sts2013baseline.util.StopwordFilter.initialize(StopwordFilter.java:75)
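A minimal sketch of the stream-based approach suggested above, using only the JDK. The class name StopwordStreamSketch, the method readStopwords, the one-word-per-line format and the UTF-8 encoding are assumptions made for illustration; this is not the actual StopwordFilter code. The stream returned by Spring's Resource.getInputStream() could be passed to the same method.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of dkpro-similarity.
final class StopwordStreamSketch {

    // Reads one stopword per line (assumed format) from any InputStream.
    // Lines are trimmed; blank lines are skipped.
    static Set<String> readStopwords(InputStream in) throws IOException {
        Set<String> stopwords = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    stopwords.add(word);
                }
            }
        }
        return stopwords;
    }

    public static void main(String[] args) throws IOException {
        // Works whether the resource sits in a folder or inside a jar,
        // because no java.io.File is involved.
        InputStream in = StopwordStreamSketch.class.getClassLoader()
                .getResourceAsStream("stopwords/stopwords_english_punctuation.txt");
        if (in == null) {
            // The data jar is not on the class path in this run.
            System.out.println("Resource not found on the class path");
            return;
        }
        System.out.println(readStopwords(in).size() + " stopwords loaded");
    }
}
```

The point of the design is that the reading code only ever sees an InputStream, so it does not care where the bytes come from.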

nicolaierbs commented 9 years ago

Comment #1 originally posted by dkpro on 2013-08-19T15:51:01.000Z:

There is actually at least one more bug of the same type...

I figured out that an easier workaround is to copy the data needed (datasets, goldstandards, stopwords) from the jar file into a separate folder and add that folder manually to the class path...

Cheers Michael

nicolaierbs commented 9 years ago

Comment #2 originally posted by dkpro on 2013-08-19T15:53:55.000Z:

Yes, this will certainly work, too.

Could you point out the other location, where the same error occurs, so that we can fix that, too?

ps: And thanks for the patch.

nicolaierbs commented 9 years ago

Comment #4 originally posted by dkpro on 2013-08-19T16:03:50.000Z:

The next one was caused by sts2013baseline.util.Features2Arff.toArffFile(Features2Arff.java:49). Same res.getFile() problem:


Exception in thread "main" java.io.FileNotFoundException: class path resource [goldstandards/semeval-2012/train/STS.gs.ALL.txt] cannot be resolved to absolute file path because it does not reside in the file system: jar:file:/home/mschuhma/.m2/repository/de/tudarmstadt/ukp/similarity/dkpro/de.tudarmstadt.ukp.similarity.dkpro.data-asl/1.0.1/de.tudarmstadt.ukp.similarity.dkpro.data-asl-1.0.1.jar!/goldstandards/semeval-2012/train/STS.gs.ALL.txt
    at org.springframework.util.ResourceUtils.getFile(ResourceUtils.java:204)
    at org.springframework.core.io.AbstractFileResolvingResource.getFile(AbstractFileResolvingResource.java:52)
    at de.tudarmstadt.ukp.similarity.experiments.sts2013baseline.util.Features2Arff.toArffFile(Features2Arff.java:49)
    at de.tudarmstadt.ukp.similarity.experiments.sts2013baseline.Pipeline.runTest(Pipeline.java:155)
    at de.tudarmstadt.ukp.similarity.experiments.sts2013baseline.Pipeline.main(Pipeline.java:101)

nicolaierbs commented 9 years ago

Comment #5 originally posted by dkpro on 2013-08-20T15:06:59.000Z:

I fixed the issue at those two locations. Hopefully that fixed it.

avineshpvs commented 9 years ago

A similar problem surfaced in sts2013baseline.util.Evaluator (Evaluator.java:311):

Exception in thread "main" java.io.FileNotFoundException: class path resource [goldstandards/semeval-2012/train/STS.gs.MSRpar.txt] cannot be resolved to absolute file path because it does not reside in the file system: jar:file:/home/.m2/repository/dkpro/similarity/dkpro.similarity.uima.data-asl/2.1.0/dkpro.similarity.uima.data-asl-2.1.0.jar!/goldstandards/semeval-2012/train/STS.gs.MSRpar.txt
    at org.springframework.util.ResourceUtils.getFile(ResourceUtils.java:205)
    at org.springframework.core.io.AbstractFileResolvingResource.getFile(AbstractFileResolvingResource.java:52)
    at dkpro.similarity.experiments.sts2013baseline.util.Evaluator.computePearsonCorrelation(Evaluator.java:311)
    at dkpro.similarity.experiments.sts2013baseline.util.Evaluator.runEvaluationMetric(Evaluator.java:232)
    at dkpro.similarity.experiments.sts2013baseline.Pipeline.runTrain(Pipeline.java:122)
    at dkpro.similarity.experiments.sts2013baseline.Pipeline.main(Pipeline.java:85)

nicolaierbs commented 9 years ago

I am not sure whether your issue has the same cause. Could you please post the line where you load the file "goldstandards/semeval-2012/train/STS.gs.MSRpar.txt"?

avineshpvs commented 9 years ago

I get the error at the 2nd line:

Resource res = r.getResource(gsScoresFilePath);
File gsScoresFile = res.getFile();

nicolaierbs commented 9 years ago

Can you just use the file path directly, e.g. 'File file = new File("goldstandards/semeval-2012/train/STS.gs.MSRpar.txt")'? Resource is a class that provides functionality to load files from a jar on the classpath. (It should work though...)

nicolaierbs commented 9 years ago

The file "STS.gs.MSRpar.txt" is in the jar dkpro.similarity.uima.data-asl (version 2.1.0). Running from the command line will not work because several other dependencies are required. If you add the dependencies to a Maven project, it should work fine.

avineshpvs commented 9 years ago

Thanks. I had all the dependencies in the Maven project. The pipeline was running fine until this point and was reading a couple of other files from dkpro.similarity.uima.data-asl; it was just not able to read this particular file.

nicolaierbs commented 9 years ago

Ok, I'm running out of ideas. It would be good if you could debug it. It might be due to another version of the artifact, but I'm not sure about that.

zesch commented 9 years ago

Could you please navigate to the jar named in the exception (file:/home/.m2/repository/dkpro/similarity/dkpro.similarity.uima.data-asl/2.1.0/dkpro.similarity.uima.data-asl-2.1.0.jar) and check whether the specified file (/goldstandards/semeval-2012/train/STS.gs.MSRpar.txt) is actually in there?

avineshpvs commented 9 years ago

Yes, it is there at that location.

Regards Avinesh


zesch commented 9 years ago

If it is there, I don't see why the exception you have reported should be thrown.

reckart commented 9 years ago

Exception in thread "main" java.io.FileNotFoundException: class path resource [goldstandards/semeval-2012/train/STS.gs.MSRpar.txt] cannot be resolved to absolute file path because it does not reside in the file system: jar:file:/home/.m2/repository/dkpro/similarity/dkpro.similarity.uima.data-asl/2.1.0/dkpro.similarity.uima.data-asl-2.1.0.jar!/goldstandards/semeval-2012/train/STS.gs.MSRpar.txt
    at org.springframework.util.ResourceUtils.getFile(ResourceUtils.java:205)

So this looks like the calling code expects the resource to be located in the file system and uses getFile() to get a File object representing it. However, since the resource is inside a JAR, it cannot be addressed using a File object. This would work if the user has DKPro Similarity checked out on the local machine (in which case the resource would be in the Maven target folder on disk), but not if the user accesses the resource through a Maven dependency.
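To illustrate the point, here is a small self-contained sketch using only the JDK (Java 9+ for readAllBytes). It builds a throwaway jar, looks up an entry through a class loader, and shows that the resulting jar: URL cannot be converted to a File while it can still be read as a stream. All names (JarResourceSketch, buildJar, inspect, the entry name stopwords/demo.txt) are made up for this example and are not part of DKPro Similarity or Spring.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

// Hypothetical demo class, not part of dkpro-similarity.
final class JarResourceSketch {

    // Builds a temporary jar containing a single text entry.
    static Path buildJar(String entryName, String content) throws IOException {
        Path jar = Files.createTempFile("data-sketch", ".jar");
        jar.toFile().deleteOnExit();
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(out)) {
            jos.putNextEntry(new JarEntry(entryName));
            jos.write(content.getBytes(StandardCharsets.UTF_8));
            jos.closeEntry();
        }
        return jar;
    }

    // True only if the URL can be turned into a java.io.File.
    static boolean isAddressableAsFile(URL url) {
        try {
            new File(url.toURI());
            return true;
        } catch (URISyntaxException | IllegalArgumentException e) {
            // A jar: URI is opaque, so File(URI) rejects it.
            return false;
        }
    }

    // Reads the whole resource through a stream.
    static String readViaStream(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Returns { URL protocol, "true"/"false" for File access, stream content }.
    static String[] inspect(String entryName, String content) throws IOException {
        Path jar = buildJar(entryName, content);
        URL[] classPath = { jar.toUri().toURL() };
        try (URLClassLoader loader = new URLClassLoader(classPath, null)) {
            URL url = loader.getResource(entryName);
            return new String[] {
                url.getProtocol(),
                String.valueOf(isAddressableAsFile(url)),
                readViaStream(url)
            };
        }
    }

    public static void main(String[] args) throws IOException {
        String[] result = inspect("stopwords/demo.txt", "the\n");
        System.out.println("protocol:            " + result[0]);
        System.out.println("addressable as File: " + result[1]);
        System.out.println("content via stream:  " + result[2].trim());
    }
}
```

When the same resource is loaded from a folder on disk instead, the URL has the file: protocol and the File conversion succeeds, which matches the observation that the code works from a local checkout.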

zesch commented 9 years ago

Right. Thanks Richard. I didn't see the problem earlier as the experiment is in the GPL tmp repository that still awaits its move to the main repository.

So the quick workaround is to check out all of dkpro-similarity. But in the long run we should replace the file references with streams or use ResourceUtils.resolveLocation().
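For call sites where downstream code really does need a java.io.File, one generic fallback is to copy the resource stream to a temporary file first. The sketch below uses only the JDK; the class ResourceToTempFileSketch and the method copyToTempFile are hypothetical names, and this is not a claim about how ResourceUtils.resolveLocation() behaves internally.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of dkpro-similarity.
final class ResourceToTempFileSketch {

    // Copies the stream to a temporary file and returns that file.
    // The stream is closed afterwards; the file is removed on JVM exit.
    static File copyToTempFile(InputStream in, String suffix) throws IOException {
        Path tmp = Files.createTempFile("resource-", suffix);
        tmp.toFile().deleteOnExit();
        try (InputStream source = in) {
            // createTempFile already created the file, so it must be replaced.
            Files.copy(source, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp.toFile();
    }
}
```

A call site that currently does File gsScoresFile = res.getFile(); could, under these assumptions, pass res.getInputStream() to copyToTempFile instead. Reading directly from the stream remains the simpler option wherever a File is not strictly required.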

avineshpvs commented 9 years ago

Thanks Richard and Torsten.

Regards Avinesh
