codeaudit / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl

ResourceCollectionReaderBase chokes on absolute paths in Windows #66

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
ResourceCollectionReaderBase does not accept absolute Windows paths such as
C:\some\path, because it treats them as a URL (they contain a ":").

            CollectionReader reader = CollectionReaderFactory.createCollectionReader(
                    PdfReader.class,
                    PdfReader.PARAM_PATH, new File("data").getAbsolutePath(),
                    PdfReader.PARAM_PATTERNS, new String[] {"[+]*.pdf"},
                    PdfReader.PARAM_LANGUAGE, "de");

Exception in thread "main" 
org.apache.uima.resource.ResourceInitializationException
    at de.tudarmstadt.ukp.dkpro.core.api.io.ResourceCollectionReaderBase.initialize(ResourceCollectionReaderBase.java:191)
    at de.tudarmstadt.ukp.dkpro.core.io.pdf.PdfReader.initialize(PdfReader.java:76)
    at org.uimafit.component.CasCollectionReader_ImplBase.initialize(CasCollectionReader_ImplBase.java:53)
    at org.apache.uima.collection.CollectionReader_ImplBase.initialize(CollectionReader_ImplBase.java:71)
    at org.apache.uima.impl.CollectionReaderFactory_impl.produceResource(CollectionReaderFactory_impl.java:103)
    at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
    at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
    at org.apache.uima.UIMAFramework.produceCollectionReader(UIMAFramework.java:711)
    at org.uimafit.factory.CollectionReaderFactory.createCollectionReader(CollectionReaderFactory.java:196)
    at org.uimafit.factory.CollectionReaderFactory.createCollectionReader(CollectionReaderFactory.java:178)
    at org.uimafit.factory.CollectionReaderFactory.createCollectionReader(CollectionReaderFactory.java:123)
    at de.tudarmstadt.ukp.jek.pedocs.sandbox.RunSimplePipeline.main(RunSimplePipeline.java:38)
Caused by: java.io.FileNotFoundException: class path resource 
[C:/Users/JohnDoe/workspace/myproject/data/] cannot be resolved to URL because 
it does not exist
    at org.springframework.core.io.ClassPathResource.getURL(ClassPathResource.java:179)
    at org.springframework.core.io.AbstractResource.getURI(AbstractResource.java:93)
    at de.tudarmstadt.ukp.dkpro.core.api.io.ResourceCollectionReaderBase.getUri(ResourceCollectionReaderBase.java:363)
    at de.tudarmstadt.ukp.dkpro.core.api.io.ResourceCollectionReaderBase.scan(ResourceCollectionReaderBase.java:259)
    at de.tudarmstadt.ukp.dkpro.core.api.io.ResourceCollectionReaderBase.initialize(ResourceCollectionReaderBase.java:183)
    ... 11 more

Original issue reported on code.google.com by richard.eckart on 30 May 2012 at 1:29

GoogleCodeExporter commented 9 years ago
I tried to fix the problem by treating a location as a file if the colon does 
not appear in second or later position in the location string. Can you please 
test if this fixes the problem on Windows?
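
For readers following along, here is a minimal sketch of one way such a colon-position check could look. It is not the code from the actual fix: the class name LocationHeuristic and the method looksLikeFilePath are hypothetical, and the rule it applies (no colon, or a single drive letter followed by a colon, means "file path") is only an assumption about the intent.

```java
public class LocationHeuristic {
    /**
     * Hypothetical helper, not part of DKPro Core: returns true if the
     * location should be handled as a plain file system path rather than
     * as a URL.
     */
    public static boolean looksLikeFilePath(String location) {
        int colon = location.indexOf(':');
        if (colon < 0) {
            // No colon at all, e.g. "data/docs": cannot carry a URL scheme.
            return true;
        }
        // Assumption: a single letter directly followed by the first colon
        // is a Windows drive letter such as "C:", not a URL scheme.
        if (colon == 1 && Character.isLetter(location.charAt(0))) {
            return true;
        }
        // Anything else, e.g. "file:/C:/myPath/data/", is treated as a URL.
        return false;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeFilePath("C:\\some\\path"));          // true
        System.out.println(looksLikeFilePath("data"));                    // true
        System.out.println(looksLikeFilePath("file:/C:/myPath/data/"));   // false
    }
}
```

A side effect of this rule is that one-letter URL schemes would be misread as drive letters, which seems acceptable because such schemes are not in common use.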

Original comment by richard.eckart on 5 Jun 2012 at 3:15

GoogleCodeExporter commented 9 years ago
I tried several versions:

1) The following version still yields the same error as above:

            CollectionReader reader = CollectionReaderFactory.createCollectionReader(
                    PdfReader.class,
                    PdfReader.PARAM_PATH, new File("C:/myPath/data").getAbsolutePath(),
                    PdfReader.PARAM_PATTERNS, new String[] {"[+]*.pdf"},
                    PdfReader.PARAM_LANGUAGE, "de");

            CollectionReader reader = CollectionReaderFactory.createCollectionReader(
                    PdfReader.class,
                    PdfReader.PARAM_PATH, new File("file:/C:/myPath/data/").getAbsolutePath(),
                    PdfReader.PARAM_PATTERNS, new String[] {"[+]*.pdf"},
                    PdfReader.PARAM_LANGUAGE, "de");

2) variation of the above that works:
PdfReader.PARAM_PATH, new File("C:/myPath/data").toURI().toURL().toString(),

System.out.println(new File("C:/myPath/data").toURI().toURL().toString());
yields the following Console output:
file:/C:/myPath/data/

3) variation: same error as in 1)
PdfReader.PARAM_PATH, new File("file:/C:/myPath/data").getAbsolutePath(),

4) variation: same error as in 1)
PdfReader.PARAM_PATH, new File("file:/C:/myPath/data/").getAbsolutePath(),

Original comment by eckle.kohler on 10 Jun 2012 at 3:55

GoogleCodeExporter commented 9 years ago
Variations 3 and 4 must fail because when you create a File object, the path 
has to be a real path (e.g. "C:/myPath/data/" or "C:\myPath\data\") - it must 
not contain a "file:" prefix.
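
To illustrate the point about the "file:" prefix, here is a small sketch using only java.io.File and java.net.URI. The class name FilePrefixDemo and the helper fromFileUri are made up for this example. The File(String) constructor does no URI parsing, so the prefix simply becomes part of the path, whereas the File(URI) constructor interprets it as a scheme.

```java
import java.io.File;
import java.net.URI;

public class FilePrefixDemo {
    /** Hypothetical helper: turns a "file:" URI string into a File by parsing it as a URI first. */
    public static File fromFileUri(String uri) {
        return new File(URI.create(uri));
    }

    public static void main(String[] args) {
        // File(String) takes the text literally: "file:" ends up inside the path.
        File literal = new File("file:/C:/myPath/data");
        System.out.println(literal.getPath());

        // File(URI) strips the scheme and keeps only the path component.
        File parsed = fromFileUri("file:/C:/myPath/data");
        System.out.println(parsed.getPath());
    }
}
```

The exact strings printed depend on the platform's path separator, so no expected output is given here.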

Do you still get the same error (Caused by: java.io.FileNotFoundException: 
class path resource [C:/myPath/data/] cannot be resolved to URL because it does 
not exist) for variation 1, or is it a different error message now? If so, 
which one?

Original comment by richard.eckart on 10 Jun 2012 at 4:48

GoogleCodeExporter commented 9 years ago
Thanks for the info regarding variations 3) and 4).

With version 1) I still get the very same error message pasted here:

Exception in thread "main" 
org.apache.uima.resource.ResourceInitializationException
    at de.tudarmstadt.ukp.dkpro.core.api.io.ResourceCollectionReaderBase.initialize(ResourceCollectionReaderBase.java:185)
    at de.tudarmstadt.ukp.dkpro.core.io.pdf.PdfReader.initialize(PdfReader.java:76)
    at org.uimafit.component.CasCollectionReader_ImplBase.initialize(CasCollectionReader_ImplBase.java:53)
    at org.apache.uima.collection.CollectionReader_ImplBase.initialize(CollectionReader_ImplBase.java:71)
    at org.apache.uima.impl.CollectionReaderFactory_impl.produceResource(CollectionReaderFactory_impl.java:103)
    at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
    at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
    at org.apache.uima.UIMAFramework.produceCollectionReader(UIMAFramework.java:711)
    at org.uimafit.factory.CollectionReaderFactory.createCollectionReader(CollectionReaderFactory.java:196)
    at org.uimafit.factory.CollectionReaderFactory.createCollectionReader(CollectionReaderFactory.java:178)
    at org.uimafit.factory.CollectionReaderFactory.createCollectionReader(CollectionReaderFactory.java:123)
    at de.tudarmstadt.ukp.jek.pedocs.sandbox.RunSimplePipeline.main(RunSimplePipeline.java:38)
Caused by: java.io.FileNotFoundException: class path resource 
[C:/Users/Eckle-Kohler/pedocs-volltexte/] cannot be resolved to URL because it 
does not exist
    at org.springframework.core.io.ClassPathResource.getURL(ClassPathResource.java:179)
    at org.springframework.core.io.AbstractResource.getURI(AbstractResource.java:93)
    at de.tudarmstadt.ukp.dkpro.core.api.io.ResourceCollectionReaderBase.getUri(ResourceCollectionReaderBase.java:387)
    at de.tudarmstadt.ukp.dkpro.core.api.io.ResourceCollectionReaderBase.scan(ResourceCollectionReaderBase.java:283)
    at de.tudarmstadt.ukp.dkpro.core.api.io.ResourceCollectionReaderBase.initialize(ResourceCollectionReaderBase.java:177)
    ... 11 more

Original comment by eckle.kohler on 10 Jun 2012 at 7:02

GoogleCodeExporter commented 9 years ago
Unfortunately I missed a case in the last fix, which caused the fix to have 
no effect at all :( Could you please try again with revision 713? 

Original comment by richard.eckart on 10 Jun 2012 at 7:12

GoogleCodeExporter commented 9 years ago
Now it works, thanks!

Original comment by eckle.kohler on 11 Jun 2012 at 5:55

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 11 Jun 2012 at 6:43