Norconex / collector-filesystem

Norconex Filesystem Collector is a flexible crawler for collecting, parsing, and manipulating data ranging from local hard drives to network locations into various data repositories such as search engines.
http://www.norconex.com/collectors/collector-filesystem/

Could not retreive SMB ACL data - ver 2.9.0 Snapshot #49

Closed truezjz closed 4 years ago

truezjz commented 5 years ago

I get this error when the start path is a network drive, e.g. \\cap-index\c$\WISD\

I am using the 2.9.0 snapshot branch.

FYI, this error does not occur in 2.8.0.

FilesystemCrawler: 2019-05-17 11:25:55 ERROR - Could not retreive SMB ACL data.
java.nio.file.NoSuchFileException: \WISD\AWitham@washtenawisd.org_Export\04.07.2018-1143AM\Exchange\Awitham@washtenawisd.org.pst
    at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:79)
    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
    at sun.nio.fs.WindowsLinkSupport.getFinalPath(WindowsLinkSupport.java:107)
    at sun.nio.fs.WindowsAclFileAttributeView.getOwner(WindowsAclFileAttributeView.java:120)
    at com.norconex.collector.fs.fetch.impl.SpecificLocalFileFetcher.fetchAcl(SpecificLocalFileFetcher.java:73)
    at com.norconex.collector.fs.fetch.impl.SpecificLocalFileFetcher.fetchFileSpecificMeta(SpecificLocalFileFetcher.java:55)
    at com.norconex.collector.fs.fetch.impl.GenericFileMetadataFetcher.fetchMetadada(GenericFileMetadataFetcher.java:75)
    at com.norconex.collector.fs.pipeline.importer.FileImporterPipeline$FileMetadataFetcherStage.executeStage(FileImporterPipeline.java:146)
    at com.norconex.collector.fs.pipeline.importer.AbstractImporterStage.execute(AbstractImporterStage.java:31)
    at com.norconex.collector.fs.pipeline.importer.AbstractImporterStage.execute(AbstractImporterStage.java:24)
    at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91)
    at com.norconex.collector.fs.crawler.FilesystemCrawler.executeImporterPipeline(FilesystemCrawler.java:228)
    at com.norconex.collector.core.crawler.AbstractCrawler.processNextQueuedCrawlData(AbstractCrawler.java:538)
    at com.norconex.collector.core.crawler.AbstractCrawler.processNextReference(AbstractCrawler.java:419)
    at com.norconex.collector.core.crawler.AbstractCrawler$ProcessReferencesRunnable.run(AbstractCrawler.java:820)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
somphouang commented 5 years ago

If possible, can you confirm the following?

  1. Does your crawler configuration include the optionsProvider?

For example:

<crawler id="my example crawl">
  ...
  <optionsProvider class="com.norconex.collector.fs.option.impl.GenericFilesystemOptionsProvider">
    <!-- Authentication (any file system) -->
    <authDomain>YOURDOMAIN</authDomain>
    <authUsername>YOURUSER</authUsername>
    <authPassword>YOURENCRYPTEDPASSWORD</authPassword>
    <!-- Use the following if password is encrypted. -->
    <authPasswordKey>YOURKEY_HERE</authPasswordKey>
    <authPasswordKeySource>key</authPasswordKeySource>
  </optionsProvider>
  ...
</crawler>

See more details at: https://www.norconex.com/collectors/collector-filesystem/configuration#cfg-optionsProvider

  2. It's possible that you do not have access to the location you configured. Can you confirm that the user that launched the shell has access as well? The "NoSuchFileException" in your error message hints at this.

  3. Have you included the "lib" file as mentioned in the instructions in the "SMB/CIFS Support" section of this page: https://www.norconex.com/collectors/collector-filesystem/getting-started

truezjz commented 5 years ago

Yes, I can access that network drive.

FYI, if I use v2.8.0, I don't see this error.

somphouang commented 5 years ago

Has #3 been done as well, i.e. putting the jar file from http://central.maven.org/maven2/jcifs/jcifs/1.3.17/jcifs-1.3.17.jar in the collector's lib folder?

truezjz commented 5 years ago

I have jcifs-1.3.17.jar in place, but I still get the same error.

essiembre commented 5 years ago

@truezjz, I just noticed you are not specifying the protocol (from the error you provided). If your network drive is mapped to a local drive (e.g. "S:\") then it should work without specifying the protocol. Otherwise, for JCIFS/Samba use, you have to add smb:, like this:

smb://WISD/AWitham@washtenawisd.org_Export/04.07.2018-1143AM/Exchange/Awitham@washtenawisd.org.pst

If you add the protocol but still get the error, please share your config.
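To illustrate the protocol prefix, here is a minimal Java sketch that turns a Windows UNC path into an smb:// URL. The class and method names (`UncToSmb`, `toSmbUrl`) are made up for this example and are not part of the collector; the host and share names come from the start path in the first comment. It does no URL escaping, so treat it as a starting point only.

```java
// Hypothetical helper, not part of Norconex Filesystem Collector.
final class UncToSmb {

    /** Converts a Windows UNC path (\\host\share\dir) to an smb:// URL. */
    static String toSmbUrl(String uncPath) {
        if (uncPath == null || !uncPath.startsWith("\\\\")) {
            throw new IllegalArgumentException("Not a UNC path: " + uncPath);
        }
        // Drop the two leading backslashes, then flip the remaining separators.
        return "smb://" + uncPath.substring(2).replace('\\', '/');
    }

    public static void main(String[] args) {
        // prints smb://cap-index/c$/WISD/
        System.out.println(toSmbUrl("\\\\cap-index\\c$\\WISD\\"));
    }
}
```

The resulting URL would then be used as the crawler start path in place of the raw UNC path.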

dtcyad1 commented 5 years ago

Hi Pascal, I am having the same issue. I have the default installation in a Linux environment (on a Mac). Version 2.8.0 works great, but the latest version, 2.9.0, fails. In both cases I am running the default shell script with the 2 sample files from the files folder. With version 2.9.0, I get the error below. I copied the jcifs jar to the lib folder and tried adding the smb:// protocol, which errored out immediately, so I am not sure where the protocol goes. I also have access to the files (since I was the one that put them there!!). I have also added this to the sample config file:

joe 1234

Sample Crawler: 2019-11-12 15:09:25 INFO - Sample Crawler: Could not process document: file:///Users/joe/temp/norconex-collector-filesystem-2.9.0-SNAPSHOT/examples/files/crawlme.html (null)
java.lang.NullPointerException
    at com.norconex.collector.fs.fetch.impl.SpecificLocalFileFetcher.fetchAcl(SpecificLocalFileFetcher.java:73)
    at com.norconex.collector.fs.fetch.impl.SpecificLocalFileFetcher.fetchFileSpecificMeta(SpecificLocalFileFetcher.java:55)
    at com.norconex.collector.fs.fetch.impl.GenericFileMetadataFetcher.fetchMetadada(GenericFileMetadataFetcher.java:75)
    at com.norconex.collector.fs.pipeline.importer.FileImporterPipeline$FileMetadataFetcherStage.executeStage(FileImporterPipeline.java:153)
    at com.norconex.collector.fs.pipeline.importer.AbstractImporterStage.execute(AbstractImporterStage.java:31)
    at com.norconex.collector.fs.pipeline.importer.AbstractImporterStage.execute(AbstractImporterStage.java:24)
    at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91)
    at com.norconex.collector.fs.crawler.FilesystemCrawler.executeImporterPipeline(FilesystemCrawler.java:228)
    at com.norconex.collector.core.crawler.AbstractCrawler.processNextQueuedCrawlData(AbstractCrawler.java:538)
    at com.norconex.collector.core.crawler.AbstractCrawler.processNextReference(AbstractCrawler.java:419)
    at com.norconex.collector.core.crawler.AbstractCrawler$ProcessReferencesRunnable.run(AbstractCrawler.java:820)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Any feedback for this - I don't know what I am missing!! Thanks

essiembre commented 4 years ago

It appears that for files with no ACL a null value was being returned. Can you please try the new snapshot that was just made?
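For readers hitting a similar NullPointerException in their own code, the following is a rough sketch of a null-safe ACL lookup with plain java.nio. It is not the actual collector fix, and `SafeAclReader` and `readAcl` are hypothetical names. It relies on the documented behavior that `Files.getFileAttributeView` returns null when the file system does not offer the requested view, which is typically the case for ACL views on Linux and macOS.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.AclEntry;
import java.nio.file.attribute.AclFileAttributeView;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not the collector's SpecificLocalFileFetcher.
final class SafeAclReader {

    /** Returns the ACL entries of a file, or an empty list if ACLs are unsupported. */
    static List<AclEntry> readAcl(Path file) {
        AclFileAttributeView view =
                Files.getFileAttributeView(file, AclFileAttributeView.class);
        if (view == null) {
            // No ACL view on this file system: skip instead of dereferencing null.
            return Collections.emptyList();
        }
        try {
            return view.getAcl();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The number of entries depends on the platform and file system.
        System.out.println("ACL entries: " + readAcl(Paths.get(".")).size());
    }
}
```

Returning an empty list keeps callers free of null checks; a caller that needs to tell "no ACL support" apart from "empty ACL" could return an `Optional` instead.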

dtcyad1 commented 4 years ago

Hi Pascal,

Thanks for the quick fix - it works great now!!

Thanks