Norconex / crawlers

Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
https://opensource.norconex.com/crawlers
Apache License 2.0

Running multiple committers at the same time not possible? #283

Closed: liar666 closed 8 years ago

liar666 commented 8 years ago

Hi,

I'm currently developing my own committer. To debug my code, I wanted to keep the FileSystemCommitter so that I can compare the output of both committers.

The configuration file looks like this:

    <crawler>
      ...
      <committer class="fr.presans.crawling.norconex.FoafCommitter">
        <where>STD_OUT</where>
      </committer>

      <committer class="com.norconex.committer.core.impl.FileSystemCommitter">
        <directory>./crawlers-output/startupnation/crawledFiles</directory>
      </committer>
    </crawler>

Unfortunately, when both committers are active, the second one (FileSystemCommitter) does not seem to be called: I can find neither a ./crawlers-output/startupnation/crawledFiles/ directory nor any files in it. However, when the first committer is commented out, everything works fine.

Did I miss something in the configuration file or is there a bug?

PS: I'm using norconex-collector-http-2.5.1 + norconex-importer-2.6.0-SNAPSHOT.jar

essiembre commented 8 years ago

The configuration does not allow you to specify more than one committer, but fear not: a committer exists that allows just that, MultiCommitter. You can use it like this:

    <committer class="com.norconex.committer.core.impl.MultiCommitter">
        <committer class="(committer class)">
            (Committer-specific configuration here)
        </committer>
        <committer class="(committer class)">
            (Committer-specific configuration here)
        </committer>
        ...
    </committer>
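
Applied to the two committers from the question above, the nesting would look roughly like the sketch below. The class names, the `<where>` option, and the output directory are copied from the original configuration; whether the custom FoafCommitter needs anything beyond that is an assumption, and the elided crawler settings stay elided.

```xml
<crawler>
  ...
  <!-- Single top-level committer wrapping the two committers from the question -->
  <committer class="com.norconex.committer.core.impl.MultiCommitter">
    <!-- Custom committer under development (configuration copied as-is) -->
    <committer class="fr.presans.crawling.norconex.FoafCommitter">
      <where>STD_OUT</where>
    </committer>
    <!-- Reference output to compare against -->
    <committer class="com.norconex.committer.core.impl.FileSystemCommitter">
      <directory>./crawlers-output/startupnation/crawledFiles</directory>
    </committer>
  </committer>
</crawler>
```

The crawler still sees exactly one top-level `<committer>` element, which is what its configuration expects; the nested entries are the ones that receive the content.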

liar666 commented 8 years ago

Oh! Yes of course! On top of that, I read about this feature recently! Sorry for the waste of time!

essiembre commented 8 years ago

Just to point out: there was an issue where committers other than the first one listed in a MultiCommitter would sometimes not receive the content. This has been fixed in the latest snapshot release.

liar666 commented 8 years ago

I just noticed a bug that is somewhat different but seems related: when I comment out one of the two committers I put in my MultiCommitter, nothing happens. In other words, when there is only one committer in a MultiCommitter, it does not receive the content either. I'll check and tell you whether the latest snapshot fixes it.

liar666 commented 8 years ago

OK, I just verified: the two bugs were indeed related, and the problem is now fixed :)