Norconex / committer-sql

Implementation of Norconex Committer for SQL (JDBC) databases.
https://opensource.norconex.com/committers/sql/
Apache License 2.0

Running committer without running collector again #16

Open hardreddata opened 11 months ago

hardreddata commented 11 months ago

Hi,

I ran the collector and have the working folders here. The SQL commit failed as the database was down.

Is it possible to just re-run the committer part without rerunning the collector? The collector took quite a bit of time.

Many thanks.

sakanaosama commented 11 months ago

Hi,

If you're using version 3.x or later, here's what you can try:

  1. Enable "commitLeftoversOnInit" in the configuration (default is false). https://opensource.norconex.com/committers/sql/v3/apidocs/com/norconex/committer/sql/SQLCommitter.html
  2. Change maxDocuments to 0 to avoid fetching any new documents.
  3. Find the failed batches stored in the "error" directory, as shown below:
    workdir
    ...
    > queue
    > error
    >> batch-xxxxxxx
    >>> failed-index
  4. Move each failed batch from the "error" folder to the "queue" folder, so the layout becomes:
    workdir
    > queue
    >> batch-xxxxxxx
    >>> failed-index
    > error
  5. Restart the crawler against the same working directory so it retains the previous crawl state. Back up the working folders first, and test in a non-production environment if you can.
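
If there are many failed batches, steps 3 and 4 can be scripted instead of moved by hand. The following is only a minimal sketch, not part of the committer itself: the function name `requeue_failed_batches` and the `committer_dir` argument are hypothetical, and it assumes the "queue" and "error" folders sit side by side under the directory you pass in, as in the layout above. Run it only while the crawler is stopped and after taking a backup.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def requeue_failed_batches(committer_dir: Path) -> list[Path]:
    """Move every batch folder from <committer_dir>/error to <committer_dir>/queue.

    Assumption: "error" and "queue" are sibling folders, as in the layout
    described above. Returns the new locations of the moved batches.
    """
    error_dir = committer_dir / "error"
    queue_dir = committer_dir / "queue"
    if not error_dir.is_dir():
        return []  # nothing failed, or the path is wrong

    queue_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for batch in sorted(error_dir.iterdir()):
        if not batch.is_dir():
            continue  # only whole batch folders are moved
        target = queue_dir / batch.name
        if target.exists():
            continue  # never overwrite a batch that is already queued
        shutil.move(str(batch), str(target))
        moved.append(target)
    return moved


# Example call (the path is a placeholder for your own working directory):
# requeue_failed_batches(Path("/path/to/workdir"))
```

Batches whose name already exists under "queue" are skipped rather than overwritten, so anything left in "error" afterwards needs a manual look.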

Ryan Ng

hardreddata commented 11 months ago

Thanks for the advice.

I am still running the older 2.9.x version. If there is no solution here I will just crawl it again from the start over the holiday period.

sakanaosama commented 11 months ago

Regrettably, this feature is only available from version 3.x onward, so it may be time to consider an upgrade. I will go ahead and close this ticket.

Thank you, Ryan Ng