Norconex / committer-core

Norconex Committer is a Java library and command-line application used to route content to local or remote target repositories, such as a search engine index.
http://www.norconex.com/collectors/committer-core
Apache License 2.0

Ensure committer queue uniqueness to avoid queue collisions #9

Open essiembre opened 9 years ago

essiembre commented 9 years ago

Most Committers extend AbstractFileQueueCommitter. When multiple committers are used by multiple processes sharing the same working directory, they can end up with the same default queue directory. Two committers then process the same queued files, which is the collision this issue is about.

We should find a way to enforce uniqueness of committer queues while keeping them predictable (so the same committer instance always points to the same location).

A real case for this issue is best described in https://github.com/Norconex/collector-http/issues/67.

When used with Norconex Collectors, implicitly passing the collector ID and crawler ID (which is already a unique combo) and using that to create a unique directory would do it, but Committers are not tied to Collectors right now, so we can't assume we'll always have these.
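To make the idea concrete, here is a minimal sketch of deriving a queue directory that is both unique and predictable from a set of identifiers such as a collector ID and a crawler ID. It is only an illustration under stated assumptions: `QueueDirs`, `queueDir`, and the `committer-queue-` prefix are hypothetical names, not part of the Committer API, and hashing the IDs is just one possible way to get a stable, file-system-safe name.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Objects;

// Hypothetical helper, not part of Norconex Committer.
final class QueueDirs {

    private QueueDirs() {}

    /**
     * Builds a queue directory under workDir whose name depends only on
     * the given ID parts, so the same IDs always map to the same location
     * and different IDs map to different locations.
     */
    static Path queueDir(Path workDir, String... idParts) {
        Objects.requireNonNull(workDir, "workDir");
        if (idParts == null || idParts.length == 0) {
            throw new IllegalArgumentException("At least one ID part is required.");
        }
        StringBuilder key = new StringBuilder();
        for (String part : idParts) {
            Objects.requireNonNull(part, "ID part");
            // Length-prefix each part so ("a", "bc") and ("ab", "c") differ.
            key.append(part.length()).append(':').append(part);
        }
        // A truncated hash keeps the directory name short and free of
        // characters that are invalid in file names.
        String name = "committer-queue-" + sha256Hex(key.toString()).substring(0, 16);
        return workDir.resolve(name);
    }

    private static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform.
            throw new IllegalStateException("SHA-256 is not available.", e);
        }
    }
}
```

With this sketch, `QueueDirs.queueDir(workDir, collectorId, crawlerId)` returns the same path on every run for the same pair of IDs. A Committer used without a Collector would still need some other stable identifier to pass in (for example, a name set in its configuration), which is exactly the open question above.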

essiembre commented 3 years ago

Implemented in the upcoming V3 codebase.