Norconex / collector-core

Collector-related code shared between different collector implementations
http://www.norconex.com/collectors/collector-core/
Apache License 2.0

Introduce "maxParallelCrawlers" option in collector-config #25

Closed martin-huber closed 5 years ago

martin-huber commented 5 years ago

… in order to configure the size of the thread pool used for parallel crawler jobs: the value is passed from AbstractCollector to the already existing second AsyncJobGroup constructor.

We are using one collector with over 300 crawlers and observe OutOfMemoryErrors when all of them run in parallel at once (which is currently the only option).

This configuration option allows finer control over the resources consumed. The behaviour is backward compatible: if the option is omitted, all crawlers are still executed in parallel.
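
To illustrate the idea, here is a minimal, self-contained sketch of bounding crawler parallelism with a fixed-size thread pool. It does not use the actual collector-core classes: `ParallelCrawlerSketch`, `resolvePoolSize` and `runAll` are hypothetical names, and treating a non-positive value as "option omitted" is an assumption made for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelCrawlerSketch {

    /**
     * Resolves the pool size. A non-positive maxParallelCrawlers stands for
     * "option omitted" (assumption): one thread per crawler, as before.
     */
    public static int resolvePoolSize(int maxParallelCrawlers, int crawlerCount) {
        int atLeastOne = Math.max(1, crawlerCount);
        if (maxParallelCrawlers <= 0) {
            return atLeastOne;
        }
        // Never create more threads than there are crawlers to run.
        return Math.min(maxParallelCrawlers, atLeastOne);
    }

    /** Runs all jobs on a bounded pool and returns the peak concurrency seen. */
    public static int runAll(List<Runnable> crawlerJobs, int maxParallelCrawlers)
            throws Exception {
        int poolSize = resolvePoolSize(maxParallelCrawlers, crawlerJobs.size());
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Runnable job : crawlerJobs) {
                futures.add(pool.submit(() -> {
                    // Track how many jobs are active at the same time.
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        job.run();
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            // Wait for every job; get() rethrows a job failure, if any.
            for (Future<?> future : futures) {
                future.get();
            }
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        List<Runnable> jobs = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            jobs.add(() -> {
                try {
                    Thread.sleep(20); // stand-in for real crawl work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        System.out.println("peak concurrency with limit 4: " + runAll(jobs, 4));
    }
}
```

With a limit in place, the remaining jobs simply wait in the pool's queue until a thread becomes free, so memory use scales with the limit rather than with the total number of crawlers.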

essiembre commented 5 years ago

Merged, and also now part of the latest snapshot of Collector Core as well as HTTP and Filesystem Collectors.

Many thanks for your contribution!

martin-huber commented 5 years ago

Many thanks to you! Another change request (concerning SSL) will follow in the next few days ...

Cheers, Martin
