Parallelize runs - Githubissues

apertium / apertium-regtest

Regression testing system for Apertium language data and translators

https://wiki.apertium.org/wiki/Apertium-regtest

GNU General Public License v3.0


Parallelize runs #36

Open TinoDidriksen opened 10 months ago

TinoDidriksen commented 10 months ago

A pipe often has a single bottleneck (usually a complex CG), so even though the pipe is multi-process, the benefit of that parallelism is limited. Splitting the input into ~4 chunks and running that many pipes in parallel can thus take fuller advantage of the available CPUs.

This should be done both per-corpus and across corpora, so runs should internally be changed to a single task list.

(quick'n'dirty per-corpus https://github.com/TinoDidriksen/regtest/commit/c18ed0cb1fcb69ee5e38d81d4886bb0f95c8afed)
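
For illustration only, here is a rough sketch of what a single task list across corpora could look like. It is not the linked commit and not how apertium-regtest is actually structured: `split_chunks`, `run_chunk`, `run_all`, and the `corpora` mapping are all hypothetical names, and it assumes each input line can be processed independently so that chunk outputs can simply be concatenated in order.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def split_chunks(lines, n):
    """Split lines into at most n contiguous, roughly equal chunks."""
    n = max(1, min(n, len(lines)))
    size, extra = divmod(len(lines), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(lines[start:end])
        start = end
    return chunks


def run_chunk(command, chunk):
    """Feed one chunk to one instance of the pipe and return its stdout."""
    proc = subprocess.run(
        command, input="".join(chunk),
        capture_output=True, text=True, check=True,
    )
    return proc.stdout


def run_all(corpora, jobs=4, splits=4):
    """Run every corpus through its pipe using one flat task list.

    corpora (hypothetical shape): {name: (command_argv, input_lines)}
    """
    # One flat list: every chunk of every corpus is a task.
    tasks = [
        (name, command, chunk)
        for name, (command, lines) in corpora.items()
        for chunk in split_chunks(lines, splits)
    ]
    # Threads are enough here: each worker just waits on a subprocess.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # pool.map preserves task order, so chunks reassemble correctly.
        outputs = list(pool.map(lambda t: run_chunk(t[1], t[2]), tasks))
    results = {name: [] for name in corpora}
    for (name, _command, _chunk), out in zip(tasks, outputs):
        results[name].append(out)
    return {name: "".join(parts) for name, parts in results.items()}
```

Because the tasks from all corpora share one pool, a small corpus does not leave workers idle while a large one is still running. Whether splitting at line boundaries is safe for a given pipe is an open assumption in this sketch.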

mr-martian commented 10 months ago

I will also note that I think the current setup doesn't even take advantage of pipeline parallelization.