divvun / CorpusTools

Tools to manage and convert GiellaLT corpus files
https://giellalt.github.io/CorpusTools/
GNU General Public License v3.0

Parallelization has changed (for the worse) after move to Git #8

Closed: th0masbk closed this issue 10 months ago

th0masbk commented 11 months ago

Earlier, if the parallelization went wrong for some sentences, it would be corrected later in the text. Now, however, there seems to be no attempt to resynchronize after the first error: once one sentence pair is misaligned, the rest of the document stays out of sync.

This can be seen by reparallelizing any old file. The old parallelizations have inaccuracies in some sentences but then correct themselves later in the text. The new ones, on the other hand, stay out of sync from the first error onward.

Example text with old and new parallelization:

Old commit (from March): https://github.com/giellalt/corpus-nob/commit/6a4c3f4590a36647a211f81a1c2b208881abb0a2

New commit (today): https://github.com/giellalt/corpus-nob/commit/8713bd402586f43cee487b105539561cf9f414f0
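
A rough way to check this on other files could look like the sketch below. It assumes the old and new alignments have already been extracted into lists of (source sentence, target sentence) pairs; the function name `divergence_profile` and the sample data are hypothetical and not part of CorpusTools. It reports where the new alignment first differs from the old one and whether it ever agrees with the old one again afterwards.

```python
def divergence_profile(old_pairs, new_pairs):
    """Compare two alignments given as lists of (source, target) tuples.

    Returns (first_diff, resyncs):
      first_diff: index of the first new pair not present in the old
                  alignment, or None if every new pair is also in the old one
      resyncs:    True if some later new pair matches the old alignment again
    """
    old_set = set(old_pairs)
    first_diff = None
    for i, pair in enumerate(new_pairs):
        if pair not in old_set:
            first_diff = i
            break
    if first_diff is None:
        return None, True
    resyncs = any(pair in old_set for pair in new_pairs[first_diff + 1:])
    return first_diff, resyncs


# Hypothetical data: the old alignment merges two target sentences at one point.
old = [("a1", "b1"), ("a2", "b2 b3"), ("a3", "b4"), ("a4", "b5")]

# Stays shifted by one sentence after the first error.
new_drifting = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]

# Makes the same first error but recovers on the last pair.
new_recovering = [("a1", "b1"), ("a2", "b2"), ("a3", "b3 b4"), ("a4", "b5")]

print(divergence_profile(old, new_drifting))
print(divergence_profile(old, new_recovering))
```

Since the old parallelizations also contain some errors, the old alignment is only an approximate reference here; the sketch is meant to show the "never recovers" pattern, not to score alignment quality.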