ALT-MOROSO closed this issue 1 month ago.
Hi @ALT-MOROSO,
To speed up indexing, you can use the `--parallelism` option documented here. It's also available in the settings for local mode.
As for speeding up the Tesseract part, the answer is not straightforward.
Tesseract can run multithreaded because it can leverage OpenMP; however, it's hard to tell in advance whether multithreading will actually improve performance.
According to https://github.com/tesseract-ocr/tesseract/issues/3744, it's unclear whether enabling multithreading speeds things up.
On the other hand, the tessdoc suggests that the `OMP_THREAD_LIMIT` (and probably `OMP_NUM_THREADS`) environment variables control how many threads Tesseract uses, so you could raise them to speed things up. Beware that adding threads often carries an overhead, and past some point running with more threads becomes slower than running with fewer.
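Because the benefit of extra threads is hard to predict, one way to settle it for your own documents is to time Tesseract directly. The sketch below is a hypothetical Python benchmark, not part of Datashare: it assumes the `tesseract` CLI is on your PATH and that `sample.png` is a representative page you supply. The names `env_with_thread_limit` and `time_tesseract` are illustrative only.

```python
import os
import shutil
import subprocess
import time

SAMPLE_IMAGE = "sample.png"  # hypothetical path: replace with a representative scanned page


def env_with_thread_limit(threads: int) -> dict:
    """Copy the current environment and cap OpenMP threads for a child process."""
    env = dict(os.environ)
    env["OMP_THREAD_LIMIT"] = str(threads)
    env["OMP_NUM_THREADS"] = str(threads)
    return env


def time_tesseract(image: str, threads: int, repeats: int = 3) -> float:
    """Return the best wall-clock time (seconds) over `repeats` OCR runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # "stdout" makes tesseract print the recognized text instead of writing a file
        subprocess.run(
            ["tesseract", image, "stdout"],
            env=env_with_thread_limit(threads),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,
        )
        best = min(best, time.perf_counter() - start)
    return best


def main() -> None:
    if shutil.which("tesseract") is None or not os.path.exists(SAMPLE_IMAGE):
        print("Skipping: tesseract binary or sample image not found")
        return
    # Try a few thread counts, including the number of CPUs on this machine
    for n in sorted({1, 2, 4, os.cpu_count() or 1}):
        print(f"{n} thread(s): {time_tesseract(SAMPLE_IMAGE, n):.2f} s")


if __name__ == "__main__":
    main()
```

If the timings stop improving (or get worse) beyond some thread count, that is the overhead mentioned above; in that case, keeping the thread limit low and relying on `--parallelism` instead may be the better trade-off.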
So to sum up:
- use the `--parallelism` flag
- set `OMP_THREAD_LIMIT` and `OMP_NUM_THREADS` (with no guarantee)

Thank you very much!!
This issue is stale because it has been open for 40 days with no activity.
Hello ICIJ team!
I have two questions/problems today:
- When working on Datashare, in both desktop and server mode, I noticed that Tesseract's OCR seems to be limited to analysing 4 files per second. Do you know how to raise this limit?
Thank you very much!!! Have a good one! 👍