cilynx / pantomath

Pantomath knows about things
GNU General Public License v3.0

Concurrency #23

Open cilynx opened 2 years ago

cilynx commented 2 years ago

Scans with dozens of pages take forever to process, and processing doesn't even start until the entire document has been scanned. We should pipeline this so page 1 starts processing as soon as it shows up. If processing one page takes longer than scanning the next one, we should process pages in parallel so page 2 can start while page 1 is still finishing up. Probably want a capped thread pool to keep this from getting out of hand.
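A rough sketch of the pipelining idea above, not the actual implementation. `scan_pages` and `process_page` are hypothetical stand-ins for the scanner feed and the per-page work, and `MAX_WORKERS` is an arbitrary cap. Each page is submitted to the pool the moment the generator yields it, so page 1 is already processing while page 2 is still being scanned.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4  # assumed cap; tune to the machine


def process_page(page_number, image):
    """Hypothetical per-page work (deskew, OCR, ...)."""
    return f"processed page {page_number} ({len(image)} bytes)"


def scan_pages(count):
    """Hypothetical scanner feed: yields (page_number, image) as pages arrive."""
    for number in range(1, count + 1):
        yield number, b"\x00" * (number * 10)


def process_scan(pages, max_workers=MAX_WORKERS):
    """Submit each page as soon as it is scanned; return results in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Iterating the generator submits page N while earlier pages
        # are still running on the pool's worker threads.
        futures = [pool.submit(process_page, n, img) for n, img in pages]
        return [future.result() for future in futures]


if __name__ == "__main__":
    for line in process_scan(scan_pages(3)):
        print(line)
```

If the per-page work is pure-Python and CPU-bound, swapping in `ProcessPoolExecutor` may be the better fit; threads are enough when the heavy lifting happens in an external program or a C library that releases the GIL.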

cilynx commented 2 years ago

Even worse, the GUI freezes during a scan: once original.tiff drops, the window stops responding if processed.pdf takes a long time to create. Under GNOME, this triggers the wait-or-kill dialog.
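One possible way to keep the window responsive, sketched without any toolkit code: do the slow step on a background thread and have the UI loop poll a queue without blocking. `create_processed_pdf`, `start_processing`, and `poll_results` are made-up names for illustration. In a GTK app the worker would more likely hand the result back with `GLib.idle_add` rather than a polled queue.

```python
import queue
import threading

results = queue.Queue()  # finished outputs, drained by the UI loop


def create_processed_pdf(tiff_path):
    """Hypothetical slow step that turns original.tiff into processed.pdf."""
    return tiff_path.replace("original.tiff", "processed.pdf")


def start_processing(tiff_path):
    """Kick off the slow step on a worker thread and return immediately."""
    def worker():
        results.put(create_processed_pdf(tiff_path))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def poll_results():
    """Call periodically from the UI loop; never blocks."""
    finished = []
    while True:
        try:
            finished.append(results.get_nowait())
        except queue.Empty:
            return finished
```

The point is only that the UI thread never waits on the PDF step, so the compositor has no reason to flag the window as hung.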

cilynx commented 1 year ago

As we get into more advanced text recognition, we need to think about serial vs. parallel processing. How should content on other pages in the same document affect the interpretation of any given page? As an extreme example, think about tables that span multiple pages. More subtly, context from other pages may improve recognition on any given page.
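One way to frame the trade-off, purely as a sketch: run the context-free recognition per page in parallel, then do a cheap serial pass over the results to link pages together. Everything here is hypothetical, including the toy "a line containing `|` is a table row" heuristic, which only stands in for real table detection.

```python
from concurrent.futures import ThreadPoolExecutor


def recognize_page(text):
    """Hypothetical context-free pass over one page (safe to parallelize)."""
    lines = text.splitlines()
    return {
        "lines": lines,
        # Toy heuristic: a '|' in the first/last line marks a table row.
        "starts_in_table": bool(lines) and "|" in lines[0],
        "ends_in_table": bool(lines) and "|" in lines[-1],
    }


def link_pages(pages):
    """Serial pass: flag pages whose table continues from the previous page."""
    if pages:
        pages[0]["continues_table"] = False
    for previous, current in zip(pages, pages[1:]):
        current["continues_table"] = (
            previous["ends_in_table"] and current["starts_in_table"]
        )
    return pages


def recognize_document(raw_pages, max_workers=4):
    """Parallel per-page pass followed by a serial cross-page pass."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = list(pool.map(recognize_page, raw_pages))  # keeps page order
    return link_pages(pages)
```

This keeps most of the speedup while still leaving a hook for cross-page context; whether a single serial pass is enough for the more nuanced cases is an open question.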