benwbrum / fromthepage

FromThePage is a wiki-like application for crowdsourcing transcription of handwritten documents.
http://fromthepage.com
GNU Affero General Public License v3.0

PDF import with no text layer, but text import checked, leads to "correction" instead of "transcription" #4174

Open saracarl opened 6 days ago

saracarl commented 6 days ago

If folks import a PDF and check the "import text from the text layer" option, we set the project as a "correction" project, even if the PDF contains no text. We shouldn't. Can we check for text (or make sure we don't import "blank" text) before setting that?

Here's an example: https://fromthepage.com/clerkchaz/rockingham-county-minute-books

The UI ends up saying "correct" everywhere instead of "transcribe".

benwbrum commented 6 days ago

To fix this, we'll need to add code to the ingestor rake task to test whether the extracted text files have any text in them, then change the OCR flag for the work conversion.
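
A minimal sketch of that check, not the actual ingestor code. It assumes the extracted text lands as one `.txt` file per page in a single directory; the method names `any_extracted_text?` and `ocr_correction?` and the directory layout are hypothetical, chosen only for illustration. Files that contain nothing but whitespace or form feeds are treated as blank.

```ruby
# Hypothetical helper, not part of FromThePage: returns true if at least one
# extracted text file in text_dir contains a non-whitespace character.
# Assumes one .txt file per page (an assumption, not confirmed by the issue).
def any_extracted_text?(text_dir)
  Dir.glob(File.join(text_dir, '*.txt')).any? do |path|
    # scrub replaces invalid byte sequences so the regex match cannot raise
    File.read(path, encoding: 'UTF-8').scrub.match?(/\S/)
  end
end

# Hypothetical decision for the OCR flag: only treat the work as a
# "correction" project when the user asked for text import AND some
# text was actually found.
def ocr_correction?(import_text_checked, text_dir)
  import_text_checked && any_extracted_text?(text_dir)
end
```

With something like this, an import where the option is checked but every page file is blank would fall back to "transcription", which is the behavior requested above.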