Optimization Suggestion about the Webpage /ocr-pdf

Stirling-Tools / Stirling-PDF

#1 Locally hosted web application that allows you to perform various operations on PDF files

MIT License


Optimization Suggestion about the Webpage /ocr-pdf #728

Open Status-Changer opened 7 months ago

Status-Changer commented 7 months ago

I downloaded many .traineddata files from the Tesseract website and added them to the .../stirling/trainingData directory. The page simply lists all detected languages in alphabetical order, with no collapse button and no way to configure language priority.

[Attached screenshot: QQ图片20240122144109]

Would you consider adding this as a feature? Thanks.

sbplat commented 7 months ago

Do you mean ranking each language so there's a priority for the OCR? We use OCRmyPDF internally for this feature, and I don't think there's an option to rank them (the -l flag hints what languages it should search for).

If I misunderstood your question, please let me know!
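For reference, here is a rough sketch of how a list of selected languages could be turned into OCRmyPDF arguments. It only illustrates the `-l` flag, which takes language codes joined with `+`; the function name `buildOcrArgs` and the file paths are made up for illustration, and this is not necessarily how Stirling-PDF invokes OCRmyPDF.

```typescript
// Hypothetical helper: builds an argument list for the `ocrmypdf` CLI.
// Only the `-l lang1+lang2` form and the input/output positional arguments are used.
function buildOcrArgs(languages: string[], inputPath: string, outputPath: string): string[] {
  if (languages.length === 0) {
    throw new Error("at least one language code is required");
  }
  // OCRmyPDF accepts several Tesseract language codes joined with "+".
  return ["-l", languages.join("+"), inputPath, outputPath];
}

// Example with assumed file names.
const exampleArgs = buildOcrArgs(["jpn", "eng"], "input.pdf", "output.pdf");
console.log(["ocrmypdf", ...exampleArgs].join(" "));
```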

Status-Changer commented 7 months ago

> Do you mean ranking each language so there's a priority for the OCR? We use OCRmyPDF internally for this feature, and I don't think there's an option to rank them (the -l flag hints what languages it should search for).
>
> If I misunderstood your question, please let me know!

No, I mean setting the display priority on the webpage only, so that the languages are easier for the user to choose. For example, if I use Japanese and English in most cases, then I want the list to be shown as

Japanese
English
... (Other languages)

instead of the current alphabetical version. The priority could be set by the user, or derived automatically, for example by sorting by number of uses.
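Something like the following is what I have in mind for the "number of uses" idea. It is only a sketch: `UsageCounts`, `recordUsage`, and `sortLanguagesByUsage` are hypothetical names, and where the counts are stored (for example in the browser's local storage) is left open.

```typescript
// Hypothetical structure: how many times each Tesseract language code was selected.
type UsageCounts = Record<string, number>;

// Returns a new counts object with every selected language incremented by one.
function recordUsage(usageCounts: UsageCounts, selected: string[]): UsageCounts {
  const next: UsageCounts = { ...usageCounts };
  for (const code of selected) {
    next[code] = (next[code] ?? 0) + 1;
  }
  return next;
}

// Most-used languages first; ties (including never-used languages) stay alphabetical.
function sortLanguagesByUsage(languages: string[], usageCounts: UsageCounts): string[] {
  return [...languages].sort((a, b) => {
    const diff = (usageCounts[b] ?? 0) - (usageCounts[a] ?? 0);
    return diff !== 0 ? diff : a.localeCompare(b);
  });
}

// Example: Japanese was used twice and English once.
let counts: UsageCounts = {};
counts = recordUsage(counts, ["jpn", "eng"]);
counts = recordUsage(counts, ["jpn"]);
console.log(sortLanguagesByUsage(["deu", "eng", "fra", "jpn"], counts));
```

Languages that were never used keep the current alphabetical order, so the list would look the same as today for a new user.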

Thanks for your reply.

Status-Changer commented 7 months ago

The number of languages each user actually uses is very limited, so maybe this is not necessary for the system -.-

sbplat commented 7 months ago

Do you mean to obtain the default locale(s) from the browser and list them in that order? If so, #691 is similar to this and we could reuse that implementation here once that's added.
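As a rough illustration of that idea (not the #691 implementation, which does not exist yet), the list could be reordered so that languages matching the browser's preferred locales come first. The mapping table and the function name `orderByPreferredLocales` are assumptions for this sketch; a real mapping from browser locales to Tesseract codes would need to cover far more languages.

```typescript
// Assumed, illustrative mapping from primary language subtags to Tesseract language codes.
const LOCALE_TO_TESSERACT: Record<string, string> = {
  en: "eng",
  ja: "jpn",
  de: "deu",
  fr: "fra",
};

// Puts languages matching the preferred locales first (in preference order),
// followed by the remaining available languages in alphabetical order.
function orderByPreferredLocales(available: string[], preferredLocales: readonly string[]): string[] {
  const preferred: string[] = [];
  for (const locale of preferredLocales) {
    // "ja-JP" -> "ja"; region and script subtags are ignored in this sketch.
    const [primary = ""] = locale.toLowerCase().split("-");
    const code = LOCALE_TO_TESSERACT[primary];
    if (code !== undefined && available.includes(code) && !preferred.includes(code)) {
      preferred.push(code);
    }
  }
  const rest = available
    .filter((code) => !preferred.includes(code))
    .sort((a, b) => a.localeCompare(b));
  return [...preferred, ...rest];
}

// Example with a hard-coded locale list.
console.log(orderByPreferredLocales(["deu", "eng", "fra", "jpn"], ["ja-JP", "en-US", "en"]));
```

In a browser, the second argument could be `navigator.languages`, which lists the user's preferred locales in order.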