Languages should be sorted

UB-Mannheim / tesseract

Tesseract Open Source OCR Engine (main repository)

Apache License 2.0


Languages should be sorted #59

Open THausherr opened 2 years ago

THausherr commented 2 years ago

Environment

Tesseract Version: v5.0.1.20220107
Platform: W10.0.19043.1645 64 bit

Current Behavior:

[image]

Expected Behavior:

Languages should be sorted

Suggested Fix:

Sort

stweil commented 2 years ago

I think they are sorted by filename (deu.traineddata for German).

Ideally the list should use localized names ("Deutsch" for users who selected the German user interface) and sort those localized names. Do you want to implement that and send us a pull request?
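As a rough illustration of sorting by localized names, here is a minimal Python sketch. It is not taken from the installer: the `DISPLAY_NAMES` table, the `sort_key` helper, and the `sorted_codes` function are all hypothetical, and the accent-stripping key only approximates real locale-aware collation.

```python
import unicodedata

# Hypothetical sample data: language code -> {UI language -> display name}.
# The installer has no such table; these entries are for illustration only.
DISPLAY_NAMES = {
    "dan": {"en": "Danish", "de": "Dänisch"},
    "deu": {"en": "German", "de": "Deutsch"},
    "cym": {"en": "Welsh", "de": "Walisisch"},
    "cos": {"en": "Corsican", "de": "Korsisch"},
}


def sort_key(name):
    """Approximate collation key: drop combining accents and ignore case."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_accents.casefold()


def sorted_codes(ui_lang):
    """Return language codes ordered by their display name in ui_lang."""
    return sorted(DISPLAY_NAMES, key=lambda code: sort_key(DISPLAY_NAMES[code][ui_lang]))


if __name__ == "__main__":
    for lang in ("en", "de"):
        print(lang, sorted_codes(lang))
```

The order differs per user interface language (German sorts under "G" in English but under "D" in German), which is why the sort has to run on the displayed names rather than on the filenames.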

THausherr commented 2 years ago

Sorry, no, not enough time, sadly.

stweil commented 2 years ago

Nor do I have enough time. Maybe someone else has an idea how this can be done with reasonable effort.

Ruandv commented 2 years ago

@stweil I would like to try and give this a go. I am looking for the files / list in the repo but can't seem to find them. Where is the code that generates the installer package?

Can you please give some pointers if possible?

stweil commented 2 years ago

That code is available here: https://github.com/UB-Mannheim/tesseract/tree/windows/nsis.

filak commented 1 year ago

The language selection is sorted by the Tesseract language code, but only the description is displayed, so it looks messy.

https://github.com/tesseract-ocr/tesseract/blob/main/doc/tesseract.1.asc#languages-and-scripts

 cos (Corsican), cym (Welsh), dan (Danish), deu (German), div (Dhivehi)

https://github.com/UB-Mannheim/tesseract/blob/windows/nsis/tesseract.nsi

 Section /o "German" SecLang_deu

Maybe a quick fix would do?

 Section /o "deu - German" SecLang_deu
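
Both options can be compared with a small Python sketch that generates the `Section` header lines. This is only an illustration under assumptions: the `LANGUAGES` table holds just the five codes quoted above, `section_lines` is a hypothetical helper, and real sections in `tesseract.nsi` also have a body and a closing `SectionEnd` that are not produced here.

```python
# Sample taken from the language list quoted above: code -> English description.
LANGUAGES = {
    "cos": "Corsican",
    "cym": "Welsh",
    "dan": "Danish",
    "deu": "German",
    "div": "Dhivehi",
}


def section_lines(languages, show_code=False):
    """Build NSIS Section header lines (hypothetical helper).

    show_code=False: sort by the displayed description.
    show_code=True:  keep the order by language code and prefix the code,
                     so the visible text matches the sort order.
    """
    if show_code:
        items = sorted(languages.items())  # ordered by language code
        return [f'Section /o "{code} - {name}" SecLang_{code}' for code, name in items]
    items = sorted(languages.items(), key=lambda item: item[1].casefold())
    return [f'Section /o "{name}" SecLang_{code}' for code, name in items]


if __name__ == "__main__":
    print("\n".join(section_lines(LANGUAGES)))
    print()
    print("\n".join(section_lines(LANGUAGES, show_code=True)))
```

With the sample data, sorting by description moves "Dhivehi" ahead of "German", while the prefixed variant keeps the existing order and makes it visible to the user.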