OCR4all / LAREX

A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.
MIT License

virtual_keyboards #317

Open tboenig opened 2 years ago

tboenig commented 2 years ago

In

you provide a virtual keyboard. What is the meaning behind the division of the characters into individual lines?

bertsky commented 2 years ago

@tboenig it's just the visualization, i.e. the layout of letter buttons across lines

screenshot

chaddy314 commented 2 years ago

Characters in the virtual keyboard are grouped together by different characteristics. In the web view, each line of the .txt file becomes its own row of buttons (up to 12 per row; if a line has more, a new row is started).

For the default keyboard this will result in something like this: image

Of course it is possible to change the layout and add custom characters by changing the .txt-file and uploading it. You can also add additional character buttons and delete unneeded ones with the menu below.
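
To make the row layout described above concrete, here is a minimal Python sketch of how such a .txt file could be turned into rows of buttons. It is not LAREX's actual parser: the function name `keyboard_rows`, the constant `MAX_BUTTONS_PER_ROW`, the whitespace splitting, and the skipping of empty lines are all assumptions made for illustration.

```python
from typing import List

# Assumption: taken from the "up to 12" remark above; not read from LAREX's source.
MAX_BUTTONS_PER_ROW = 12


def keyboard_rows(text: str, max_per_row: int = MAX_BUTTONS_PER_ROW) -> List[List[str]]:
    """Turn the content of a keyboard .txt file into rows of button labels.

    Hypothetical helper: each non-empty line becomes one group of characters,
    and groups longer than max_per_row are wrapped onto additional rows.
    """
    rows: List[List[str]] = []
    for line in text.splitlines():
        # Characters on a line are assumed to be separated by whitespace.
        chars = line.split()
        if not chars:
            continue  # assumed behaviour: empty lines produce no row
        # Wrap the group into chunks of at most max_per_row buttons.
        for start in range(0, len(chars), max_per_row):
            rows.append(chars[start:start + max_per_row])
    return rows


if __name__ == "__main__":
    sample = "a b c\n" + " ".join("ABCDEFGHIJKLMN")
    for row in keyboard_rows(sample):
        print(" ".join(row))
```

With the sample above, the second line holds 14 characters, so it would be shown as one row of 12 buttons followed by a row with the remaining 2.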

tboenig commented 2 years ago

Okay, thank you very much for your answers. I have prepared something, see https://tboenig.github.io/keyboardGT/overview.html.

Would these keyboards complement LAREX? And are they in the right format? Example: https://tboenig.github.io/keyboardGT/keyboards/LAREX/LatExtD.txt

maxnth commented 2 years ago

> Okay, thank you very much for your answers. I have prepared something, see https://tboenig.github.io/keyboardGT/overview.html.

Great resource, will certainly refer OCR4all / LAREX users who have questions about MUFI / Virtual Keyboards / etc. to this. Thank you!

> Would these keyboards complement LAREX? And are they in the right format? Example: https://tboenig.github.io/keyboardGT/keyboards/LAREX/LatExtD.txt

More or less. LAREX currently expects a single whitespace between the characters in a row. So the following file would work: LatExtD.txt. It would look like this in the LAREX UI:

Screenshot of imported virtual keyboard
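
As a rough illustration of the single-whitespace requirement mentioned above, the following Python sketch rewrites keyboard text so that the characters in each row are separated by exactly one space. The function name `normalize_keyboard_text` is hypothetical, and dropping empty lines is an assumption; whether LAREX treats blank lines as meaningful is not stated in this thread.

```python
def normalize_keyboard_text(text: str) -> str:
    """Return keyboard text with exactly one space between characters per row.

    Hypothetical helper, not part of LAREX: tabs and repeated spaces are
    collapsed, leading and trailing whitespace is removed, and empty lines
    are dropped (an assumption).
    """
    lines = []
    for line in text.splitlines():
        chars = line.split()  # split on any run of whitespace
        if chars:
            lines.append(" ".join(chars))
    # End with a newline only when there is at least one row.
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    raw = "a\tb   c\n\n  d  e \n"
    print(normalize_keyboard_text(raw), end="")
```

Running a keyboard file through a step like this before uploading could be one way to avoid tripping up a parser that is sensitive to spacing.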

The current text format is a bit too unstructured for my taste and the parser isn't really robust but it works. We'll most likely add other import / export formats in the next major release of LAREX and I guess adding compatibility for the virtual keyboard files of other editors would be kinda neat as well.

We also have a repository which contains all virtual keyboards that ship with LAREX by default, and we're open to PRs containing other virtual keyboard templates.

tboenig commented 2 years ago

> The current text format is a bit too unstructured for my taste and the parser isn't really robust but it works.

With the other keyboard formats you have some inspiration for an optimized keyboard format. Does a PR make sense if your keyboard format is still changing?