HTR-United / htr-united

Ground Truth Resources for the HTR of patrimonial documents
https://htr-united.github.io
Creative Commons Zero v1.0 Universal

Create southasia-malayalam.yml #124

PonteIneptique closed this pull request 1 year ago

PonteIneptique commented 1 year ago

Create Ground Truth Data for Printed Malayalam

Fixes #104

alix-tz commented 1 year ago

Overall, it's all good; there is only one issue, with the format:

The dataset contains both ALTO and PAGE files. When I tried importing the data into eScriptorium, I got an error on all the files (The ALTO file should contain a Description/sourceImageInformation/fileName tag for matching.). This is an error in the Transkribus export. Luckily, PAGE works fine, although a line or two came out with an odd series of coordinates.
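A minimal sketch (not part of the original review) for spotting the ALTO files that trigger this error before importing them. It only checks for the element path quoted in the eScriptorium message; it does not reproduce eScriptorium's own matching logic. It assumes Python 3.8+ for the `{*}` namespace wildcard in `ElementTree.find`, which lets one path cover the different ALTO namespace versions. The function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Optional, Union
import xml.etree.ElementTree as ET

# Element path quoted in eScriptorium's error message, relative to the root <alto>.
# "{*}" matches the tag in any namespace or in none (Python 3.8+), so the same
# path works whichever ALTO namespace version the export uses.
FILENAME_PATH = "{*}Description/{*}sourceImageInformation/{*}fileName"


def alto_source_filename(xml_data: Union[str, bytes]) -> Optional[str]:
    """Return the image filename declared in an ALTO document, or None if absent or empty."""
    root = ET.fromstring(xml_data)
    node = root.find(FILENAME_PATH)
    if node is None or node.text is None or not node.text.strip():
        return None
    return node.text.strip()


def files_missing_filename(paths: Iterable[Union[str, Path]]) -> List[Path]:
    """List the ALTO files lacking a non-empty Description/sourceImageInformation/fileName."""
    missing = []
    for p in map(Path, paths):
        # Read as bytes so ElementTree honours the file's own XML encoding declaration.
        if alto_source_filename(p.read_bytes()) is None:
            missing.append(p)
    return missing
```

For example, `files_missing_filename(Path("alto").glob("*.xml"))` (the `alto` directory name is hypothetical) would list every file eScriptorium is likely to reject for this reason.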

Should we list Page-XML as the format, then?

PonteIneptique commented 1 year ago

Yeah sure :)

alix-tz commented 1 year ago

alright: ready for merge! :)