HTR-United / htr-united

Ground Truth Resources for the HTR of patrimonial documents
https://htr-united.github.io
Creative Commons Zero v1.0 Universal

Add GT Dataset for Malayalam #104

Closed: nidame closed this issue 1 year ago

nidame commented 1 year ago

Hi, we have exported the data from Transkribus in the ALTO format only, because the PAGE XML export from Transkribus produces invalid data. Transkribus has confirmed that the TranskribusMetadata node is not valid with regard to the original XML schema. I suspect this will cause problems when importing the PAGE XML into eScriptorium. How shall we handle this? Transkribus wants to fix it, but I have no information on when.
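
One possible stopgap, not something proposed in this thread, would be to strip the offending node from the exported PAGE files before import. The sketch below is a minimal illustration using only the Python standard library. It assumes the only problem is the `TranskribusMetadata` element itself and that dropping it loses nothing you need. The function name `strip_transkribus_metadata` and the sample namespace are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Return an element tag without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def strip_transkribus_metadata(xml_text: str) -> str:
    """Return xml_text with every TranskribusMetadata element removed.

    Hypothetical helper: matching is done on the local tag name, so it
    does not depend on which PAGE schema version the file declares.
    """
    root = ET.fromstring(xml_text)

    # Keep the document's default namespace on output instead of 'ns0:'.
    # Note: register_namespace changes a process-wide registry.
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])

    # Snapshot the element list first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child.tag) == "TranskribusMetadata":
                parent.remove(child)

    return ET.tostring(root, encoding="unicode")


# Tiny made-up sample; the namespace URI here is a placeholder.
SAMPLE = (
    '<PcGts xmlns="http://example.org/page">'
    "<Metadata><Creator>demo</Creator>"
    '<TranskribusMetadata docId="1"/></Metadata>'
    "<Page/></PcGts>"
)

cleaned = strip_transkribus_metadata(SAMPLE)
```

Whether the cleaned files then validate against the PAGE schema, or import cleanly into eScriptorium, would still need to be checked on real exports.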

alix-tz commented 1 year ago

This situation is linked to #60

Can you build the description of the dataset with the ALTO files, but leave the PAGE files accessible in your repository?

alix-tz commented 1 year ago

Maybe simply keep the organization you currently have:

- data/
   - alto/
   - page/
   - image files, ...

PonteIneptique commented 1 year ago

This entry is currently being reviewed by @alix-tz for the PR. Sorry for the delay!