HTR-United / htr-united

Ground Truth Resources for the HTR of patrimonial documents
https://htr-united.github.io
Creative Commons Zero v1.0 Universal

Simultaneously use ALTO and PAGE XML in a dataset? #60

Open alix-tz opened 2 years ago

alix-tz commented 2 years ago

I might consider doing this with LECTAUREP, but I wonder what would be the best approach and how this would impact documenting the volumes and the dataset.

For example, I could use two separate folders (/data/alto and /data/page), but then how would I declare the format in htr-united.yml? And would it be possible to break down the file volumes per XML format (e.g. files.alto = 100 and files.page = 100 instead of files = 200)?
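To make the per-format volumes concrete, here is a minimal sketch of how the counts could be computed for the two-folder layout. It assumes the files sit under data/alto and data/page and use a lowercase .xml extension; the function name count_xml_by_format is hypothetical, and nothing here implies that htr-united.yml actually accepts keys like files.alto or files.page.

```python
from pathlib import Path


def count_xml_by_format(data_root, formats=("alto", "page")):
    """Count *.xml files in each per-format subfolder of data_root.

    Assumption: one subfolder per format (e.g. data/alto, data/page).
    """
    root = Path(data_root)
    counts = {}
    for fmt in formats:
        folder = root / fmt
        # A missing folder is reported as zero files instead of raising.
        if folder.is_dir():
            counts[fmt] = sum(1 for _ in folder.rglob("*.xml"))
        else:
            counts[fmt] = 0
    return counts


if __name__ == "__main__":
    # Prints a dict with one file count per format folder.
    print(count_xml_by_format("data"))
```

The resulting dictionary gives one number per format, which is the breakdown the question is about; how (or whether) to record it in the catalog entry remains open.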

Other options could include:

I can't find any of these options really satisfying. @PonteIneptique, do you have any opinion?

PonteIneptique commented 2 years ago

> I can't find any of these options really satisfying. @PonteIneptique, do you have any opinion?

My opinion would really be "Don't do it"...

PonteIneptique commented 2 years ago

I'll answer the first part later :D