HTR-United / htr-united

Ground Truth Resources for the HTR of patrimonial documents
https://htr-united.github.io
Creative Commons Zero v1.0 Universal

Data without images #138

Closed michaelscho closed 4 months ago

michaelscho commented 5 months ago

Hi, I have quite a large set of annotated PAGE XML files containing layout information (~3000 pages). I want to make the data available via htr-united, but I am not sure whether I can include the images, for copyright reasons. For my own repository, I create a JSON file with links to the images, which can be used to download them directly from the library. Will that be sufficient for htr-united as well, or what is best practice in these cases?

alix-tz commented 5 months ago

Hello! Thank you for reaching out! Yes, if you can't publish the images along with the dataset but provide a way for potential reusers to download the image files on their own, that is perfectly acceptable! Just make sure to mention it in the description of the dataset.

On our side, all you need to do is fill out the form (here: https://htr-united.github.io/document-your-data.html) and submit it to us for addition to the catalog.

Don't hesitate to ask if you have other questions!

michaelscho commented 4 months ago

Thanks for your reply! It took me a while to proceed in this matter, but I have now created JSON mappings such as { "file_name": "B_0001.xml", "image_name": "B_0001.jpg", "image_url": "https://api.digitale-sammlungen.de/iiif/image/v2/bsb00140701_00001/full/1500,1840/0/default.jpg" }. I think that should do the trick. Thanks again!
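
For reusers, a minimal sketch of how such a mapping could be turned back into local image files might look like the following. It assumes the mappings are stored as a JSON list of objects with the keys shown above; the function names and the injectable `fetch` parameter are hypothetical and not part of the dataset or of htr-united.

```python
import json
from pathlib import Path
from urllib.request import urlretrieve


def load_mappings(path):
    """Read a JSON file assumed to hold a list of mapping objects."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def download_images(mappings, out_dir, fetch=urlretrieve):
    """Save each image under its "image_name"; skip files that already exist.

    `fetch` takes (url, target_path). It defaults to urllib's urlretrieve
    and can be swapped out, e.g. to add rate limiting or for testing.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for entry in mappings:
        target = out_dir / entry["image_name"]
        if not target.exists():
            fetch(entry["image_url"], str(target))
        saved.append(target)
    return saved
```

Calling `download_images(load_mappings("mappings.json"), "images")` would then place `B_0001.jpg` in an `images` folder, matching the name referenced by `B_0001.xml`; the file name `mappings.json` is only a placeholder. Skipping existing files keeps a rerun from requesting the same image from the library twice.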

PonteIneptique commented 4 months ago

Great! :)