adding of two GT - Githubissues

HTR-United / htr-united

Ground Truth Resources for the HTR of patrimonial documents

https://htr-united.github.io

Creative Commons Zero v1.0 Universal


adding of two GT #57

Closed pstroe closed 2 years ago

pstroe commented 2 years ago

first: Gwalther HTR GT second: NZZ GT for printed newspapers

alix-tz commented 2 years ago

Hello, thank you again for your enthusiasm and these proposals. The "htr-united.yml" file in htr-united/ should only be modified via GitHub Actions. To add your metadata, you instead need to create one file per dataset within a folder dedicated to the project(s), inside htr-united/catalog/.

For example the description you added for "Gwalther HTR GT" should go in a file named gwalther-htr-gt.yml inside a folder named bullinger-digital/ (I got this info from the Zenodo repo, maybe you don't want to mention the project?) inside catalog/.
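The layout described above can be sketched in a few lines of Python. This is only an illustration: the helper names `slugify` and `catalog_path` are hypothetical, and the naming rule (lowercase, hyphens instead of spaces) is an assumption inferred from the single "Gwalther HTR GT" example, not a documented HTR-United convention.

```python
import re
from pathlib import PurePosixPath


def slugify(name: str) -> str:
    """Lowercase the name and collapse runs of non-alphanumerics into one hyphen.

    Assumption: this mirrors the naming seen in the example above; it is not
    an official rule of the catalog.
    """
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def catalog_path(project: str, dataset: str) -> PurePosixPath:
    """Build catalog/<project-folder>/<dataset-file>.yml for one dataset."""
    return PurePosixPath("catalog") / slugify(project) / f"{slugify(dataset)}.yml"


print(catalog_path("Bullinger Digital", "Gwalther HTR GT"))
```

Each dataset gets its own file, so a second dataset from the same project would simply produce another `.yml` file in the same project folder.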

If you are not sure how to do this, you can:

see an example here
or use the form and click on the link provided at the end (it will automatically create the folder and the file, you'll just have to paste the description of your dataset in there)

Let me know if you need more explanation, and of course if you found it was not clear enough, tell us so we can improve the instructions! :)

PonteIneptique commented 2 years ago

I don't know if this is normal, but the files seem to be empty. If you need some help, we can apply some changes ourselves :) I just recorded a demo video (no sound) in case that is helpful (we are still learning!)

https://www.youtube.com/watch?v=MiBLhj4cw_c

pstroe commented 2 years ago

Thank you all for replying to the pull request. I'm now correcting the issue. I think I just pasted the metadata in the wrong file. 10 minutes and we should be good to go. By the way, the download feature does not work for me: it redirects me to the top of the page (this is in Google Chrome).

pstroe commented 2 years ago

I think it should work now. Could you please try again?

PonteIneptique commented 2 years ago

There is a small formatting issue. The description should look like this (we are fixing this in a current pull request to the form):

description: >
  This is ground truth for Rudolph Gwalther’s (1519-1586) handwriting taken from his book "Lateinische Gedichte", where he accumulated writings between 1540 and 1580.
  Data collection and ground truth creation:
  At the time we collected the data, we found 150 images with corresponding transcriptions by Peter Stotz on e-manuscripta (reference: Gwalther, Rudolf: Lateinische Gedichte. Zürich, 1540-1580. Zentralbibliothek Zürich, Ms D 152, https://doi.org/10.7891/e-manuscripta-26750 / Public Domain Mark). We removed 8 images with too many corrections or vertical texts. Next, we uploaded the images into the Transkribus platform, applied the line recognition tool and manually copied the transcribed text lines into the recognised line boxes. During this process, we made some corrections, which were mainly due to inconsistencies in punctuation and capitalised letters.
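
For anyone preparing such a description by hand, the folded block style shown above (`description: >` followed by indented lines) can be produced with a small Python helper. This is a minimal sketch using only the standard library; the function name `folded_description` is hypothetical, and it is not how the HTR-United form itself generates the file.

```python
import textwrap


def folded_description(text: str, indent: str = "  ") -> str:
    """Render free text as a YAML folded block scalar under the key 'description'.

    Blank lines are dropped and every remaining line is indented, so the
    result has the same shape as the example above. Note that dropping blank
    lines also removes paragraph breaks from the folded value.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    body = textwrap.indent("\n".join(lines), indent)
    return "description: >\n" + body


print(folded_description("First sentence of the description.\n\nSecond sentence."))
```

Inside a block scalar, quotation marks and colons in the text do not need escaping, which is one reason this style suits long free-text descriptions.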

PonteIneptique commented 2 years ago

The volume key is required. I took the liberty of running HUMGenerator on your data to get the numbers :)

PonteIneptique commented 2 years ago

Thanks, and sorry for the multiple bugs :) We are gonna fix the form accordingly!