HTR-United / htr-united

Ground Truth Resources for the HTR of patrimonial documents
https://htr-united.github.io
Creative Commons Zero v1.0 Universal

Create dataset-for-late-medieval-castilian-text-recognition.yml #92

Closed matgille closed 1 year ago

matgille commented 1 year ago

I'm adding my PhD dissertation corpus here (in-domain and out-of-domain)

= )

matgille commented 1 year ago

This dataset contains the incunabula dataset I published here some time ago, so I'm removing the older entry.

matgille commented 1 year ago

I'm waiting for the acceptance and the final version of the paper before publishing the data on HTR-United.

PonteIneptique commented 1 year ago

:'(

matgille commented 1 year ago

The v1 has been corrected, the v2 is being reviewed, and the paper will be published soon (hopefully). Let's add the corpus to the catalog if the metadata is OK = )

PonteIneptique commented 1 year ago

The ISO code Goth is a script code (ISO 15924), not a language code, and it is meant for something other than Latin gothic script; see https://scriptsource.org/cms/scripts/page.php?item_id=script_detail&key=Goth

The word “Gothic” is used in the context of writing systems to describe three very different, unrelated, styles of writing:

  • the Gothic alphabet: historically written in an uncial style, and invented for the Gothic language;
  • the Visigothic script: a minuscule script of the Latin alphabet, used in Iberia to write Latin text when Iberia was ruled by the Visigoths;
  • “gothic” scripts: a popular name for many blackletter scripts of the Latin alphabet (including Fraktur); these have been used for writing many languages.

This page refers only to the first sense - the Gothic alphabet, now extinct. This script is thought to have been invented around 350AD by the bishop Ulfilas for the purpose of translating the Bible into the Gothic language. The runic script which had previously been used for writing this language had strong associations with Germanic paganism, so was not deemed appropriate for this purpose. The Gothic language continued to be spoken until the 17th century, but there is no record of the script being used beyond the 6th century.
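To illustrate the point about script codes, here is a minimal sketch of a check that would flag Goth in a catalog entry describing Latin-alphabet material. The three code-to-name pairs come from ISO 15924. The record layout (a `script` list of `iso` entries) and the helper name `check_script_code` are assumptions made for this example, not the catalog's actual schema or tooling.

```python
# A few ISO 15924 script codes relevant to this thread (not the full registry).
ISO_15924_NAMES = {
    "Goth": "Gothic",                   # the extinct Gothic alphabet
    "Latn": "Latin",
    "Latf": "Latin (Fraktur variant)",  # blackletter form of the Latin alphabet
}


def check_script_code(code: str, expected_family: str = "Latin") -> bool:
    """Return True if `code` names a script belonging to `expected_family`."""
    name = ISO_15924_NAMES.get(code)
    if name is None:
        raise KeyError(f"script code not in the local table: {code}")
    return name.startswith(expected_family)


# Hypothetical metadata record; the field names are assumed for illustration.
record = {"script": [{"iso": "Goth"}]}

# Collect codes that do not match the Latin alphabet.
suspicious = [s["iso"] for s in record["script"] if not check_script_code(s["iso"])]
```

With this table, a blackletter Castilian corpus would more plausibly be declared as Latn or Latf, while Goth ends up in `suspicious`.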

PonteIneptique commented 1 year ago

@alix-tz We need to decide what to do about this one. This is the first deletion of a previous dataset. Personally, I am a bit skeptical about removing a previously registered dataset, but I'd like to have your opinion.

alix-tz commented 1 year ago

@alix-tz We need to decide on this one what to do. This is the first deletion of a previous dataset. I am a bit skeptical about removing a previously registered dataset personally, but I'd like to have your opinion.

Indeed, it opens a Pandora's box, but at the same time I understand that some situations may justify it (similar to retracting a publication).

@matgille I'm not sure I understand why you removed the previous dataset. Was it because the dataset is contained in the newer dataset? Can you explain?

matgille commented 1 year ago

Was it because the dataset is contained in the newer dataset? Can you explain?

Yes, that's it...

What I could do is modify the metadata of the older dataset, remove the information about print from the newer one, and make both point to the same repo? The problem is that the description wouldn't be exact. Idk.

PonteIneptique commented 1 year ago

Nope, I think we need a better way to deal with this. We'll talk about it with Alix off GitHub because this is a larger problem.

On Tue, 25 Apr 2023, 4:44 pm Matthias Gille Levenson wrote:

Was it because the dataset is contained in the newer dataset? Can you explain?

Yes, that's it...

What I could do is modify the metadata of the print dataset, remove any print information in the second dataset and make both point to the same repo ?


PonteIneptique commented 1 year ago

Since the catalog is versioned, we are going to include this change. We'll have to display the catalog version so that the versioning is actually used...