CopticScriptorium / corpora

Public repository for Coptic SCRIPTORIUM Corpora Releases

ISYE has no corpus metadata #38

Closed ctschroeder closed 4 years ago

ctschroeder commented 5 years ago

I was using ISYE today while conducting research for an article and noticed the corpus is missing corpus metadata. I think I know what happened: when we release an old corpus for the first time from the GitDox process, the corpus metadata isn't already in GitDox and has to be entered anew. It appears that ISYE was rereleased this fall (for the first time from GitDox, I think?), and the corpus metadata hadn't been added into GitDox yet.

In the short term I believe the items we need to complete to fix this are:

Am I missing anything or getting anything wrong? Does anyone have the bandwidth for this soon?

We've had corpus metadata update quirks in the past, as well. Usually it's because I forgot to check corpus metadata, so I've been trying hard to keep close to the publication checklist and not just "wing it" before publication. That didn't work here, because it wasn't on my radar; ISYE was not on the list of corpora for Fall 2019 (#27) by the "freeze" date. So we need to add one more short term item:

For the longer term: I think the solution is keeping the release corpora list updated in GitHub as we work and then, at publication time, cross-checking it to ensure we are abiding by the freeze list (or agreeing upon a process to collectively deliberate on reopening the list of corpora after the freeze date). I believe it was @amir-zeldes's suggestion to have a "freeze" date, and I enthusiastically endorse it. It gets too chaotic at the end of the publication process to add corpora at the last minute. As Amir says, nothing in Coptic Studies is life or death, so I'm fine with pushing anything not on the list to the next round.
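The cross-check against the freeze list could be scripted. Here is a minimal sketch in Python, assuming the freeze list and the corpora staged for release are both available as plain lists of names. The function name and the corpus names are hypothetical placeholders, not part of GitDox or any existing tooling:

```python
def check_against_freeze(frozen, staged):
    """Compare corpora staged for release with the agreed freeze list.

    Returns three lists:
      on_list - staged corpora that are on the freeze list (release now)
      late    - staged corpora that are NOT on the freeze list (defer)
      missing - frozen corpora that have not been staged yet
    """
    frozen_set = set(frozen)
    staged_set = set(staged)
    on_list = [c for c in staged if c in frozen_set]
    late = [c for c in staged if c not in frozen_set]
    missing = sorted(frozen_set - staged_set)
    return on_list, late, missing


# Hypothetical corpus names, for illustration only.
frozen = ["corpus.a", "corpus.b"]
staged = ["corpus.a", "corpus.c"]

on_list, late, missing = check_against_freeze(frozen, staged)
print("Release now:", on_list)
print("Defer to next round (added after freeze):", late)
print("On freeze list but not staged:", missing)
```

Anything in the "defer" list would either move to the next round or go through whatever reopening process we agree on.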

If folks think it would be helpful, I can also modify our publication checklist to include setting the freeze date and cross-checking against the list in GitHub. I can also try to make a Markdown version of our checklist so we can put it in the GitHub publication issue for easier reference. You all may have better suggestions, though, which I warmly welcome.

Thank you! Very sorry to be the bearer of bad news; something like this often seems to happen, though, so I think it's just part of the process.

amir-zeldes commented 5 years ago

Thanks for spotting this - I think this is not too bad, and yes, with a big release little errors like this can slip through. This should be easy to fix.

I've added the metadata in GitDox, based on the previous version:

https://github.com/CopticScriptorium/corpora/blob/9ceebd576109890a8dc09e17ca94277befe5307c/shenoute-eagerness/shenoute.eagerness_ANNIS/corpus_annotation.annis#L619-L625

I've tentatively minted the version_n=3.0.1, though note that based on our document-change guidelines, all documents should remain 3.0.0. No annotators have been added, and eagerness has not been treebanked so far (though it is planned, at least for part).

I had a look and one more corpus is affected: doc.papyri. It needed to be 're-released' (without substantive changes) since the GitHub version did not have the requisite files for the new CTS repo (it was a very old version where the files worked differently). I've added the metadata and I can release both as corpus version_n 3.0.1 - ping me if that is OK and I'll re-import in ANNIS/repo.

ctschroeder commented 5 years ago

Hi. Thanks so much. Yes, if the documents themselves have not been changed at all, then the doc versions would stay the same.

It does seem odd to change the version number for the corpora but not for the docs. Is there some alternative to the versioning that I'm not thinking of?

amir-zeldes commented 5 years ago

We could introduce some small edit into each corpus if it's really bothering you, but otherwise, since this change shows up only in ANNIS/PAULA, it's not going to be noticed too much (V3.0.1 will show up in ANNIS/PAULA corpus metadata, TT and TEI remain unchanged). Shall I go ahead?

ctschroeder commented 4 years ago

Oh, I could have sworn I replied to this via email days ago. Absolutely not, I did not mean to suggest we should edit the docs just for the sake of editing. So yes, please go ahead. Thank you so much for taking care of this!

amir-zeldes commented 4 years ago

OK, it's imported and ingested. I did make some minor corrections anyway, as GF31-32 is now treebanked. Annotators are up to date as well. Take a look and let me know; if it all looks good, I will mint a release 3.0.1 on GitHub.

ctschroeder commented 4 years ago

I think this looks good! Thanks a lot!

amir-zeldes commented 4 years ago

OK, release tag is done