Avisblatt / avisblatt


prepare clean data repository #93

Open mbannert opened 1 year ago

mbannert commented 1 year ago

@aengel17 this public repo looks good to me. Can you confirm these are the latest data? https://github.com/Avisblatt/avisdata

mbannert commented 1 year ago

Can we agree on a license for our first data release? https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/licensing-a-repository

I suggest CC BY-NC-SA 4.0, that is, Creative Commons Attribution (mention the source), NonCommercial (non-commercial use only), ShareAlike. We could debate whether to leave out NC. In practice, the difference is whether data aggregators like Datastream or Statista may add the data to their pool so that their clients can get it. I don't see a scenario where they make money off the Avisblatt; if anything, they would help us distribute the data. They would need to name the source anyway.

mbannert commented 1 year ago

Who needs to be part of the citation information?

Did I forget someone? Can you add people if necessary?

wissen-ist-acht commented 1 year ago

The license seems good to me; I have no objection. If we are talking about the package, it should be @LarsDIK rather than @LarsKury; I would not include Susanna.

aengel17 commented 1 year ago

License: CC BY-NC-SA 4.0 is definitely fine by me. Regarding CC BY-SA instead: the NC part may be important to the SNSF. We can find out, or just be cautious and stick with it.

Susanna: She didn't contribute a single line of code to the package, but as PI of the project that created the package, my gut feeling from a legal/moral perspective is that she should be included unless she declares that she doesn't want to be. We can discuss this with her tomorrow, as we have a meeting anyway.

Repo: It does contain our up-to-date data, BUT... while preparing material for Peter, we noticed some problems with 1734 (many missing headers). And since we now also have definitive IDs for all but ~300 records in the dataset, I am currently preparing one more update of the dataset. I will push it to that repo when finished and leave a note here.

mbannert commented 1 year ago

The package is a different animal. I would mention the package in the data README, which goes to Zenodo and potentially to other archives that work with releases. This issue is basically about wrapping up a first release of the data publication. I have added a formally correct license suggestion as well as a formally correct .cff file, plus a simple, R-backed .cff file generator.

Please add your ORCID iD, edit/remove people as needed, and close this issue. Once you have closed the issue, I'll set up a webhook (trigger) that updates Zenodo every time someone with access to the Avisblatt org creates a new release via GitHub. Please do not edit Zenodo manually.

Re: Susanna. I definitely think we should include her in the dataset for exactly the reasons @aengel17 mentions. If she does not object, I'd add her. The package is a different matter: it is more of a CRAN/R thing, and listing her might only send user questions or DH invitations her way. Hence I'd leave her out of the package.

@aengel17 We can always have another release, but I think it's a good idea to include this last correction in the very first official release.

aengel17 commented 1 year ago

Ah, thanks for the clarification. I hadn't been reading thoroughly enough.

Regarding data publication, there are two or three other efforts in that direction, which in my view are not (really) competing with this one, but [tomorrow] we should make sure Susanna agrees.

The other efforts:

(1) publication at www.e-newspaperarchives.ch, but this is more about presenting the Avisblatt as a periodical

(2a) archiving IIIF and data structures in an Invenio repository at hasdai.org (by DataFutures); this will take a while, but not many months, to go public

(2b) a frontend on a server at the Basel History Department that allows searching the data and will be synced with the Hasdai repo; it goes public in the next few weeks

-> (2a) is the sensitive part, obviously. Or hopefully it isn't. Let's just make sure.


ORCID iDs:

- Susanna Burghartz: 0000-0001-6173-6626
- Anna Reimann: 0000-0001-8225-7851
- Alexander Engel: 0000-0002-8592-3124
- Ina Serif: 0000-0003-2419-4252
- Lars Dickmann: 0000-0002-4511-1017
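For illustration, here is a minimal sketch of what a .cff generator fed with the ORCID iDs listed above could look like. It is written in Python rather than R, so it is not the project's actual generator. It assumes the CFF 1.2.0 keys (`cff-version`, `message`, `type`, `title`, `license`, `url`, `authors`). The title "Avisblatt data" is a placeholder, and using the avisdata URL as `url` is an assumption. `CC-BY-NC-SA-4.0` is the SPDX identifier for CC BY-NC-SA 4.0, which is what the `license` field expects.

```python
"""Minimal sketch: generate a CITATION.cff for the data release.

Hypothetical Python stand-in for the R-backed generator mentioned in this
issue; the title and URL passed in below are illustrative assumptions.
"""

from dataclasses import dataclass


@dataclass
class Author:
    given_names: str
    family_names: str
    orcid: str  # bare ORCID iD, e.g. "0000-0002-8592-3124"


def build_cff(title: str, authors: list[Author], license_id: str, url: str) -> str:
    """Return CITATION.cff content (CFF 1.2.0 keys) as a YAML string."""
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this dataset, please cite it as below."',
        "type: dataset",
        f'title: "{title}"',
        f"license: {license_id}",  # must be an SPDX license identifier
        f'url: "{url}"',
        "authors:",
    ]
    for a in authors:
        lines.append(f'  - family-names: "{a.family_names}"')
        lines.append(f'    given-names: "{a.given_names}"')
        # CFF expects the full ORCID URL, not just the bare iD
        lines.append(f'    orcid: "https://orcid.org/{a.orcid}"')
    return "\n".join(lines) + "\n"


# ORCID iDs as posted in this issue
AUTHORS = [
    Author("Susanna", "Burghartz", "0000-0001-6173-6626"),
    Author("Anna", "Reimann", "0000-0001-8225-7851"),
    Author("Alexander", "Engel", "0000-0002-8592-3124"),
    Author("Ina", "Serif", "0000-0003-2419-4252"),
    Author("Lars", "Dickmann", "0000-0002-4511-1017"),
]

if __name__ == "__main__":
    # Placeholder title; "CC-BY-NC-SA-4.0" is the SPDX id for CC BY-NC-SA 4.0
    print(build_cff("Avisblatt data", AUTHORS, "CC-BY-NC-SA-4.0",
                    "https://github.com/Avisblatt/avisdata"), end="")
```

Emitting the YAML by hand keeps the sketch dependency-free. A real generator would probably use a YAML library and validate the result against the CFF schema.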

mbannert commented 1 year ago

It would definitely be great to have a machine-readable channel that works well with our avisblatt package 📦, too. Any updates here?

aengel17 commented 1 year ago

We discussed it today: including Susanna in the data publication but not naming her as a contributor to the package is fine. License: CC BY-NC-SA 4.0 for the first version is fine.

But please hold off on actually publishing the repo just yet...

aengel17 commented 1 year ago

Sorry, saw your post only after posting mine (patchy DB internet...).

Not sure what you mean by "machine readable channel that works well with our avisblatt package"?

mbannert commented 1 year ago

The goal is to give someone who starts working where our basic work finishes a way to pick up from there, i.e., do stuff like read_collection. Maybe even work mainly with the tags the Avisblatt project created.

I doubt that reading the data into R will work as smoothly with any of the other archives. Plus, I see some danger that the data might not be available at all for the above use cases.