Open fnielsen opened 5 years ago
@mjpost this is something that the ACL board has to decide somehow, right? When ignoring the abstracts, the metadata seems to be a pure collection of facts and as such would not fall under copyright but at most under database rights (but IANAL).
Explicitly releasing the data without the abstracts under CC0 would clarify the situation; releasing the abstracts under CC0 would be problematic as authors could in no way foresee that a part of their paper would essentially be in the public domain.
Therefore, my proposal would be to have a one-liner which strips the abstracts from the mods XML and the result is officially placed under CC0 by the ACL Executive Committee.
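A minimal sketch of what such a stripping step could look like, using only the Python standard library. It assumes the abstracts are stored as `abstract` elements in the standard MODS namespace (`http://www.loc.gov/mods/v3`). The function name `strip_abstracts` and the sample record are hypothetical and not taken from the Anthology code base.

```python
import xml.etree.ElementTree as ET

# Assumption: the Anthology MODS files use the standard MODS v3 namespace.
MODS_NS = "http://www.loc.gov/mods/v3"


def strip_abstracts(mods_xml: str) -> str:
    """Return the MODS XML with every <abstract> element removed."""
    # Keep MODS as the default namespace when serializing.
    ET.register_namespace("", MODS_NS)
    root = ET.fromstring(mods_xml)
    abstract_tag = f"{{{MODS_NS}}}abstract"
    # Snapshot the element list first, because we mutate the tree below.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == abstract_tag:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample record, for illustration only.
SAMPLE = """<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="example-paper">
    <titleInfo><title>An Example Title</title></titleInfo>
    <abstract>This text would be removed.</abstract>
  </mods>
</modsCollection>"""

if __name__ == "__main__":
    print(strip_abstracts(SAMPLE))
```

Everything except the abstract (title, names, dates, identifiers) passes through unchanged, so the output would be the part of the metadata that could be considered for a CC0 release.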
Yes, the copyright situation is a bit of a mess. This needs to be worked out at the ACL Exec level but things move slowly there because there is too much work and it's all done by volunteers.
IANAL either, but I agree with Arne that this is sensible, although I cannot produce a definitive statement. I also agree that the situation with the abstracts is a bit murkier.
I've made a note to bring this up as a primary issue at the next Exec+ meeting.
For Wikidata the license of the abstract is irrelevant as we do not represent it there. So it is only the other metadata. And possibly (in the longer run) also the citations, cf. Initiative for Open Citations https://i4oc.org/.
@mjpost Did you forward this question?
I did but it got bumped due to other issues.
Just to note here that there has been a bit of behind-the-scenes progress on copyright issues, and I hope to have this addressed by the end of the year.
I am wondering whether you have come to any further clarification?
https://github.com/WolfgangFahl/SemPubFlow is a new project that might offer tooling
If the ACL item has a DOI, then the metadata is available in CrossRef. Wikidata people are scraping the metadata for DOI items, so some of the ACL papers are already being scraped automatically.
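For reference, a rough sketch of how the metadata for a single DOI could be looked up through the public Crossref REST API (`https://api.crossref.org/works/{DOI}`). The helper names and the DOI in the usage comment are hypothetical placeholders, and this is not necessarily how the Wikidata tooling does it.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_url(doi: str) -> str:
    """Build the Crossref REST API URL for one DOI (hypothetical helper)."""
    # Percent-encode unusual characters but keep the slashes of the DOI.
    return CROSSREF_WORKS + urllib.parse.quote(doi, safe="/")


def fetch_crossref_metadata(doi: str) -> dict:
    """Fetch the metadata record for a DOI; requires network access."""
    with urllib.request.urlopen(crossref_url(doi), timeout=30) as response:
        payload = json.load(response)
    # Crossref wraps the actual record in a "message" object.
    return payload["message"]


# Usage (placeholder DOI, not a real ACL paper):
# record = fetch_crossref_metadata("10.1234/example.doi")
# print(record.get("title"), record.get("author"))
```

The returned record holds bibliographic fields such as title and authors. Whether an abstract is included depends on what the publisher deposited with Crossref.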
In February 2019, I was in contact with ACL Anthology people (mostly @mjpost) about Wikidata integration. This was also a discussion in a Scholia issue https://github.com/fnielsen/scholia/issues/227
I now see the fine development of the ACL Anthology Web site with machine-readable data in MODS XML, and I was wondering whether we are allowed to copy the metadata from ACL Anthology into Wikidata. Wikidata is CC0 and the ACL Anthology states CC BY or CC BY-NC-SA for the "materials". It is unclear whether author, title, year of publication and similar metadata are under that license, or whether only the papers themselves are under the Creative Commons license, so that the metadata can be freely copied to Wikidata.
It would be nice if the FAQ could clarify precisely what is meant by "materials" and what their copyright status is.
Currently, Wikidata has fairly limited metadata from ACL. The data has mostly been set up manually, see, e.g., https://tools.wmflabs.org/scholia/publisher/Q4346375