acl-org / acl-anthology

Data and software for building the ACL Anthology.
https://aclanthology.org
Apache License 2.0

Clarify the copyright of ACL Anthology metadata #580

Open fnielsen opened 5 years ago

fnielsen commented 5 years ago

In February 2019, I was in contact with ACL Anthology people (mostly @mjpost) about Wikidata integration. This was also a discussion in a Scholia issue https://github.com/fnielsen/scholia/issues/227

I now see the fine development of the ACL Anthology website with machine-readable data in MODS XML, and I was wondering whether we are allowed to copy the metadata from ACL Anthology into Wikidata. Wikidata is CC0, and the ACL Anthology states CC BY or CC BY-NC-SA for the "materials". It is unclear whether author, title, year of publication, and similar metadata fall under that license, or whether only the papers themselves are under the Creative Commons license and the metadata can be freely copied to Wikidata.

It would be nice if the FAQ could clarify precisely what is meant by "materials" and what their copyright status is.

Currently, Wikidata has fairly limited metadata from ACL. The data has mostly been set up manually, see, e.g., https://tools.wmflabs.org/scholia/publisher/Q4346375

akoehn commented 5 years ago

@mjpost this is something that the ACL board has to decide somehow, right? When ignoring the abstracts, the metadata seems to be a pure collection of facts and as such would not fall under copyright but at most under database rights (but IANAL).

Explicitly releasing the data without the abstracts under CC0 would clarify the situation; releasing the abstracts under CC0 would be problematic as authors could in no way foresee that a part of their paper would essentially be in the public domain.

Therefore, my proposal would be to have a one-liner which strips the abstracts from the mods XML and the result is officially placed under CC0 by the ACL Executive Committee.
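To illustrate what such an abstract-stripping step could look like, here is a minimal sketch in Python using only the standard library. It assumes the abstracts are stored in `<abstract>` elements in the standard MODS v3 namespace; the function name `strip_abstracts` and the sample record are hypothetical and not taken from the Anthology code base.

```python
import xml.etree.ElementTree as ET

# Standard MODS v3 namespace (assumed to be the one used in the exported files).
MODS_NS = "http://www.loc.gov/mods/v3"


def strip_abstracts(mods_xml: str) -> str:
    """Return the MODS document with every <abstract> element removed."""
    # Serialize MODS as the default namespace instead of an "ns0:" prefix.
    ET.register_namespace("", MODS_NS)
    root = ET.fromstring(mods_xml)
    abstract_tag = f"{{{MODS_NS}}}abstract"
    # ElementTree has no parent pointers, so visit each element and drop
    # any direct children that are abstracts. The list() calls take
    # snapshots so the tree can be modified safely while looping.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == abstract_tag:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample record, for illustration only.
SAMPLE = """<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="example-paper">
    <titleInfo><title>An Example Title</title></titleInfo>
    <abstract>Example abstract text.</abstract>
  </mods>
</modsCollection>"""

stripped = strip_abstracts(SAMPLE)
print(stripped)
```

The title and other descriptive fields are left untouched; only the abstract text, whose licensing is the unclear part, is dropped.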

mjpost commented 5 years ago

Yes, the copyright situation is a bit of a mess. This needs to be worked out at the ACL Exec level but things move slowly there because there is too much work and it's all done by volunteers.

IANAL either, but I agree with Arne that this is sensible, though I cannot produce a definitive statement. I also agree that the abstracts are a murkier situation.

I've made a note to bring this up as a primary issue at the next Exec+ meeting.

fnielsen commented 5 years ago

For Wikidata, the license of the abstract is irrelevant, as we do not represent it there. So only the other metadata matters, and possibly (in the longer run) also the citations, cf. Initiative for Open Citations https://i4oc.org/.

akoehn commented 4 years ago

@mjpost Did you forward this question?

mjpost commented 4 years ago

I did but it got bumped due to other issues.

mjpost commented 4 years ago

Just to note here that there has been a bit of behind-the-scenes progress on copyright issues, and I hope to have this addressed by the end of the year.

fnielsen commented 3 years ago

I am wondering whether you have come to any further clarification?

WolfgangFahl commented 1 year ago

https://github.com/WolfgangFahl/SemPubFlow is a new project that might offer tooling

fnielsen commented 1 year ago

If the ACL item has a DOI, then the metadata is available in CrossRef. Wikidata people are scraping the metadata for DOI items, so some of the ACL papers are already being scraped automatically.
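
As a rough sketch of what reading such a record involves, the Python snippet below builds the Crossref REST API URL for a DOI and pulls the basic bibliographic fields out of a response's `message` object. The helper names and the sample record (including its DOI) are made up for illustration, and no network request is made here.

```python
from urllib.parse import quote

# Public Crossref REST API endpoint for a single work.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_url(doi: str) -> str:
    """Build the Crossref works URL for a DOI (slashes are kept as-is)."""
    return CROSSREF_WORKS + quote(doi)


def summarize_crossref_work(message: dict) -> dict:
    """Extract DOI, title, authors, and year from a Crossref 'message' dict."""
    authors = [
        " ".join(part for part in (a.get("given"), a.get("family")) if part)
        for a in message.get("author", [])
    ]
    titles = message.get("title", [])
    date_parts = message.get("issued", {}).get("date-parts", [])
    year = date_parts[0][0] if date_parts and date_parts[0] else None
    return {
        "doi": message.get("DOI"),
        "title": titles[0] if titles else None,
        "authors": authors,
        "year": year,
    }


# Hypothetical record shaped like a Crossref response's "message" field.
SAMPLE_MESSAGE = {
    "DOI": "10.1234/example.doi",
    "title": ["An Example Title"],
    "author": [{"given": "Ada", "family": "Example"}],
    "issued": {"date-parts": [[2019, 7]]},
}

summary = summarize_crossref_work(SAMPLE_MESSAGE)
print(crossref_url(SAMPLE_MESSAGE["DOI"]))
print(summary)
```

These are the same kinds of fields (title, authors, year) that the original question asks about copying into Wikidata.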