OBF / FALDO

Feature Annotation Location Description Ontology
Creative Commons Zero v1.0 Universal

Why not distribute under CC0? #7

Closed: hlapp closed 10 years ago

hlapp commented 11 years ago

You guys are borrowing from a lot of previous work - Genbank feature location syntax, BioPerl location model, etc., to name just a few. So I'm not really sure what you want to genuinely claim copyright, i.e., intellectual property, on.

Second, if someone reuses FALDO but fails to give you attribution, are you intending to enforce the CC-BY license in a court of law, or will you cry foul because of a violation of scientific norms?

If you can see my first point, or if your answer to the second one is the latter option, there's really no good reason left for not using CC0. This matters - some kinds of reuse, especially recursive reuse, make attribution of every source, if legally required, so difficult as to amount to a barrier. You are demonstrating this yourself - you are not citing Genbank, or BioPerl, and you are not asking for those to be cited as well by those citing FALDO. I'm criticizing neither of these - rather, I'm suggesting you waive copyright and distribute under CC0.

KimJBaran commented 11 years ago

Jerven is probably the most suitable person to answer this. In the meantime, you might find my answer conclusive.

I think that you are confusing copyright and intellectual property (http://en.wikipedia.org/wiki/Intellectual_property#Copyright). They are not synonyms of each other and when you mention the license under which FALDO is distributed, then you are exclusively talking about copyright.

CC-BY seems like a good choice to me, because it encourages users to mention FALDO in their work, which will help spread the word about the ontology. I wrote "encourages" since we will quite obviously not start legal battles if someone forgets to attribute us.

CC0 is interesting in legal terms itself. Copyright is implicit and does not have to be explicitly stated. Waiving it as in CC0 appears to work in the US/UK, but there are other legal domains in which you cannot "uncopyright" your work to put it in the public domain (sorry, no link here). I do not find waiving copyright a good idea, because I think it is important to retain the right to choose an appropriate license for your created work yourself. Of course, this is a weak argument and certainly not a strong case against moving from CC-BY to CC0.

Our intellectual property encompasses the OWL model of feature locations that is based on a wide range of input from experts in the field. We discussed various representations and came to a consensus that FALDO is suitable to express locations as they are used in Genbank, other genomic repositories, and tools. Whilst there might be a close relationship between FALDO and the BioPerl location model (I am not familiar with it), I highly doubt that BioPerl makes use of axioms to ensure data integrity and to infer knowledge. This distinction becomes very clear if you query information with BioPerl and then compare it with a SPARQL query against a triple store that uses inference rules to retrieve data matching your query without that data having been stored explicitly. In my opinion, the resulting textual representation of the ontology in OWL is something worth copyrighting -- under an open license.
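To make the inference point concrete, here is a minimal sketch in plain Python of how a store can answer a type query with statements that were never asserted. It only illustrates the general idea of rdfs:subClassOf entailment; every ex: name is hypothetical and not taken from FALDO, and a real triple store would apply such rules itself when evaluating SPARQL.

```python
# Toy triple store. All ex: identifiers are hypothetical, not FALDO terms.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

asserted = {
    ("ex:ExactPosition", SUBCLASS_OF, "ex:Position"),
    ("ex:Position", SUBCLASS_OF, "ex:Location"),
    ("ex:pos42", RDF_TYPE, "ex:ExactPosition"),
}


def infer_types(triples):
    """Return the triples plus every rdf:type statement entailed by rdfs:subClassOf."""
    closed = set(triples)
    changed = True
    while changed:  # repeat until no new statement can be derived
        changed = False
        for s, p, o in list(closed):
            if p != RDF_TYPE:
                continue
            for c, q, parent in list(closed):
                new = (s, RDF_TYPE, parent)
                if q == SUBCLASS_OF and c == o and new not in closed:
                    closed.add(new)
                    changed = True
    return closed


def instances_of(triples, cls):
    """Return the sorted subjects typed as cls in the given triples."""
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o == cls)
```

Querying the asserted triples for ex:Position finds nothing, while querying the inferred closure finds ex:pos42, although that type statement was never stored.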

To twist your argument around: what does CC-BY prevent you from doing with FALDO that CC0 would permit you to do?

Finally, we are actually citing work that FALDO is based upon in a manuscript that we are preparing. I doubt that we would stand a chance of pleasing the reviewers otherwise... :)

artob commented 11 years ago

@joejimbo note that according to Science Commons (a sister organization of Creative Commons) it is not in fact the case that all ontologies can necessarily even be copyrighted to begin with: http://sciencecommons.org/resources/readingroom/ontology-copyright-licensing-considerations/

Creative Commons and the Open Knowledge Foundation both recommend using CC0 for ontologies. If, as you say, you're not in any case going to sue somebody who forgets to attribute you, this seems like a no-brainer.

Speaking from my own experience at Dydra, one issue we not infrequently see with users relying on some ontology that isn't explicitly put into the public domain is that it muddles the copyright situation for users' own datasets. Consider, for instance, the scenario where a user has a hosted RDF repository into which they've loaded a number of ontologies plus their own datasets. If the ontologies have copyright claims (to the extent those may or may not even be valid), one would then have to apply copyright considerations on a statement-by-statement level, which in some situations requires filtering out encumbered statements. While that can be done to an extent by abusing named graphs, that then complicates e.g. writing SPARQL queries over the full contents of the repository. Everything gets a lot simpler and clearer if ontologies can just be considered to be unencumbered and in the public domain.
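The named-graph workaround described above can be sketched roughly as follows, in plain Python. This is an illustration under assumptions, not how Dydra works: the quads, graph names, and the per-graph license table are all hypothetical bookkeeping invented for the example.

```python
# Quads: (subject, predicate, object, named graph). All identifiers are hypothetical.
quads = [
    ("ex:gene1", "ex:locatedAt", "ex:region1", "graph:my-data"),
    ("ex:Region", "rdfs:label", "region", "graph:ontology-a"),
    ("ex:Feature", "rdfs:label", "feature", "graph:ontology-b"),
]

# License recorded per named graph (assumed bookkeeping, not a feature of any product).
graph_license = {
    "graph:my-data": "CC0",
    "graph:ontology-a": "CC-BY",
    "graph:ontology-b": "CC0",
}


def unencumbered(quads, graph_license, allowed=frozenset({"CC0"})):
    """Keep only statements whose named graph carries an allowed license.

    Graphs with no recorded license are treated as encumbered and dropped.
    """
    return [q for q in quads if graph_license.get(q[3]) in allowed]
```

Every query over the repository now has to go through a filter like `unencumbered` (or name the permitted graphs explicitly), which is the extra complication the comment refers to.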

hlapp commented 11 years ago

@joejimbo I'm not confusing copyright and intellectual property. Copyright needs to stand on solid enough ground to hold up when challenged in court (claiming copyright does not automatically mean you legally have copyright).

Entangling legal instruments such as copyright with social norms such as citing and attributing one's sources is unfortunately quite common, especially among scientists trying to ensure they are attributed. But that doesn't make it a good practice - social norms aren't enforced by law or legal instruments but by social policing. Scientists are quite diligent about citing their sources not because of license requirements but because it is a norm whose violation has serious professional consequences. So using legal instruments to "encourage attribution" is misguided, and in fact dangerous - it promotes a view that public domain sources do not need to be attributed or cited, which is obviously false. For example, all data in Dryad are under CC0, yet there is an expressly stated and strong expectation that their use be attributed.

In the age of large-scale computing, Big Data, and the global web of data, full attribution can sometimes be challenging. Should someone who recombines 1000 data sets from Dryad have 1000 references in the article from that alone? Should someone who reuses that recombined dataset need to have 1001 references? What if FALDO is in Bioportal and I do a meta-analysis over all ontologies in Bioportal? Do I need to have a citation for each of the 200 or so that are currently there? If I don't, should I be breaking a law? How about someone who reuses my meta-analysis? Do you see where this goes? These are social challenges, and resolving them takes practices converging to social consensus, not legal instruments or courts of law.

@bendiken thanks for another great example, and the great reference to Science Commons article.

KimJBaran commented 11 years ago

@bendiken There are different kinds of ontologies and FALDO is certainly not an ontology that "just" defines a vocabulary. I would be rather surprised if FALDO could not be copyrighted, given that we clearly made creative decisions when modelling the ontology.

@bendiken The example regarding datasets in Dydra is quite abstract. The datasets themselves will be available under some form of license and I do not see any reason why data providers would waive their copyright within the foreseeable future. However, it is a good "what if" scenario in which a CC0 licensed ontology would be beneficial.

@hlapp Publishing data in Dryad under CC0, but then requiring it to be attributed is confusing in my opinion. If attribution is necessary, then it should be stated so in a machine interpretable way by using CC-BY. That approach appears to be much more suitable for data mining purposes.

Extremely popular ontologies like the Gene Ontology and Sequence Ontology both use licensing requirements very similar to CC-BY (see http://www.geneontology.org/GO.cite.shtml and http://www.sequenceontology.org/resources/faq.html#lic). It does not appear to do them much harm.

I have no strong preference to side with either CC-BY or CC0 though.

It will be interesting to see what other FALDO contributors have to say on this issue, since we will not be able to go back to a CC-* license once we publish FALDO as CC0.

hlapp commented 11 years ago

@joejimbo have you read http://blog.datadryad.org/2011/10/05/why-does-dryad-use-cc0/

Treating CC-BY as a machine readable indication of whether citation is necessary or not is a really dangerous practice - it risks you violating scientific norms. Can you cite sources to back up such an interpretation? You certainly won't find CC behind that idea at all.

Yes, some ontologies use CC-BY. This has been questioned before. There are also many people who put licenses on their data, even when those licenses would have no standing in court if anyone were to challenge them.

KimJBaran commented 11 years ago

@hlapp Thanks for the link. I have to say that Dryad and related discussions (for example by Heather Piwowar) are all concerning data. FALDO is an ontology and different licensing aspects might apply in that context.

hlapp commented 11 years ago

@joejimbo You aren't addressing the core arguments - a license is inappropriate as an instrument for indicating the need for attribution, attribution stacking issues can become a barrier to reuse, and similar to data, "an ontology that draws entirely on facts or ideas in the public domain would not qualify for copyright protection" (Science Commons). How does it make a difference that FALDO is an ontology and not data? For example, in an RDF triple store, where, and based on which objective criteria, do you draw the line between what simply draws on facts of nature, and what involves enough creative expression to be copyrightable? Have you read http://sciencecommons.org/resources/readingroom/ontology-copyright-licensing-considerations/ as per @bendiken's comment? Can you state a clear case why FALDO does not only draw "entirely on facts or ideas in the public domain"? Can you state, in the light of the issues and analysis above and those cited, how CC-BY gives more benefits to FALDO than CC0 would?

ansell commented 11 years ago

The main current source of biological ontologies, OBO, generally follows principles which are fairly close to CC-BY-SA, although they add a condition (putting them outside of the OKF Open Definition) that the database identifiers must be changed in any copies of their ontologies.

The core arguments for the change are not actually legal, as CC0 is explicitly designed to take legal issues away to avoid actual legal impediments and rely on courtesy, as enforced by anonymous, knowledgeable, reliable peer reviewers. Whether people reusing FALDO in their database schemas would legally/actually need to cite the FALDO contributors in their publications is highly dependent on the local legislation, and that is not necessarily a bad thing.

The excessive reuse argument by Data Dryad, with their hypothetical 50,000 federated datasets, doesn't refer to schemas, as most datasets in Data Dryad are presumably only sets of numbers directly sourced from experiments without analysis.

If someone reuses the FALDO schema in a product, under CC-BY, they must acknowledge the source. How many schemas will people be reusing, and how difficult is it for a computer to infer the necessary acknowledgements?

The Data Dryad arguments come from a non-RDF viewpoint where the community doesn't generally embrace that ability yet and their 50,000 datasets would be merged using some other schema that wasn't RDF. RDF makes it excessively easy (using rdfs:isDefinedBy | cc:license/dc:license/etc.) to promote an automated collection of attribution/reuse terms for a given piece of data, starting from the URI that was reused from the ontology or other RDF dataset.
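A rough sketch of the automated collection described here, in plain Python: starting from the predicates used in a dataset, follow rdfs:isDefinedBy to the defining vocabulary and read its cc:license statement. The data and vocabulary identifiers (ex:...) are hypothetical and the triples are invented for illustration. Note that this only gathers license URIs; it says nothing about how attribution should be given.

```python
# Toy triples. The ex: data and vocabulary identifiers are hypothetical.
triples = {
    ("ex:data/feature1", "ex:vocabA#begin", "ex:data/pos1"),
    ("ex:data/feature1", "ex:vocabB#label", "BRCA1"),
    ("ex:vocabA#begin", "rdfs:isDefinedBy", "ex:vocabA"),
    ("ex:vocabB#label", "rdfs:isDefinedBy", "ex:vocabB"),
    ("ex:vocabA", "cc:license", "http://creativecommons.org/licenses/by/3.0/"),
    ("ex:vocabB", "cc:license", "http://creativecommons.org/publicdomain/zero/1.0/"),
}


def objects(triples, subject, predicate):
    """Return all objects of statements with the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def licenses_for_used_terms(triples):
    """Map each defining vocabulary to its declared licenses.

    Only terms used in predicate position are considered; terms without an
    rdfs:isDefinedBy statement are skipped.
    """
    used_terms = {p for _, p, _ in triples}
    result = {}
    for term in used_terms:
        for vocab in objects(triples, term, "rdfs:isDefinedBy"):
            found = objects(triples, vocab, "cc:license")
            result.setdefault(vocab, set()).update(found)
    return result
```

Calling `licenses_for_used_terms(triples)` on the toy data maps ex:vocabA to the CC BY 3.0 URI and ex:vocabB to the CC0 1.0 URI.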

Not wanting to have to acknowledge the source seems to go against the scientific norms, in apparent opposition to their conclusion that releasing people from the legal obligation by telling them they don't need to will actually enhance the norms due to the existence of a hypothetical "well-functioning community".

If the barrier to reuse argument is the main core of this discussion, then it may in fact be archaic publishing conventions, based on outdated concepts about page numbers being important, that need to be changed. In the future, when people can reference arbitrary numbers of ontologies/data items without fear of going over their page limits, the barrier to reuse argument is only left with those who do not, in fact, want to acknowledge all of their peers' contributions to their discoveries.

The current scientific norms, which rely heavily on the most recent contributions as the point of reference for acknowledging all of the contributions they in turn refer to, without directly referring to those again, are not ideal in a computer-friendly future. It is highly probable, though I haven't read any studies into it, that the current reference methods may not truly reflect the number of citations that ideas get, if a small addition to the original paper is used as the citation point instead of the substantial initial idea. This would not be an issue (and was not an issue in the past with courtesy agreements in relatively small communities, which is what Data Dryad/Panton Principles assume) except for the fact that researchers are effectively paid, IRL, based on their citation indexes. Anything we can do to improve those citation indexes will effectively improve the way researchers are financially reimbursed for their ideas based on their contributions to their fields.

In a computer friendly future, journals/scientists may not mind if authors do not insert text into articles for their citation list and instead have it as an optional downloadable appendix, reducing the reuse argument further again.

hlapp commented 11 years ago

@ansell I'm not sure I'm following your argument. But let me just clarify that nobody here is trying to evade having to acknowledge their sources, including FALDO or any other ontology. Also, it is worrying to me when you say that cc:license or dc:license properties would tell you anything about attribution requirements (as opposed to terms of reuse). Can you back up this interpretation with sources? Note that the fact that a CC-BY license requires attribution doesn't tell you how to attribute, and it's not that how to attribute doesn't matter.

Regarding ontologies, can we return to practical terms? There are a number of ontologies that are distributed under CC0, are published, and expect citation and attribution when reused whenever possible:

http://www.ontobee.org/browser/index.php?o=TAO
http://www.ontobee.org/browser/index.php?o=VSAO
http://www.ontobee.org/browser/index.php?o=TTO

I am curious about an argument why compared to those and compared to using CC0, FALDO warrants, needs, and benefits from a legal copyright assertion in the form of CC-BY. I still haven't seen that.

ansell commented 11 years ago

It would not be hard to extend the cc:license pattern to include a statement about how the attribution is required.

I think if the authors of the vocabulary would prefer to keep the legal obligation for attribution that there is no reason why CC0 should be forced upon them with the reasoning that all scientists are morally obliged to attribute on their publications, although not on their products.

hlapp commented 11 years ago

On Jul 13, 2013, at 11:38 PM, Peter Ansell wrote:

> It would not be hard to extend the cc:license pattern to include a statement about how the attribution is required.

Perhaps. But it isn't at present, and I think this will require a lot more input to converge on a machine-readable standard. Or perhaps dcterms:bibliographicCitation would already give us enough of what we need. But surely dc:license or dc:rights is not synonymous with how to attribute, or even only with whether to.

> I think if the authors of the vocabulary would prefer to keep the legal obligation for attribution that there is no reason why CC0 should be forced upon them

I don't know where you get the sense of anything being forced upon anyone. I've made a suggestion, provided an argument for it, and have asked for a convincing counterargument and have asked corresponding questions. I provided links to others' analyses of the questions involved. I provided examples where CC0 is practiced already successfully for ontologies that are orders of magnitude larger than FALDO, and which involved several dozen domain experts to develop. I would hope this is not considered out of line with normal scientific practice and open development?

The background to this is that I may have something to contribute to the project. However, I want to focus my contributions on resources that are most widely reusable with the least barriers. So my first contribution is to suggest changing to CC0. Again, I would hope this is not out of line with open development?

> with the reasoning that all scientists are morally obliged to attribute on their publications, although not on their products

The expectation to attribute extends to products as much as publications. From where do you get the sense that the contrary is the case?

ansell commented 11 years ago

On 15 July 2013 01:26, Hilmar Lapp wrote:

> On Jul 13, 2013, at 11:38 PM, Peter Ansell wrote:
>
>> It would not be hard to extend the cc:license pattern to include a statement about how the attribution is required.
>
> Perhaps. But it isn't at present, and I think this will require a lot more input to converge on a machine-readable standard. Or perhaps dcterms:bibliographicCitation would already give us enough of what we need. But surely dc:license or dc:rights is not synonymous with how to attribute, or even only with whether to.

Given that CC-BY with its customisable attribution requirement is going to be staying around for a while in general, the RDF community will eventually develop ways of communicating the attribution method in machine-readable codes.

>> I think if the authors of the vocabulary would prefer to keep the legal obligation for attribution that there is no reason why CC0 should be forced upon them
>
> I don't know where you get the sense of anything being forced upon anyone. I've made a suggestion, provided an argument for it, and have asked for a convincing counterargument and have asked corresponding questions. I provided links to others' analyses of the questions involved. I provided examples where CC0 is practiced already successfully for ontologies that are orders of magnitude larger than FALDO, and which involved several dozen domain experts to develop. I would hope this is not considered out of line with normal scientific practice and open development?

I generally use BSD or GPL licenses for software, depending on the situation. However, there are other communities where Public Domain/Unlicensed/CC0 is very popular. FALDO overlaps all of those communities currently, as there is nothing stopping a CC0 product from reusing a CC-BY product, and staying CC0, as long as the appropriate attribution is given.

> The background to this is that I may have something to contribute to the project. However, I want to focus my contributions on resources that are most widely reusable with the least barriers. So my first contribution is to suggest changing to CC0. Again, I would hope this is not out of line with open development?

It is definitely good to bring this up before making a contribution, and it isn't a bad time to bring it up given the development stage of FALDO.

>> with the reasoning that all scientists are morally obliged to attribute on their publications, although not on their products
>
> The expectation to attribute extends to products as much as publications. From where do you get the sense that the contrary is the case?

Commercial products only attribute, in practice, where their lawyers tell them they must attribute.

peterjc commented 11 years ago

I am OK with using CC0 and the scientific norms to enforce proper citation.

P.S. @hlapp We are citing BioPerl - but in any case the BioPerl/Biopython/.../BioSQL feature object models are all very much based on the INSDC (EMBL/GenBank/DDBJ) model, and we're citing that too: https://github.com/JervenBolleman/FALDO-paper

JervenBolleman commented 11 years ago

I am OK with CC0. FALDO is a bit tricky as it is a standard, an implementation, and code all at the same time. So, looking around, none of the current licenses really fit, but CC0 would allow everyone to use it. I propose that everyone who contributed with a commit to either the paper or the ontology vote for the status quo or CC0 in a comment on this issue saying "I vote CC-BY" or "I vote CC0".

JervenBolleman commented 11 years ago

I vote CC0

artob commented 11 years ago

Incidentally, someone recently pointed me at this relevant thread at the Open Data Stack Exchange:

http://opendata.stackexchange.com/questions/26/benefits-of-using-cc0-over-cc-by-for-data

KimJBaran commented 11 years ago

I vote CC-BY.

ktym commented 11 years ago

I vote CC0 (but CC-BY is also fine with me)

JervenBolleman commented 11 years ago

In the last 4 months there were 2 votes for CC0, 1 for CC-BY and 1 slight preference for CC0. I suspect that no other contributor cares one way or another.

If @joejimbo does not mind, I will change to CC0; otherwise, CC-BY 4.0 is nicer with regard to some of the problems reported by @hlapp.

KimJBaran commented 11 years ago

@JervenBolleman I do not mind. Please go ahead and change the license to CC0. It will be interesting to see the benefits and drawbacks of this licensing choice eventually.