danielskatz / software-vs-data

understanding and documenting the differences between software and data in the context of citation
Creative Commons Attribution 4.0 International

"Software is a creative work, data a fact" #14

Closed: BergFulton closed this 7 years ago

BergFulton commented 8 years ago

My issue with this is that data, data collection, and data curation are highly biased, and are human activities. Data doesn't just appear; it needs the intervention, and sometimes the interpretation and collation, of humans. And any time you introduce humans, you introduce a "reading" of the data.

I work on issues related to art provenance (ownership history). We have TONS of data in the form of auction catalogues, catalogues raisonnés, letters, archival information, photographs, etc. A record may state a fact, but that fact may or may not capture the full nuance of the object's actual history, because it relies on data that has been recorded, re-recorded, and transferred down through the ages in some way, shape, or form.

My disagreement isn't with the statement itself, because data and data collection largely aren't "creative" tasks. Rather, I think the human element in data needs to be acknowledged.

mbjones commented 8 years ago

@BergFulton This is an interesting point, and I see where you are coming from. From my perspective, you are referring to measurement error, which can come in many forms, including systematic, human biases. Sometimes data are imprecise, inaccurate, and biased. However, just because there is a bias or error, data are still not creative works per se, and thus are not subject to copyright in many jurisdictions. Once one crosses that line where creativity is involved in making something, one transitions from data to fictional works. Data are inherently empirical, and are intended to be an estimation of a property of a thing or phenomenon. I think the original incantation that "Software is a creative work, data a fact" still holds.

danielskatz commented 8 years ago

@BergFulton do you have any specific changes to suggest? Otherwise, I will close this (while acknowledging your point, but being unsure what to do about it)

thinrope commented 8 years ago

Maybe "raw data" is not a creative work per se, but collecting it (designing the process, implementing it with tools, and actually recording the data) can often be far more creative, whether or not software is used as a tool. Think of any famous physics experiment (e.g. measuring the speed of light): experimental raw data existed long before any software to record it.

Another aspect of data is (in software terms) the settings and configuration of software. Optimizing those to use certain software to process (other) data is often an art (think of the individual settings for each filter, and the order in which they are applied, when improving a photo in an image editor). Now if you extrapolate from "the art of choosing the right settings (= data)", you can end up with either: a) an algorithm (how to optimize settings, how to use other software to process data), or b) experimentally found values for some software (e.g. q-values for encoding video for some device).

Are visualizations of raw data still data? Or are they a creative use of data?

I guess talking only about facts, as opposed to data, may help somewhat (but then, the height of Mt. Everest is neither a fact nor a datum; it is changing, by the way). IANAL, so I'm not sure what you can copyright (and I stick to OSS), but I agree that some data may occasionally need protection, as opposed to the method of obtaining it. Such data are usually kept secret rather than copyrighted. The usual examples are drug formulas, oil/fuel additives, the Coca-Cola formula, precise orbits of GPS satellites, and so on.

I'd suggest putting "raw" before "data" and trying to limit the scope of the discussion.

danielskatz commented 8 years ago

thanks @thinrope