emhart / 10-simple-rules-data-storage

A repository for the 10 simple rules data sharing paper to be submitted to PLoS Comp Biology
Creative Commons Zero v1.0 Universal

What's the opposite of "machine-readable"? (Rule 5) #109

Closed khinsen closed 8 years ago

khinsen commented 8 years ago

The term "machine-readable" in rule 5 isn't very clear in my opinion. Any computer file is machine-readable by definition. The opposite of machine-readable would thus be data stored on printed paper, but that's not the message of rule 5.

In technical terms, the topic of rule 5 is how easy a data format is to parse. The extreme end of the spectrum is data that is impossible to parse because there is no formal data format at all.

Given the level of this paper, I understand that the term "parsing" is best avoided, but that makes it difficult to be precise. One possibility is to be vague in the title ("Data should be easy to process by software") and to give both "good" and "bad" examples in the text. The archetype of the bad example is data embedded in prose stored in a Word or PDF file (yes, I have seen that).
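
As a rough illustration of the "good" versus "bad" contrast, here is a minimal Python sketch. The sample names, values, and the prose sentence are invented for the example and do not come from the paper; the point is only that a structured file can be read by a standard parser, while prose forces pattern-matching guesswork.

```python
import csv
import io
import re

# Hypothetical data: the same three measurements stored two ways.
csv_text = "sample,temperature_c\nA,21.5\nB,19.0\nC,22.3\n"
prose_text = (
    "Sample A was measured at 21.5 degrees, while B came in at 19.0. "
    "The third sample (C) reached 22.3 after about 2 hours."
)

# Structured format: a standard parser recovers each value with its label.
rows = list(csv.DictReader(io.StringIO(csv_text)))
temps_from_csv = {row["sample"]: float(row["temperature_c"]) for row in rows}

# Prose: we can only guess with a pattern. The guess loses the sample
# labels and also picks up "2" from "2 hours", which is not a temperature.
numbers_from_prose = [float(x) for x in re.findall(r"\d+(?:\.\d+)?", prose_text)]

print(temps_from_csv)
print(numbers_from_prose)
```

The CSV route needs no knowledge of the sentence structure, whereas the regular expression has to be rewritten for every document and can still silently return wrong values.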

dlebauer commented 8 years ago

Very good points. One of the early incarnations of this rule (#12) was "store data in common open format", but "easy to process by software" and "widely supported" are also important features.

khinsen commented 8 years ago

Common and widely supported is not the same as easy to process. An example is the PDB format for macromolecular structures, which is common and widely supported, but such a mess that processing it correctly is a huge effort. In fact, almost no software processes PDB files correctly according to the specification.

In such situations, my personal advice is to go with a simple and clear format, even if it is not the most popular one. Otherwise the community remains stuck with a bad format forever. I know many people do not agree with this point of view.
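
One well-known difficulty with PDB files, offered here only as an illustration and not necessarily the problem meant above, is that ATOM records use fixed-width columns rather than delimiters, so adjacent numbers can run together. The sketch below builds a hypothetical ATOM record (the atom and coordinates are invented) and compares column slicing with a naive whitespace split. The function names are made up for this example.

```python
# Build a hypothetical ATOM record with the fixed column layout:
# x, y, z occupy columns 31-38, 39-46 and 47-54 (1-based), 8 characters each.
line = (
    "{:6s}{:5d} {:^4s}{:1s}{:3s} {:1s}{:4d}{:1s}   "
    "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
).format("ATOM", 1, "CA", "", "ALA", "A", 1, "",
         -100.123, -200.456, 12.345, 1.00, 20.00)


def parse_xyz_by_columns(record):
    """Slice the coordinates out by column position (0-based, end-exclusive)."""
    return tuple(float(record[start:start + 8]) for start in (30, 38, 46))


def parse_xyz_by_whitespace(record):
    """Tempting shortcut: assumes fields are always separated by spaces."""
    fields = record.split()
    return tuple(float(field) for field in fields[6:9])


print(parse_xyz_by_columns(line))

try:
    parse_xyz_by_whitespace(line)
except ValueError as err:
    # The x and y fields touch ("-100.123-200.456"), so the split
    # yields one unparseable token instead of two numbers.
    print("whitespace parsing failed:", err)
```

A delimiter-based format would not have this failure mode, which is one way a less popular but simpler format can be easier to process correctly.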

fmichonneau commented 8 years ago

I see the point that @khinsen is making here, but the term machine-readable is widely used and is not equivalent to any file stored on a computer. In my opinion, the term should be retained in the manuscript but we should improve on the definition/meaning.

emhart commented 8 years ago

I think @PBarmby has sufficiently addressed this issue, closing.