How can/should one introduce Context into FAIR data?

hrzepa commented 7 years ago

I have heard criticism of FAIR as representing only four of the five essential attributes of data. The missing component is “context”. Without the back story associated with the data, it is impoverished.

Arguably, the FAIR metadata can provide link(s) to such back stories, but is this sufficient, and should other mechanisms such as perhaps EventData be promoted as well?

dr-shorthair commented 7 years ago

Yes. Context could also be generalized as 'linked' or 'connected'. This is a weakness or gap in the current FAIR gamut.

micheldumontier commented 7 years ago

The FAIR principles indicate that reuse is enabled with detailed provenance (R1.2), and this encompasses context.

CaroleGoble commented 7 years ago

Is context just provenance? IMHO, no.

As soon as you are working with the datasets and OTHER assets arising from a range of studies and experiments, you are not just talking about provenance, nor are you just talking about data as one dataset. Or even data at all. In the FAIRDOM Systems Biology asset management platform we link together data, models, SOPs, workflows, samples, publications etc., all around the ISA model. The entire compound "Research Object" is FAIR, as well as the individual components within. See http://www.fair-dom.org, and

Wolstencroft K, Krebs O, Snoep JL, Stanford NJ, Bacall F, Golebiewski M, Kuzyakiv R, Nguyen Q, Owen S, Soiland-Reyes S, Straszewski J, van Niekerk DD, Williams AR, Malmström L, Rinn B, Müller W, Goble C FAIRDOMHub: a repository and collaboration environment for sharing systems biology research. Nucleic Acids Res, 45(D1): D404-D407. DOI: 10.1093/nar/gkw1032 (2016)

The Research Object (http://www.researchobject.org) approach is all about metadata manifests that retain context and relate components that are potentially scattered across external resources as well as held in containers like Docker or even zip files. By having FAIR Research Objects, rather than just "data", we get context.

Belhajjame K, Zhao J, Garijo D, Gamble M, Hettne K, Palma R, Mina E, Corcho O, Gómez-Pérez JM, Bechhofer S, Klyne G, Goble C Using a suite of ontologies for preserving workflow-centric research objects, J. Web Sem. 32: 16-42, doi:10.1016/j.websem.2015.01.003. (2015) and Chard K, D' Arcy M, Heavner B, Foster I, Kesselman C, Madduri R, Rodriguez A, Soiland-Reyes S, Goble C, Clark K, Deutsch EW, Dinov I, Price N, Toga A I'll Take That to Go: Big Data Bags and Minimal Identifiers for Exchange of Large, Complex Datasets IEEE Intl Conf on Big Data doi:10.1109/BigData.2016.7840618 (2016)
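
To make the manifest idea concrete, here is a minimal sketch in Python. It is an assumption-laden illustration, not the Research Object specification: the field names (`aggregates`, `relations`), the predicates, and the example.org URIs are all hypothetical. It shows a manifest that both lists the components of a study and records how they relate, so that the context of any one component can be looked up.

```python
# Hypothetical manifest: the field names and predicates are illustrative only
# and are not taken from the Research Object specification.
BASE = "https://example.org/study-1/"

manifest = {
    "id": BASE,
    # Components bundled by the manifest; they may live in different places.
    "aggregates": [
        {"uri": BASE + "timecourse.csv", "kind": "data"},
        {"uri": BASE + "glycolysis.xml", "kind": "model"},
        {"uri": BASE + "assay-sop.pdf", "kind": "SOP"},
    ],
    # (subject, predicate, object) statements that carry the context.
    "relations": [
        (BASE + "glycolysis.xml", "validatedBy", BASE + "timecourse.csv"),
        (BASE + "timecourse.csv", "producedUsing", BASE + "assay-sop.pdf"),
    ],
}


def context_of(manifest, uri):
    """Return every relation in the manifest that mentions the given component."""
    return [r for r in manifest["relations"] if uri in (r[0], r[2])]


def orphans(manifest):
    """Return aggregated components that no relation mentions (context lost)."""
    linked = {end for s, _, o in manifest["relations"] for end in (s, o)}
    return [a["uri"] for a in manifest["aggregates"] if a["uri"] not in linked]
```

Calling `context_of(manifest, BASE + "timecourse.csv")` returns both relations, since the data file is linked to the model and to the SOP. If the data file were deposited on its own without the manifest, that linkage would be gone.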

CaroleGoble commented 7 years ago

Another comment - in Systems approaches in particular we are crossing the boundaries of different types of data, as is the case in, say, polyomic studies. Thus we very much need to retain the context of how data are related to each other in a study. In project data management workflows, too often this linkage is broken when the sub-datasets are disbanded into type-specific, siloed, public deposition archives. FAIRDOM (above) tackles this from the start for poly-asset projects through a FAIR metadata layer. BioStudies kind of does this too. The DTL FAIRification platform (DataFAIRPoints and Fairifier) attempts to recover this retrospectively.

micheldumontier commented 7 years ago

The research object approach is a perfectly fine way to bundle things together and provide the metadata that you need to understand what those objects are, and, as you say, the context for those objects. While we might disagree on whether the provenance of a digital object fully encompasses the context from which it was produced, we should agree that context is covered by R1: meta(data) have a plurality of accurate and relevant attributes.

dr-shorthair commented 7 years ago

> we should agree that context is covered by R1. meta(data) have a plurality of accurate and relevant attributes.

I agree that you could shoehorn context here, but it would be helpful to have it made more explicit.

I'm looking for a bridge from FAIR to the 5th star of the W3C's Linked Open Data principles here - e.g. http://5stardata.info/en/ If your data is linked into a bigger 'graph' then it is more useful. This requires cross-references and (hyper-)links, not just 'a plurality of attributes'.

micheldumontier commented 7 years ago

I3 is responsible for the connectivity part of the data/metadata. See https://www.dtls.nl/fair-data/i3-metadata-include-qualified-references-metadata/
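
As a rough illustration of what a qualified reference adds over a bare link, here is a minimal Python sketch. The record layout, the relation names and the example.org identifiers are assumptions made for this example and are not defined by the FAIR principles. A reference counts as qualified here when it states how the two resources are related, not just that they are.

```python
# Hypothetical metadata record: the layout and relation names are assumptions
# for illustration and are not prescribed by the FAIR principles.
record = {
    "id": "https://example.org/dataset/42",
    "title": "River gauge readings",
    "references": [
        {"relation": "isDerivedFrom", "target": "https://example.org/dataset/7"},
        {"relation": "isDescribedBy", "target": "https://example.org/paper/3"},
        # A bare link: it points somewhere but does not say why.
        {"relation": None, "target": "https://example.org/dataset/9"},
    ],
}


def qualified_links(record):
    """Return (subject, relation, target) triples for references that name a relation."""
    return [
        (record["id"], ref["relation"], ref["target"])
        for ref in record["references"]
        if ref["relation"]
    ]


def unqualified_targets(record):
    """Return targets of references that carry no relation, i.e. links without context."""
    return [ref["target"] for ref in record["references"] if not ref["relation"]]
```

Only the first two references become usable edges in a larger graph. The third is a link, but a consumer cannot tell whether it points to a source, a duplicate or a related study.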

dr-shorthair commented 7 years ago

Yes. I had temporarily overlooked that (though had already slotted it into our rating framework - https://confluence.csiro.au/display/OZNOME/Data+ratings ).

Unfortunately I find the groupings in FAIR to be less than ideal, so in some cases the chief concern is smeared over more than one FAIR principle - for example, 'findable' overlaps with R1, F2, F3, and 'useable' with I2 and R1.3.

Maybe it's because our focus is on data, rather than metadata?

FAIR-Data-EG / consultation

How can/should one introduce Context into FAIR data? #16