FAIR-Data-EG / Action-Plan

Interim recommendations and actions from the FAIR Data Expert Group

Rec. 3: A model for FAIR Data Objects #3

Open sjDCC opened 6 years ago

sjDCC commented 6 years ago

Implementing FAIR requires a model for FAIR Data Objects which by definition have a PID linked to different types of essential metadata, including provenance and licencing. The use of community standards and sharing of code is also fundamental for interoperability and reuse.

[Image: the FAIR Data Object model]
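As a rough illustration of the recommendation's wording (a PID linked to essential metadata, including provenance and licensing, plus code), a FAIR Data Object could be sketched as a small record type. This is only a minimal sketch and not a specification from the Expert Group: the class name, the field names, and the example PID below are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FairDataObject:
    """Hypothetical record: a PID linked to several kinds of metadata."""
    pid: str                              # e.g. a DOI or Handle
    descriptive_metadata: Dict[str, str]  # title, creator, ...
    provenance: List[str]                 # ordered generation steps
    licence: str                          # licence or waiver identifier
    code_pids: List[str] = field(default_factory=list)  # PIDs of related code

    def missing_elements(self) -> List[str]:
        """Return the names of essential elements that are empty."""
        essential = {
            "pid": self.pid,
            "descriptive_metadata": self.descriptive_metadata,
            "provenance": self.provenance,
            "licence": self.licence,
        }
        return [name for name, value in essential.items() if not value]


# Example with a made-up PID and no provenance recorded yet.
obj = FairDataObject(
    pid="doi:10.1234/example",
    descriptive_metadata={"title": "Example dataset"},
    provenance=[],
    licence="CC-BY-4.0",
)
print(obj.missing_elements())
```

Code is treated as optional here purely for simplicity; a community could equally decide to make it an essential element.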

asconrad commented 6 years ago

Regarding standards: Applying relevant standards whenever available is a good thing and should be emphasized. However, not all disciplines are equally well covered. Also, standards may be in place for specific infrastructural purposes and not be easily applicable, e.g. for a repository deposit. And research tends to invent and test new things, so in a few cases researchers still need to invent their own formats. So these recommendations are to some extent the data services' view. If possible, I would like to allow some flexibility for researchers to approach FAIR data, even if standards and standard file formats are not readily available, e.g. by including format specifications/documentation in the dataset (XSDs or whatever). The crucial point is to make data interpretable by some future user.

asconrad commented 6 years ago

Does a "model" of a FAIR data object imply a formal specification?

holubp commented 6 years ago

BBMRI-ERIC Position: Interoperable provenance needs to be highlighted much more with the focus on ensuring reproducibility of the research results. Sharing irreproducible and unreliable data may result in further escalating the existing reproducibility problems in various science domains (medical research, life sciences are known for this as witnessed by a number of publications) and in incorrect interpretation of the data. Provenance of the data needs to cover the whole source chain - for example in life sciences dealing with biological material, the provenance information needs to trace also the material and data generation steps.

ghost commented 6 years ago

4TU.Centre for Research Data position: Regarding "Systems must be put in place for automatic checks on the existence and accessibility of PIDs, metadata, a licence or waiver, and code, and to test the validity of the links between them": What is meant by "validity of the links between them"? Currently a PID (in our case a DOI) would land on a metadata body that includes primary and secondary metadata, and license information. Clarification would be appreciated.

katerbow commented 6 years ago

DFG position: The objective of this recommendation is not self-evident. It is understood as building up a set of best-practice FAIR procedures and objects in order to demonstrate the benefits of the FAIR principles. That would clearly be supported; however, in order to achieve appropriate effects, it should be accompanied by reasonable outreach activities.

The model depicted in Fig. 1 seems to illustrate the FAIR-data principles well, but indicates on the other hand a locked-down character of the actual data in the centre of the onion-type shell.

ScienceEurope commented 6 years ago

Science Europe agrees with this recommendation. One of Science Europe’s core requirements for DMPs, which will be published towards the end of 2018, requires the use of appropriate PIDs.

ferag commented 6 years ago

The use of metadata is very important to address the four FAIR principles, but mechanisms that facilitate the use of metadata need to be implemented, aiming to automate the attachment of metadata as much as possible.

Regarding the third point, in chapter 6.8 of my Ph.D. thesis I suggest an approach to measure data "FAIRness" (http://hdl.handle.net/10261/157765).

pkdoorn commented 6 years ago

As the FAIR principles do not state anything about the level of granularity, this is indeed an important point. The recommendation speaks about “FAIR data objects” as the core, but even here there is a hierarchy of levels (which is also partly dependent on the type of data). Data bits can be organized into records, grouped into files, which are then grouped into data sets belonging to a certain study or research project, which can be part of broader collections, that are then stored in repositories… This takes us back to the specification of the FAIR principles: already F1, stating “F1. (meta)data are assigned a globally unique and persistent identifier” raises the question at which level of granularity such PIDs need to be assigned.

mromanie commented 6 years ago

ESO position: Are DOIs believed not to be "appropriate PIDs"? If so, what defines an appropriate PID?

gtoneill commented 6 years ago

The objective of this recommendation is unclear: is it to collect good practice examples of FAIR Data? The recommendation addresses various (random) issues such as PIDs (a subset of FAIR), metadata (all components of FAIR), licenses, code, and standards as well as awareness raising, training, support, and checking. Any good practice example should be clearly related to the (full) FAIR principles.

MSoareses commented 6 years ago

On item 2 of this recommendation, I believe that publishers should also be added. At Elsevier, journal publishers active in research data management in specific subject areas present workshops where they emphasize research data practices and tools used in those fields. One example is the publishers of Neuron, who have implemented and promoted the use of Resource Identifiers for this title (unique numbers for reagents to promote reproducibility, by @SciCrunch / Resource Identifier Initiative of @FORCE11).

On item 3, I would say that publishers must also be involved to enable effective linking, be that through the usage of discipline-specific PIDs or by ensuring linking between deposited data and peer-reviewed articles. At Elsevier, in our editorial systems, we have or are putting in place features to effectively establish these links between our articles and data deposited in any repository flagged by the author during submission in a data availability statement. These links between data DOI and article DOI can then be retrieved/checked via @Scholix, which “aims to enable an open information ecosystem to understand systematically what data underpins literature and what literature references data”.

npch commented 6 years ago

SSI position:

The FAIR Data Objects include the concept of code. This code should be well described and cited (the FORCE11 Software Citation Principles should be adhered to), and the relationship between the data and code should be described using rich formats, e.g. as a Research Object: https://doi.org/10.1016/j.future.2011.08.004 and https://doi.org/10.1016/j.websem.2015.01.003

The testing of links between PIDs necessitates the use of machine readable / actionable DMPs, license files (using SPDX) and other forms of metadata.
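To make the idea of testing links concrete, here is a minimal offline sketch of one possible reading of "validity of the links". It assumes each record lists the PIDs it links to and names its licence by an SPDX identifier. The record layout, the `check_links` function, the example PIDs, and the choice to treat a link as valid only if its target exists and links back are all assumptions for illustration, not something the recommendation defines. The SPDX set is a tiny sample, not the full licence list.

```python
from typing import Dict, List

# A small sample of real SPDX licence identifiers (not the full list).
KNOWN_SPDX_IDS = {"CC0-1.0", "CC-BY-4.0", "MIT", "Apache-2.0"}


def check_links(records: Dict[str, dict]) -> List[str]:
    """Report licence and link problems in a PID-keyed metadata collection."""
    problems = []
    for pid, record in records.items():
        licence = record.get("licence")
        if licence not in KNOWN_SPDX_IDS:
            problems.append(f"{pid}: unrecognised licence {licence!r}")
        for target in record.get("links", []):
            if target not in records:
                # The linked PID has no record in this collection.
                problems.append(f"{pid}: link to unknown PID {target}")
            elif pid not in records[target].get("links", []):
                # Assumed rule: a valid link is reciprocated by its target.
                problems.append(f"{pid}: link to {target} is not reciprocated")
    return problems


# Hypothetical records: a dataset linking to its code and to a missing PID.
records = {
    "pid:data": {"licence": "CC-BY-4.0", "links": ["pid:code", "pid:gone"]},
    "pid:code": {"licence": "MIT", "links": []},
}
for problem in check_links(records):
    print(problem)
```

A real check would also resolve each PID over the network to test accessibility; that step is left out here so the sketch stays self-contained.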

mark-cox commented 6 years ago

euroCRIS position:

This type of model is already well developed in the area of research information management. CRIS/RIM (research information) systems have at their core interlinked PIDs and metadata regarding the entire research landscape. This is based on the long-standing and mature open-standard CERIF data model, which can already encompass the concept of research data. We support the recommendation of educational programmes to raise understanding of relevant standards, and indeed euroCRIS already offers regular tutorials on the use of CERIF to all interested parties.

aidanbudd commented 6 years ago

ELIXIR-UK position:

Code - software - is buried in 1 sentence. Code is ESSENTIAL to process and analyse data. Sustaining and managing FAIR software is a HUGE deal and barely mentioned. Data is pretty useless without software to get it, manage it and use it.

etothczifra commented 6 years ago

DARIAH-ERIC position: As noted above, finding a good balance between the conflicting aims of making objects discoverable through standardization vs. capturing the complexity of the provenance of the data object is of primary importance in ensuring reusability of research outputs. In the Humanities, data objects typically come with a long history, as they may pass through many hands and places in the course of being created and collected, each of which has an impact on how they might be reused or interpreted. The risk of such complexity being hidden from researchers, or the lack of comprehensive, transparent, and easily understandable conditions of access to the documents, is one of the greatest threats to achieving FAIR data. Also, it might be useful to complement the FAIR data model depicted in Fig. 1 with a more networked illustration accounting for granularity and the embeddedness of data objects into bigger units. Such an illustration could highlight the importance of appropriate levelling of data description, e.g. licensing both on the level of the dataset and on the level of data objects.