FAIR-Data-EG / consultation

A call for contributions to the report of the FAIR Data Expert Group

Making FAIR data real - The community experience #12

Open FrTr opened 7 years ago

FrTr commented 7 years ago

You probably know all of this already, but it may still be helpful if I answer your questions on how to make FAIR data real with a brief overview of what I have learned from the almost 800 scientists we surveyed.

To what extent are the FAIR principles alone sufficient to reduce fragmentation and increase interoperability? The principles have a great potential to influence the minds of stakeholders towards more efficient data sharing and reuse, but perhaps additional measures and more specifics are needed to guide implementation?

What are the necessary components of a FAIR data ecosystem in terms of technologies, standards, legal framework, skills etc?

What existing components can be built on, and are there promising examples of joined-up architectures and interoperability around research data such as those based on Digital Objects?

Do we need a layered approach to tackle the complexity of building a global data infrastructure ecosystem, and if so, what are the layers? Which global initiatives are working on relevant architectural frameworks to put FAIR into practice?

A large proportion of data-driven research has been shown to not be reproducible. Do we need to turn to automated processing guided by documented workflows, and if so how should this be organised?

What kind of roles and professions are required to put the FAIR principles into place?

ghost commented 7 years ago

Dear Frank, I am not familiar with your community research at KIT (?). Can you please share your published work here? Starting in August 2017 I will execute case studies in a few subjects here at TU Delft and would like to know more about your work on the topic.

Many thanks in advance! Jasmin

FrTr commented 7 years ago

Dear Jasmin,

In German, we have a report and user stories online. There are also some slides in English that summarize the results but, of course, cannot go into detail. Section 3.3.2 of this work covers a few aspects of our report in English. Unfortunately I didn't have the time or funding to translate the report into English (it has 150 pages). The report was written for our German funding institution, the Ministerium für Wissenschaft, Forschung und Kunst Baden-Württemberg (MWK). If you have specific questions, you can also contact me directly. My contact information, or my successor's, is here on the right side.

CaroleGoble commented 6 years ago

BioITWorld FAIR Hackathon http://www.bio-itworldexpo.com/fair-data-hackathon/ also focused on FAIR approaches to Pharmaceutical data.

Incidentally, the latest IMI call is about FAIRification: https://ec.europa.eu/research/participants/portal/desktop/en/opportunities/h2020/topics/imi2-2017-12-02.html

CaroleGoble commented 6 years ago

Presentation Title: FAIRShake: Toolkit to Enable the FAIRness Assessment of Biomedical Digital Objects

Abstract: While it is clear that there will be a benefit in making biomedical digital objects more FAIR, the FAIR principles are abstract and high level. FAIRShake brings these principles into practice by encouraging digital object producers to make their products more FAIR. The FAIRShake toolkit is designed to enable the biomedical research community to assess the FAIRness of biomedical research digital objects. These include: repositories, databases, tools, journal and book publications, courses, scientific meetings and more. The FAIRShake toolkit uses the FAIR insignia to display the results of FAIR assessments. The insignia symbolizes the FAIRness of a digital object according to 16 FAIR metrics. Each square on the insignia represents the average answer to a FAIR metric question. The FAIRShake Chrome extension inserts the insignia into web-sites that list biomedical digital objects. Users can see the insignia and also contribute evaluations by clicking on the insignia. It is also possible to embed the insignia without the need for a Chrome extension and to initiate FAIR evaluation projects using the FAIRShake web-site directly. Currently, the FAIRShake web-site lists four projects: evaluation of the LINCS tools and datasets, evaluation of the MOD repositories, evaluations of over 5,000 bioinformatics tools and databases, and evaluations of the repositories listed on DataMed. The project is at an early prototyping phase, so it is not ready for broad use.
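To make the "average answer per metric" idea in the abstract concrete, here is a minimal sketch in Python. It is not FAIRShake's actual code or API. The numeric encoding of answers (1.0 = yes, 0.0 = no), the 4x4 layout of the squares, and all function names are assumptions made purely for illustration.

```python
from statistics import mean

NUM_METRICS = 16  # the abstract mentions 16 FAIR metrics


def average_per_metric(evaluations):
    """Return the mean answer for each metric across all evaluations.

    Each evaluation is a list of NUM_METRICS numeric answers
    (assumed encoding: 1.0 = yes, 0.0 = no).
    """
    for answers in evaluations:
        if len(answers) != NUM_METRICS:
            raise ValueError("each evaluation needs one answer per metric")
    # zip(*...) groups the answers metric by metric
    return [mean(column) for column in zip(*evaluations)]


def as_grid(values, width=4):
    """Arrange the per-metric averages into rows of `width` squares (assumed layout)."""
    return [values[i:i + width] for i in range(0, len(values), width)]


# Two hypothetical evaluations of the same digital object
evaluations = [
    [1.0] * 8 + [0.0] * 8,  # hypothetical evaluator A
    [1.0] * 16,             # hypothetical evaluator B
]
averages = average_per_metric(evaluations)
grid = as_grid(averages)
for row in grid:
    print(" ".join(f"{v:.2f}" for v in row))
```

Each printed value would correspond to one square of the insignia, so a metric on which evaluators disagree ends up with an intermediate value rather than a plain yes or no.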

peter-wittenburg commented 6 years ago

Frank raises a couple of different aspects - some have been commented on by others. Let me try to find my way. I should add here that I have some knowledge of what is being done at KIT and we had some collaborations - also on questions Frank is raising.

  1. Is FAIR sufficient to prevent fragmentation? You refer to also publishing negative results, which is indeed under discussion in many if not all disciplines. But the FAIR principles do not address this question: they only state that if you produce data, you should make them FAIR. This includes negative results. It is more of a social problem that many researchers hesitate to publish data which did not lead to clear results. So FAIR does not address the social aspects, does it? I don't know whether I can share your view about "reproducibility". You are asking for the ideal solution, correct? That would be wonderful. But currently even your second option does not work, which is a disaster for science.
  2. Components of a FAIR data ecosystem? Yes, skills are desperately needed, and it is a pity that Edison will not be continued, if what I heard is correct - please correct me if I am wrong. Your second point is interesting, but I can't see the direction at this moment.
  3. Architecture Examples Yes indeed, together with KIT and others we built the EUDAT federation, amongst others, and KIT, like others, is involved in several infrastructures. I'm not quite sure, Frank, what you mean by "most synergy". I just had an interaction with someone at one of the large German companies that are starting to build federation environments for data (they call it differently). When I asked him about RDA and standardisation he argued that standardisation would ruin his business model, since "heterogeneity" means money in economic terms. Standardisation would mean reduction of costs. RDA (like all the other standardisation initiatives) was set up with the intention to harmonise and thus reduce costs. Whether looking for synergies will automatically lead to compatibility I dare to doubt. Compatibility in "economic" terms at this moment means that companies try to convince their clients and others that their solution is the best and that changes would cost time and money. So they remain with a silo. Your second remark is absolutely right. Some time ago we looked around in all the ESFRI research infrastructures, and the network of repositories (some call them centers with some additional tasks) is crucial for almost all of them.
  4. Layered Approach? You are absolutely right that when it comes to interaction with the users there should be one interface and a clear assignment of roles. But this is not what is meant here. We were speaking about systems design: how to get a complex system built so that it is compatible. I just had another chat with one of the two founders of the Internet. They just threw TCP/IP on the floor and showed that it works for message exchange and routing, without any further-reaching claims. It was evolution that led us from FTP to the Web. If we now look at some initiatives such as IIC or IDS, they come up with coherent and comprehensive architectures to guide infrastructure development - so a slightly different approach, it seems. How do we get ahead and overcome all this fragmentation?
  5. Automatic Processing We ran a large survey in Europe 3 years ago, interacting with about 120 departments etc., and we found that data scientists speak of losing about 75% of their time on data finding, integration etc. A colleague from MIT (M. Brodie) reported on a study in which data scientists reported 80% of their time being wasted. So it is obvious that we cannot go on like this, and everyone knows it. Why is it so difficult to change? We were given two major answers: 1) In cancer research, for example, there are so many variants, parameters, choices etc. that it is difficult to create a workflow framework that would really help. Much work is still ad hoc scripting etc. 2) There is a lack of people who can really develop these kinds of flexible workflow systems - it is almost an art :) So yes, it seems that we are on the same page here, but developing flexible workflows is tough.
  6. Professions Let me be brief here since it is not my favorite area, and Edison has worked out a nice classification of job profiles - yet we are far from putting them into practice. Your idea of a FAIR manager is a bit like what the colleagues behind Go-FAIR are dreaming of. At least the German ministry decided to fund a FAIR node - whatever it will do.

So thanks for your great input which we need to consider in the report.

sjDCC commented 6 years ago

Thanks for the FAIRShake reference, @CaroleGoble. I've found a link to a short video on YouTube, but if you have other literature references we should follow up on, that would be great.

peter-wittenburg commented 6 years ago

I just looked at the FAIRshake video and it is indeed pretty cool. If I got it right, it is ultimately the crowd's view on the FAIRness of DOs. So this makes it complementary to approaches such as DSA/WDS, where people do self-assessment based on rule sets. Thanks, Carole. Peter