Is the assiduous collection of metadata and its aggregation via a DOI agency an essential operation for FAIR Data?

Comment/question.

I believe the FAIRness of data is tied to the quality of the metadata associated with it.

My question is whether there is agreement that the assiduous collection of metadata and its aggregation via a DOI agency is an essential operation.

My comment is that FAIR metadata assiduously collected against a specified schema does not yet seem common. As an example, http://doi.org/b88d is an analysis of some of the most scientifically important datasets collected in recent years, together with the metadata associated (or not) with them. How are we to interpret the fact that such a well-resourced research project is not assiduous in its metadata dissemination?

Thanks for the comment, which I did not find easy to respond to, since its direction is not fully clear to me. Let me see whether I understand at least part of it. FAIR, like other initiatives before it, speaks about "rich metadata", and this is of course a dream for data reuse. But the term "rich metadata" is not defined, and I would claim it can't be defined, since it depends on the purpose for which the metadata is used. In my former community some people relied on Dublin Core (simple metadata useful for the occasional visitor), while others wanted more detail so that they could use metadata for their scientific questions, for example forming a collection of observations of 4-year-olds against a collection for 6-year-olds, i.e. they needed the age and sex of subjects. Others needed to distinguish right-handers from left-handers, and so on. When we started with workflows several years ago we saw that we needed even more detailed metadata to be able to choose suitable operators (software) to follow previous ones.

In the projects I was involved in, harvesting metadata generally meant harvesting metadata of different "richness" or quality, adhering to different schemas and, most difficult of all, different semantics. So harvesting against one schema, as stated in the comment, only occurred when Dublin Core was harvested via OAI-PMH (I will not go into the effects of the weak semantic definitions of DC terms). What some do to start with is to put all harvested terms into a database and just run a text kind of search over these terms, which is not very satisfying. The next step was then to carry out a semantic mapping, which was generally done manually, since tests with automatic semantic mapping did not really work for several reasons (flat structures etc.). This then led to much more useful results. I am not sure whether I have addressed the point made by our colleague.

The methods I just described (massive harvesting, different schemas, different semantics, mapping) are now used very often and are mostly the basis behind metadata portals. In many scientific areas, of course, this is not the practice. A study from DataONE a few years ago showed that most people were still using no standard, or something homemade and undocumented. Practices have improved, but there is a long way to go to reach "rich" metadata, in particular since leading scientists hate creating metadata. The question is then: who does create proper metadata? Perhaps we need data librarians whose job is simply to increase metadata quality.
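
To make the manual mapping step a little more concrete, here is a minimal sketch in Python of what such a crosswalk might look like. Everything in it is an assumption for illustration only: the schema labels (`dc`, `lab`), the field names, the common target vocabulary and the sample records are hypothetical and do not come from any of the projects mentioned above. It only shows the idea of mapping harvested terms from different schemas onto shared fields so that one query works across all of them.

```python
# Hypothetical, hand-written crosswalk: (source schema, source field) -> common field.
# In practice this table is the result of manual semantic mapping.
CROSSWALK = {
    ("dc", "creator"): "author",
    ("dc", "date"): "created",
    ("lab", "recorded_by"): "author",
    ("lab", "subject_age_years"): "subject_age",
    ("lab", "handedness"): "subject_handedness",
}


def map_record(schema, record):
    """Split one harvested record into mapped and unmapped fields."""
    mapped, unmapped = {}, {}
    for field, value in record.items():
        target = CROSSWALK.get((schema, field))
        if target is None:
            # No agreed meaning yet: keep the term for later manual review.
            unmapped[field] = value
        else:
            mapped.setdefault(target, []).append(value)
    return mapped, unmapped


def find(records, field, value):
    """Return the harvested records whose mapped common field contains value."""
    hits = []
    for schema, record in records:
        mapped, _ = map_record(schema, record)
        if value in mapped.get(field, []):
            hits.append(record)
    return hits


# Illustrative records of different "richness", as if harvested from two sources.
harvested = [
    ("dc", {"creator": "A. Researcher", "date": "2016-05-01", "title": "Session 12"}),
    ("lab", {"recorded_by": "B. Assistant", "subject_age_years": 4, "handedness": "left"}),
]

if __name__ == "__main__":
    for schema, record in harvested:
        print(schema, map_record(schema, record))
    print(find(harvested, "subject_age", 4))
```

Keeping the unmapped terms separate, rather than dropping them, reflects the point that the mapping is done by hand: someone still has to decide what those terms mean before they become searchable.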
