FAIR-CA-indicators / fair-ca-indicators-backend

Apache License 2.0

Questions/issues regarding CSH scores #15

Open vera opened 9 months ago

vera commented 9 months ago

F

A

I

Some thoughts regarding "CSH-RDA-I3-01M", "CSH-RDA-I3-02M", "CSH-RDA-I3-03M" and "CSH-RDA-I3-04M" ((qualified) references to (meta)data):

I think the current check is too strict. You are checking whether "ids" contains entries that are "Datasets" (or not).

The MDS can contain references in the following fields:

  1. contributors (with mandatory type, e.g. "Contact", "Creator/Author") -> = "other referenced metadata"

  2. ids -> always "qualified" because "relationType" (e.g. "A continues B") is mandatory -> unsure how to differentiate "data" and "metadata", maybe using "typeGeneral" as you are already doing, but I don't think everything but "Dataset" is metadata. E.g. is a "Journal article" metadata?

  3. idsNfdi4health -> "qualified" if "relationType" is given -> also unsure how to differentiate "data" and "metadata" here. In general, NFDI4Health resources are metadata, but they may have data attached. If you use the API to request the NFDI4Health resource, the "link" field will tell you whether data is attached. Could you use this?
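
To make the three cases above concrete, here is a minimal Python sketch of how the references could be collected and flagged. The field names (`contributors`, `ids`, `idsNfdi4health`, `relationType`, `typeGeneral`) are taken from the list above; the function name, the output shape, and the rule "only `Dataset` counts as data, everything else stays undecided" are assumptions made for illustration, and that last rule is exactly the open question.

```python
from typing import Any


def collect_references(resource: dict[str, Any]) -> list[dict[str, Any]]:
    """List every reference found in a resource with a 'qualified' flag and a tentative kind.

    Hypothetical helper: 'kind' is left as "undecided" wherever the
    data/metadata distinction is still an open question.
    """
    refs: list[dict[str, Any]] = []

    # Contributors always carry a mandatory type, so they are treated
    # as qualified references to other metadata.
    for _ in resource.get("contributors", []):
        refs.append({"field": "contributors", "qualified": True, "kind": "metadata"})

    # relationType is mandatory in ids, so these are always qualified.
    for entry in resource.get("ids", []):
        kind = "data" if entry.get("typeGeneral") == "Dataset" else "undecided"
        refs.append({"field": "ids", "qualified": True, "kind": kind})

    # relationType is optional in idsNfdi4health, so qualification depends on it.
    for entry in resource.get("idsNfdi4health", []):
        refs.append({
            "field": "idsNfdi4health",
            "qualified": bool(entry.get("relationType")),
            "kind": "undecided",
        })

    return refs
```

With this shape, each I3 indicator could filter the list by `qualified` and `kind` instead of only looking at `ids` entries of type "Dataset".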

R

vera commented 9 months ago

Btw, since the Study Hub is already using MDS 3.3, you could also update to MDS 3.3.

AtinkutZeleke commented 9 months ago

F

* [ ]  Why is "CSH-RDA-F3-01M: Metadata includes the identifier for the data" always failed? Shouldn't it be successful if the metadata contains a `resource_identifier`?

That is an important point: this indicator deals with including the reference (i.e. the identifier) of the resource in the metadata so that the resource can be accessed. Can we use `resource_identifier` for both the metadata identifier and the resource identifier? Which one is for the resource and which one is for the metadata of that specific resource?

A

* [ ]  "CSH-RDA-A1-01M: Metadata contains information to enable the user to get access to the data" checks the wrong path `["resource","study_design","study_data_sharing_plan","study_data_sharing_plan_description"]` instead of `["resource","study_design","study_data_sharing_plan","study_data_sharing_plan_generally"]`
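
A minimal sketch of the corrected check, assuming the record is a nested dict as returned by the API and that "filled" means any non-empty value. The path is the one proposed above; `get_path` and `check_a1_01m` are hypothetical names, not functions from the backend.

```python
from typing import Any, Optional, Sequence


def get_path(record: dict, path: Sequence[str]) -> Optional[Any]:
    """Walk nested dicts along `path`; return None if any key is missing."""
    current: Any = record
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Path proposed in the item above (MDS 3.0 naming).
A1_01M_PATH = [
    "resource",
    "study_design",
    "study_data_sharing_plan",
    "study_data_sharing_plan_generally",
]


def check_a1_01m(record: dict) -> bool:
    """Pass if the data sharing plan field exists and is not empty."""
    value = get_path(record, A1_01M_PATH)
    return value not in (None, "", [], {})
```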

What about the following additional lists? Can you also confirm that?

Can we use the following as well?

* [ ]  same for "CSH-RDA-I3-02M"

Some thoughts regarding "CSH-RDA-I3-01M", "CSH-RDA-I3-02M", "CSH-RDA-I3-03M" and "CSH-RDA-I3-04M" ((qualified) references to (meta)data):

I think the current check is too strict. You are checking whether "ids" contains entries that are "Datasets" (or not).

The MDS can contain references in the following fields:

1. `contributors` (with mandatory type, e.g. "Contact", "Creator/Author")
   -> = "other referenced metadata"

2. `ids`
   -> always "qualified" because "relationType" (e.g. "A continues B") is mandatory
   -> unsure how to differentiate "data" and "metadata", maybe using "typeGeneral" as you are already doing, but I don't think everything but "Dataset" is metadata. E.g. is a "Journal article" metadata?

3. `idsNfdi4health`
   -> "qualified" if "relationType" is given
   -> also unsure how to differentiate "data" and "metadata" here. In general, NFDI4Health resources are metadata, but they may have data attached. If you use the API to request the NFDI4Health resource, the "link" field will tell you whether data is attached. Could you use this?

You are right! That is the most ... We really need an agreement or a contextual understanding.

R

* [ ]  "CSH-RDA-R1.1-01M: Metadata includes information about the licence under which the data can be reused" checks the wrong path `["resource", "nonStudyDetails", "useRights"]` instead of `["resource", "non_study_details", "resource_use_rights"]` (again assuming we are based on MDS 3.0; this check is using a weird mixture of MDS 3.0 and 3.3 paths)

* [ ]  "CSH-RDA-R1.1-02M: Metadata refers to a standard reuse licence" checks the wrong path `["resource", "nonStudyDetails", "useRights"]` instead of `["resource", "non_study_details", "resource_use_rights", "resource_use_rights_label"]` (again assuming we are based on MDS 3.0)
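
One way to avoid mixing paths from different MDS versions would be to keep them in a single table keyed by version, sketched below. Only the MDS 3.0 paths quoted in the two items above are filled in; the table name, the function names, and the "non-empty value means pass" rule are assumptions. Note that a presence check is only a proxy for CSH-RDA-R1.1-02M, since it does not verify that the label names a standard licence.

```python
from typing import Any, Sequence

# Paths per MDS version. Only the 3.0 entries come from the discussion;
# a "3.3" entry would be added when the backend is updated.
PATHS_BY_MDS_VERSION: dict[str, dict[str, list[str]]] = {
    "3.0": {
        "CSH-RDA-R1.1-01M": ["resource", "non_study_details", "resource_use_rights"],
        "CSH-RDA-R1.1-02M": [
            "resource",
            "non_study_details",
            "resource_use_rights",
            "resource_use_rights_label",
        ],
    },
}


def lookup(record: Any, path: Sequence[str]) -> Any:
    """Follow `path` through nested dicts; return None on the first missing key."""
    for key in path:
        if not isinstance(record, dict) or key not in record:
            return None
        record = record[key]
    return record


def licence_field_present(record: dict, indicator: str, mds_version: str = "3.0") -> bool:
    """Pass if the field configured for this indicator and version is non-empty."""
    path = PATHS_BY_MDS_VERSION[mds_version][indicator]
    return lookup(record, path) not in (None, "", [], {})
```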

We have a block of items that can be used for the licence-related indicators (CSH-RDA-R1.1-01M, CSH-RDA-R1.1-02M, and CSH-RDA-R1.1-03M). Some of them are not mandatory, so we can categorize them according to where they apply. What do you think?

* [ ]  "RDA-R1.2-01M: Metadata includes provenance information according to community-specific standards" and "RDA-R1.2-02M: Metadata includes provenance information according to a cross-community language": currently this is always failed, could this be marked as success if the "provenance" block in the MDS is filled?

If we assume that one or more of the following provenance-related items conform to the NFDI4Health community-specific standards and to a cross-community language, then we can. What do you think the practice has been so far?

AtinkutZeleke commented 9 months ago

I will come back with more specific questions.

vera commented 9 months ago

I am skipping questions we already discussed in the call today.

> That is an important point: this indicator deals with including the reference (i.e. the identifier) of the resource in the metadata so that the resource can be accessed. Can we use `resource_identifier` for both the metadata identifier and the resource identifier? Which one is for the resource and which one is for the metadata of that specific resource?

Not sure. Do all metadata entries in the Study Hub have data attached? E.g., does a "Study" metadata entry have associated data?

As I mentioned during the call, the Study Hub allows attaching data to metadata entries (if type != "Study", "Substudy", "Registry", "Secondary data source"):

image

This is outside of the MDS. The MDS describes the metadata only.

If data is attached, it will be returned by the API like this

{
  "link": {
    "external": false,
    "url": "/api/resource/622/data"
  },
  "resource": {...},
  "versions": [...]
}

(https://csh.nfdi4health.de/api/resource/113)
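
A minimal sketch of how a check could use this, assuming the response has already been parsed into a dict and that the "link" key is missing or empty when no data is attached (the example above only shows the attached case, so that part is a guess). The function name is hypothetical.

```python
from typing import Any, Optional


def attached_data_url(api_response: dict[str, Any]) -> Optional[str]:
    """Return the URL of the attached data, or None if the response has no usable link."""
    link = api_response.get("link")
    if not isinstance(link, dict):
        # Assumption: no "link" object means no data is attached.
        return None
    url = link.get("url")
    return url if url else None
```

An indicator such as CSH-RDA-F3-01M could then pass when `attached_data_url(response)` is not `None`, if we agree that the attached data link counts as the identifier for the data.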

> Can we use the following as well? `Resource.idsNfdi4health.relationType`, `Resource.ids.relationType`

What do you want to use it for?

vera commented 9 months ago

@AtinkutZeleke Regarding the provenance metric: The current provenance block contains mostly timestamps and usernames. In the deliverable, you listed four fields required to fulfill the provenance metric:

I am unsure if this matches the description below. What do you think?

For others to reuse your data, they should know where the data came from (i.e., clear story of origin/history, see R1), who to cite and/or how you wish to be acknowledged. Include a description of the workflow that led to your data: Who generated or collected it? How has it been processed? Has it been published before? Does it contain data from someone else that you may have transformed or completed? Ideally, this workflow is described in a machine-readable format.

(https://www.go-fair.org/fair-principles/r1-2-metadata-associated-detailed-provenance/)