cessda / cessda.cdc.versions

Issue tracker and wiki for the CESSDA Data Catalogue
https://datacatalogue.cessda.eu/
Apache License 2.0

FUJI FAIR Metrics that fail and need reviewing #447

Open cessda-bitbucket-importer opened 2 years ago

cessda-bitbucket-importer commented 2 years ago

Original report on BitBucket by Kostas Papagiannopoulos (GitHub: kpapag).


After resolving the identifier issue, FUJI now rates studies that provide a DOI at about 45% FAIR.

Posting the metrics that still fail here for further review, to examine whether anything can be done on our side. I used the following study as a test case, running from localhost until the identifier fix reaches production:
https://datacatalogue.cessda.eu/detail?lang=en&q=fcdf8ad65e3650b4e3e8e5c379cdaaa67ee93b90addeb20616e9188c30e78db9

    "metric_name": "Metadata includes descriptive core elements (creator, title, data identifier, publisher, publication date, summary and keywords) to support data findability.",
"metric_test_name": "Metadata is accessible via typed links",
"metric_test_name": "Metadata is accessible via signposting links",
"test_debug": [
                "INFO: PID schemes-based assessment supported by the assessment service - ['ark', 'arxiv', 'bioproject', 'biosample', 'doi', 'ensembl', 'genome', 'gnd', 'handle', 'lsid', 'pmid', 'pmcid', 'purl', 'refseq', 'sra', 'uniprot', 'urn']",
                "INFO: Retrieving page -: http://localhost:8088/detail?lang=en&q=SoDaNet__doi%3A10.17903%2FFK2%2FBVFEYX as text/html, application/xhtml+xml, application/xml;q=0.5, text/xml;q=0.5, application/rdf+xml;q=0.5",
                "WARNING: Landing page seems to be JavaScript generated, could not detect enough content",
                "INFO: Object identifier active (status code = 200)",
                "WARNING: Not a persistent identifier scheme -: url",
                "SUCCESS: Found object identifier in metadata during FsF-F2-01M, PID check was repeated",
                "INFO: PID schemes-based assessment supported by the assessment service - ['ark', 'arxiv', 'bioproject', 'biosample', 'doi', 'ensembl', 'genome', 'gnd', 'handle', 'lsid', 'pmid', 'pmcid', 'purl', 'refseq', 'sra', 'uniprot', 'urn']",
                "INFO: Retrieving page -: http://doi.org/10.17903/FK2/BVFEYX as text/html, application/xhtml+xml, application/xml;q=0.5, text/xml;q=0.5, application/rdf+xml;q=0.5",
                "WARNING: Landing page domain resolved from PID found in metadata does not match with input URL domain",
                "INFO: Object identifier active (status code = 200)",
                "SUCCESS: Persistence identifier scheme -: doi"
            ],

    "metric_name": "Metadata includes the identifier of the data it describes.",
"metric_test_name": "Metadata contains data content related information (file name, size, type)",
"metric_test_name": "Metadata contains a PID or URL which indicates the location of the downloadable data content",
"test_debug": [
                "WARNING: Data (content) identifier is missing."
            ],

    "metric_name": "Metadata contains access level and access conditions of the data.",
"metric_test_name": "Information about access restrictions or rights can be identified in metadata",
"metric_test_name": "Data access information is machine readable",
"metric_test_name": "Data access information is indicated by (not machine readable) standard terms",
 "test_debug": [
                "WARNING: NO access information is available in metadata",
                "WARNING: Unable to determine the access level"
            ],

    "metric_name": "Metadata uses semantic resources",
"metric_test_name": "Namespaces of known semantic resources can be identified in metadata",
"test_debug": [
                "INFO: Number of vocabulary namespaces extracted from all RDF-based metadata -: 2",
                "INFO: Default vocabulary namespace(s) excluded -: ['http://schema.org']",
                "INFO: Check the remaining namespace(s) exists in LOD -: []",
                "WARNING: NO vocabulary namespace match is found",
                "WARNING: Vocabulary namespace (s) or URIs specified but no match is found in LOD reference list (examples) -: ['http://datacite.org/schema']"
            ],

    "metric_name": "Metadata includes links between the data and its related entities.",
"metric_test_name": "Related resources are explicitly mentioned in metadata",
"metric_test_name": "Related resources are indicated by machine readable links or identifiers",
"test_debug": [
                "INFO: No related resource(s) found in Schema.org metadata",
                "INFO: No related resource(s) found in Schema.org metadata",
                "INFO: No related resource(s) found in Datacite metadata",
                "INFO: Total number of related resources extracted -: 0"
            ],

    "metric_name": "Metadata specifies the content of the data.",
"metric_test_name": "Information about data content (e.g. links) is given in metadata",
"metric_test_name": "File size and type information are specified in metadata",
"metric_test_name": "Data content matches file type and size specified in metadata",
"metric_test_name": "Data content matches measured variables or observation types specified in metadata",
"test_debug": [
                "INFO: Object landing page accessible status -: True",
                "SUCCESS: Resource type specified -: dataset",
                "WARNING: NO data object content available/accessible to perform file descriptors (type and size) tests",
                "SUCCESS: Found measured variables or observations (aka parameters) as content descriptor",
                "WARNING: Could not verify measured variables found in data object content, content parsing failed",
                "WARNING: Measured variables given in metadata do not match data object content"
            ],

    "metric_name": "Metadata includes license information under which data can be reused.",
"metric_test_name": "Licence information is given in an appropriate metadata element",
"metric_test_name": "Recognized licence is valid and registered at SPDX",
 "test_debug": [
                "WARNING: License information unavailable in metadata"
            ],

    "metric_name": "Metadata includes provenance information about data creation or generation.",
"metric_test_name": "Metadata contains provenance information using formal provenance ontologies (PROV-O)",
"test_debug": [
                "INFO: Check if provenance information is available in descriptive metadata",
                "INFO: Check if provenance information is available in metadata about related resources",
                "INFO: No provenance information found in metadata about related resources",
                "SUCCESS: Found data creation-related provenance information",
                "INFO: Check if provenance specific namespaces are listed in metadata",
                "WARNING: Formal provenance metadata is unavailable"
            ],

    "metric_name": "Metadata follows a standard recommended by the target research community of the data.",
"metric_test_name": "Community specific metadata standard is detected using namespaces or schemas found in provided metadata or metadata services outputs",
"metric_test_name": "Community specific metadata standard is listed in the re3data record of the responsible repository",
"test_debug": [
                "INFO: Retrieving API and Standards",
                "INFO: re3data/datacite client id -: gesis.sodanet",
                "INFO: Trying to retrieve metadata info from re3data/datacite services using client id -: gesis.sodanet",
                "WARNING: No DOI of client id is available from datacite api",
                "INFO: Inferring endpoint information through re3data/datacite services",
                "INFO: Metadata standards listed in re3data record -: []",
                "WARNING: NO valid OAI-PMH endpoint found",
                "INFO: Namespaces included in the metadata -: ['http://datacite.org/schema/', 'http://schema.org/']",
                "INFO: Found non-disciplinary standard (but RDA listed) found through namespaces -: DataCite Metadata Schema (http://datacite.org/schema/)",
                "INFO: The following standards found through namespaces are excluded as they are not listed in RDA metadata catalog -: ['http://schema.org/']",
                "WARNING: NO metadata standard(s) of the repository specified in re3data"
            ],

    "metric_name": "Data is available in a file format recommended by the target research community.",
"metric_test_name": "The format of a data file given in the metadata is listed in the long term file formats, open file formats or scientific file formats controlled list",
"metric_test_name": "The format of the data file is an open format",
"metric_test_name": "The format of the data file is a long term format",
"metric_test_name": "The format of the data file is a scientific format",
"test_debug": [
                "WARNING: Could not perform file format checks as data content identifier(s) unavailable/inaccesible"
            ],

    "metric_name": "Data is accessible through a standardized communication protocol.",
"metric_test_name": "Metadata includes a resolvable link to data based on standardized web communication protocols.",
"test_debug": [
                "INFO: NO content (data) identifier is given in metadata"
            ],
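To make the review easier, the `test_debug` entries above can be grouped by their level prefix (`INFO`, `WARNING`, `SUCCESS`) so that only the warnings remain. The following is a minimal sketch that relies only on the `LEVEL: message` shape visible in the excerpts; the function name `group_by_level` is hypothetical and is not part of FUJI.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_level(test_debug: Iterable[str]) -> Dict[str, List[str]]:
    """Group debug strings of the form 'LEVEL: message' by their LEVEL prefix."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entry in test_debug:
        # Split on the first ': ' only, so URLs inside the message stay intact.
        level, sep, message = entry.partition(": ")
        if not sep or not level.isupper():
            # No recognisable level prefix: keep the whole entry as-is.
            level, message = "UNKNOWN", entry
        grouped[level].append(message.strip())
    return dict(grouped)


# Entries copied from the provenance metric in this report.
sample = [
    "INFO: Check if provenance information is available in descriptive metadata",
    "SUCCESS: Found data creation-related provenance information",
    "WARNING: Formal provenance metadata is unavailable",
]

print(group_by_level(sample)["WARNING"])
# prints ['Formal provenance metadata is unavailable']
```

Applied to each metric's `test_debug` list, this leaves a short list of warnings to go through one by one.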

cessda-bitbucket-importer commented 1 year ago

Original comment by Kostas Papagiannopoulos (GitHub: kpapag).


We managed to score higher by resolving the "Namespaces of known semantic resources can be identified in metadata" test. The link that needed updating is the URI of the CESSDA Vocabulary Topic Classification, from https://vocabularies.cessda.eu/v2/vocabularies/TopicClassification/4.0?languageVersion=en-4.0 to the newer https://vocabularies.cessda.eu/vocabulary/TopicClassification?v=4.0

We first updated the vocabulary links in the Elasticsearch record, which did not change the score. However, updating the links in the OAI XML file in a test environment increased the score slightly.
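If the OAI XML files do end up being updated in bulk, the rewrite could be scripted. The sketch below assumes that every old-style vocabulary URI follows the same pattern as the Topic Classification link (`/v2/vocabularies/<name>/<version>?languageVersion=...`) and maps to `/vocabulary/<name>?v=<version>`. That pattern is generalised from the single example in this thread and should be checked against other vocabularies before use. The function name `modernise_vocab_uris` is hypothetical.

```python
import re

# Assumed old-style URI pattern, generalised from the one example in this issue.
OLD_URI = re.compile(
    r"https://vocabularies\.cessda\.eu/v2/vocabularies/"
    r"(?P<name>[^/?\s\"]+)/(?P<version>[^/?\s\"]+)"
    r"(?:\?languageVersion=[^\s\"&<]+)?"
)


def modernise_vocab_uris(text: str) -> str:
    """Rewrite old-style CESSDA vocabulary URIs in `text` to the newer form."""
    return OLD_URI.sub(
        lambda m: f"https://vocabularies.cessda.eu/vocabulary/{m['name']}?v={m['version']}",
        text,
    )


old = "https://vocabularies.cessda.eu/v2/vocabularies/TopicClassification/4.0?languageVersion=en-4.0"
print(modernise_vocab_uris(old))
# prints https://vocabularies.cessda.eu/vocabulary/TopicClassification?v=4.0
```

The pattern stops at quotes and whitespace, so it can be run over the raw XML text without touching the surrounding attributes.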

Example: https://datacatalogue.cessda.eu/detail?lang=en&q=ff5e71ee1ed3e426ea71ea048f8bc7af10c567de690085d94d50de6d59b1705b initially scored "FAIR": 43.75 and now scores "FAIR": 47.92.

We need to contact Robert to see whether FUJI can handle the redirect mechanism; otherwise, the OAI files holding this link could be updated by the SPs.
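Before asking, it may help to confirm what the old link actually does on the wire. The sketch below walks a redirect chain one hop at a time using only the Python standard library. It assumes the old vocabulary URI answers with an HTTP redirect, which this thread implies but does not state, and it says nothing about how FUJI itself treats redirects. The names `http_next_hop` and `redirect_chain` are hypothetical.

```python
from typing import Callable, List, Optional
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import HTTPRedirectHandler, Request, build_opener


class _NoRedirect(HTTPRedirectHandler):
    """Stop urllib from following redirects so each hop can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def http_next_hop(url: str) -> Optional[str]:
    """Return the redirect target of `url`, or None if it does not redirect."""
    opener = build_opener(_NoRedirect)
    try:
        with opener.open(Request(url, method="HEAD"), timeout=10):
            return None  # 2xx response: this is the final destination
    except HTTPError as err:
        location = err.headers.get("Location")
        if err.code in (301, 302, 303, 307, 308) and location:
            return urljoin(url, location)
        raise


def redirect_chain(
    url: str,
    next_hop: Callable[[str], Optional[str]] = http_next_hop,
    max_hops: int = 5,
) -> List[str]:
    """Follow redirects from `url`, returning every URL visited in order."""
    chain = [url]
    while len(chain) <= max_hops:
        target = next_hop(chain[-1])
        if target is None or target in chain:  # finished, or a redirect loop
            break
        chain.append(target)
    return chain


if __name__ == "__main__":
    old = ("https://vocabularies.cessda.eu/v2/vocabularies/"
           "TopicClassification/4.0?languageVersion=en-4.0")
    for hop in redirect_chain(old):
        print(hop)
```

If the last URL printed is the newer `/vocabulary/TopicClassification?v=4.0` form, the remaining question is only whether FUJI follows that redirect when it checks namespaces.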