FAIRMetrics / Metrics

This repository contains the results of the FAIR Metrics Group
http://fairmetrics.org
MIT License

Call for review metric: Gen2_FM_F3.md #38

Open markwilkinson opened 5 years ago

markwilkinson commented 5 years ago

Please review this new second-generation metric.

markwilkinson commented 5 years ago

The test for schema:mainEntity is not valid. mainEntity points to a block of metadata that DOES NOT necessarily contain the identifier.

markwilkinson commented 5 years ago

A better test would be mainEntity -> identifier (or one of its subproperties: accountId, confirmationNumber, duns, flightNumber, globalLocationNumber, gtin12, gtin13, gtin14, gtin8, isbn, issn, legislationIdentifier, leiCode, orderNumber, productID, serialNumber, sku, taxID).
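A minimal sketch of the stricter test proposed above, assuming the metadata has already been parsed from JSON-LD into a Python dict with compact schema.org keys. The function name `find_identifiers` and the sample record are hypothetical, and this is not the Evaluator's actual implementation.

```python
# Property names copied from the comment above: schema:identifier
# and the alternatives listed alongside it.
IDENTIFIER_PROPERTIES = (
    "identifier", "accountId", "confirmationNumber", "duns", "flightNumber",
    "globalLocationNumber", "gtin12", "gtin13", "gtin14", "gtin8", "isbn",
    "issn", "legislationIdentifier", "leiCode", "orderNumber", "productID",
    "serialNumber", "sku", "taxID",
)


def find_identifiers(metadata):
    """Return identifier values found under mainEntity -> identifier (or a listed alternative)."""
    main_entity = metadata.get("mainEntity")
    if main_entity is None:
        return []
    # JSON-LD allows either a single node or a list of nodes here.
    entities = main_entity if isinstance(main_entity, list) else [main_entity]
    found = []
    for entity in entities:
        if not isinstance(entity, dict):
            continue  # a bare IRI string carries no nested identifier
        for prop in IDENTIFIER_PROPERTIES:
            if prop in entity:
                value = entity[prop]
                found.extend(value if isinstance(value, list) else [value])
    return found


# Hypothetical record, for illustration only.
record = {
    "@type": "WebPage",
    "mainEntity": {"@type": "Dataset", "identifier": "https://doi.org/10.1234/example"},
}
print(find_identifiers(record))
```

A record whose mainEntity block lacks all of these properties yields an empty list, which is the case the original mainEntity-only test could not distinguish.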

DanBerrios commented 3 years ago

@markwilkinson Hi Mark. We register our dataset DOIs with DataCite and pass various dataset metadata to DataCite at the time of registration (currently using schema.org predicates). We do not pass DataCite any of the predicates this metric is searching for, and thus our records are failing this metric. Where did you get the list of valid and required predicates for dataset-type objects? I am looking at the Nature Sci Data guidance in https://www.nature.com/articles/s41597-019-0031-8.pdf and don't see the schema.org predicates tested by this metric in their example dataset metadata, nor are they explicitly mentioned in their recommendations. @jlbales

markwilkinson commented 3 years ago

Hi Dan,

Anything that has a DOI should pass this test! There may be something else failing... can you send me an example of a DOI you find is failing this test?

That article makes good suggestions, and the test follows those suggestions (and more!). Unfortunately, there is no such thing as a 'list of valid predicates', since nobody has the authority to say what is 'valid'. As such, my list comes from a survey of what people are using "in the real world". I make no claim to validity... I only claim that, based on usage, an agent that was looking for data would usually be able to find it if it looked for a predicate on that list.

Please send me an example of what you are seeing, and I will try to troubleshoot the test.

Cheers!

DanBerrios commented 3 years ago

@markwilkinson Sure: see https://fairsharing.github.io/FAIR-Evaluator-FrontEnd/#!/evaluations/5118. If you look at the output of the F3 test, the last part says:

FAILURE: Was unable to locate the data identifier in the metadata using any (common) property/predicate reserved for this purpose. Tested the following ["http://www.w3.org/ns/ldp#contains", "http://xmlns.com/foaf/0.1/primaryTopic", "http://purl.obolibrary.org/obo/IAO_0000136", "http://purl.obolibrary.org/obo/IAO:0000136", "https://www.w3.org/ns/ldp#contains", "https://xmlns.com/foaf/0.1/primaryTopic", "http://schema.org/mainEntity", "http://schema.org/codeRepository", "http://schema.org/distribution", "https://schema.org/mainEntity", "https://schema.org/codeRepository", "https://schema.org/distribution", "http://www.w3.org/ns/dcat#distribution", "https://www.w3.org/ns/dcat#distribution", "http://www.w3.org/ns/dcat#dataset", "https://www.w3.org/ns/dcat#dataset", "http://www.w3.org/ns/dcat#downloadURL", "https://www.w3.org/ns/dcat#downloadURL", "http://www.w3.org/ns/dcat#accessURL", "https://www.w3.org/ns/dcat#accessURL", "http://semanticscience.org/resource/SIO_000332", "http://semanticscience.org/resource/is-about", "https://semanticscience.org/resource/SIO_000332", "https://semanticscience.org/resource/is-about", "https://purl.obolibrary.org/obo/IAO_0000136"]

That is the list of predicates I was referring to as being checked. We lack embedded metadata on our page, and from this output it looks like we need to use at least one of those predicates when we do embed the metadata for the DOI, and the DOI itself, on the page.

markwilkinson commented 3 years ago

Yes, I see. You're injecting data/metadata via script, and the DOI provider has no information at all.

Unfortunately, there's not much I can do to resolve this problem... I'm not inclined to train my harvester to run scripts, since it explores arbitrary pages and isn't in such a protected space as a browser.
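To illustrate why script-injected metadata is invisible to a harvester that does not run scripts, here is a rough sketch that reads only the JSON-LD present in the static HTML source. The class `JsonLdCollector`, the helper `static_jsonld`, and both sample pages are hypothetical; the real harvester may work quite differently.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect JSON-LD blocks that are present in the static HTML source."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # ignore malformed blocks


def static_jsonld(html):
    """Return the JSON-LD documents found without executing any script."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    return collector.documents


# Metadata embedded directly in the page source: visible to the parser.
static_page = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "Dataset", "identifier": "https://doi.org/10.1234/example"}'
    "</script></head></html>"
)
# Metadata that would only exist after a browser runs the script: not visible.
scripted_page = (
    "<html><head><script>"
    'var s = document.createElement("script"); s.type = "application/ld+json";'
    " document.head.appendChild(s);"
    "</script></head></html>"
)
print(len(static_jsonld(static_page)), len(static_jsonld(scripted_page)))
```

Only the first page yields a document; the second would need a JavaScript engine, which is exactly what the harvester avoids.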

Note that the predicates it is searching for (the list you copied and pasted above) are the predicates that point at the data (your CEL.gz records on that page). The DOI, which should also appear somewhere in the page, would require a different predicate (likely schema:identifier or dc:identifier).

Sorry I can't help more!

DanBerrios commented 3 years ago

@markwilkinson Ugh, I should have explained before asking you to take a look. Yes, we have not yet embedded our metadata on the dataset landing page, but we are planning to do that very soon. DataCite, the DOI provider, DOES in fact have the metadata for the dataset associated with this DOI (you can see it here: https://api.datacite.org/dois/application/vnd.datacite.datacite+json/10.26030/cwan-7h58 ), but the schema.org predicates that we currently give to DataCite don't include any from the list that the F3 test output says it is looking for (e.g., we don't have a schema.org:mainEntity predicate). What I was asking was whether the list of predicates in the test output comes from a published set of predicates required to pass F3-based tests.

markwilkinson commented 3 years ago

Interesting... it looks like several of the DataCite content types are not responding at the moment. If you request Turtle or RDF/XML, it fails, but if you request JSON-LD, it succeeds. That's why I thought it wasn't providing any metadata at all!
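The content-type behaviour described above can be checked with a small probe along these lines. This is a sketch only: the media types in `MEDIA_TYPES` are assumptions about what DOI content negotiation accepts (in particular, "JSON-LD" is taken to mean the schema.org flavour), and `build_request` and `probe` are hypothetical helper names.

```python
import urllib.request

# Assumed media types for DOI content negotiation; consult the
# DataCite documentation for the currently supported list.
MEDIA_TYPES = {
    "turtle": "text/turtle",
    "rdfxml": "application/rdf+xml",
    "jsonld": "application/vnd.schemaorg.ld+json",
}


def build_request(doi, fmt):
    """Build (but do not send) a content-negotiation request for a DOI."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": MEDIA_TYPES[fmt]},
    )


def probe(doi, timeout=10):
    """Try each format and record the HTTP status or the error (performs network calls)."""
    results = {}
    for fmt in MEDIA_TYPES:
        try:
            with urllib.request.urlopen(build_request(doi, fmt), timeout=timeout) as resp:
                results[fmt] = resp.status
        except OSError as exc:  # URLError, HTTPError and timeouts are OSError subclasses
            results[fmt] = repr(exc)
    return results


# Building a request does not touch the network; calling probe() does.
request = build_request("10.26030/cwan-7h58", "turtle")
print(request.full_url, request.get_header("Accept"))
```

Calling `probe("10.26030/cwan-7h58")` would show which formats respond at a given moment; the result depends on the live service, so no expected output is given here.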

Yes, if you're using schema.org, then mainEntity is one of the few choices (there are other choices for, e.g., code repositories, but not for data).

Cheers!

DanBerrios commented 3 years ago

@markwilkinson Where did you get the list of predicates this metric test is checking? Can you provide the reference? I don't see schema.org:mainEntity in the Nature citation roadmap paper for any types, including dataset types...