ESIPFed / science-on-schema.org

science-on-schema.org - providing guidance for publishing schema.org as JSON-LD for the sciences
Apache License 2.0

Generate Tutorial for ESIP Summer Meeting 2022 #226

Open ashepherd opened 2 years ago

njarboe commented 1 year ago

@ashepherd @fils We have updated the MagIC data headers to comply with the 1.3 guidance doc. This example has many of the elements described in the guidance doc: common properties, keywords, identifier, distributions, temporal coverage, spatial coverage, publisher/provider, funding, and license. Jarboe2008SchemaHeader.txt The website for this dataset, which contains the science-on-schema.org/JSON-LD header, is at: https://www2.earthref.org/MagIC/19596

As @mbjones suggested, I thought this file could be used as one of the example files for the tutorial as it has many of the header elements described in the guidance doc.

smrgeoinfo commented 1 year ago

See the annotated version of the magic example. There are various things I'd suggest:

in @context -- need '/' at the end of the gsqtime stem

sameAs: is an identifier for a publication, not for the data that the publication is based on. I'd suggest that schema:subjectOf or schema:citation are a better fit, but would recommend against schema:citation because it gets used for a recommended citation string in spite of the schema.org guidance. The citation to the related publication is in schema:citation, so that'll do for now.

provider organization, identifier is in schema:sameAs; why not use schema:identifier? that would be clearer

contributor should be a schema:Person

name should be a title for the dataset, not a citation for a publication based on the data

description should describe the dataset, not a citation for a publication based on the data

funding should have the FundingAgency in a funder property

putting the spatialCoverage in an array of points is technically correct, but I wonder how typical harvest clients are going to handle that? Perhaps including a box (maybe two boxes?-- looks like points are clustered in 2 areas) would be friendly for aggregators?

unitText 'custom' is not very useful for propertyValue 'Age'. Shouldn't it be Ma?
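On the spatialCoverage suggestion above, a bounding box can be derived from the existing site points and published alongside them. This is only a minimal sketch: the coordinates are hypothetical (not taken from the MagIC record), `bounding_box` is an illustrative helper name, the box string is assumed to be ordered "south west north east" (lower corner, then upper corner), and clusters that straddle the antimeridian are not handled.

```python
def bounding_box(points):
    """Return a schema.org GeoShape whose box encloses all points.

    `points` is a list of GeoCoordinates-like dicts with numeric
    "latitude" and "longitude" values. The box string is ordered
    "south west north east" (lower corner first, then upper corner).
    """
    if not points:
        raise ValueError("need at least one point")
    lats = [float(p["latitude"]) for p in points]
    lons = [float(p["longitude"]) for p in points]
    box = f"{min(lats)} {min(lons)} {max(lats)} {max(lons)}"
    return {"@type": "GeoShape", "box": box}


# Hypothetical site locations, for illustration only.
sites = [
    {"@type": "GeoCoordinates", "latitude": 42.6, "longitude": -118.5},
    {"@type": "GeoCoordinates", "latitude": 42.9, "longitude": -118.1},
    {"@type": "GeoCoordinates", "latitude": 43.1, "longitude": -118.3},
]

# Keep the individual points and add the box so that clients which
# only understand one of the two forms still get a usable extent.
spatial_coverage = {
    "@type": "Place",
    "geo": [bounding_box(sites)] + sites,
}
```

If the points fall into two separate clusters, the same helper could be called once per cluster to produce two boxes.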

smrgeoinfo commented 1 year ago

The https://github.com/earthcube/GeoCODES-Metadata/tree/doco-mergeECRR-GeoCodesDataset/metadata/Dataset repo directory has a collection of metadata examples that have been harvested from repositories to GeoCODES. There are the original harvest examples (...1.json), and updated versions (...1-2022-07-07.JSON) that validate with a JSON schema built from the various examples and set up to harmonize schema.org doco for resources in the EarthCube Resource Registry with the dataset metadata the CDF repos have submitted for GeoCODES. This is recent work (still in a branch in our GeoCODES metadata repo) that I haven't updated the SOSO group about yet, but I would love to provide an update during the ESIP session. Unfortunately I won't be in Pittsburgh in person...

njarboe commented 1 year ago

@context - fixed

sameAs - The guidance doc states: sameAs - Other URLs that can be used to access the dataset page. A link to a page that provides more information about the same dataset, usually in a different repository.

Based on your description of sameAs it seems the docs need to be changed.

The MagIC URL element ("https://earthref.org/MagIC/doi/10.1029/2008GC002067") is another URL pointing to the dataset. It is based on the doi of the paper that describes the dataset and I thought it could be useful for people to include that, but it seems this usage may be incorrect. I will change upon further clarification if that is what we should do.

provider - The guidance doc's example used sameAs, so I followed that guidance. I agree that "identifier" seems better, so I have changed it.

contributor - fix in progress

name - fix in progress

description - We put the abstract of the paper here when available and a link to the paper when not available. We will discuss other options for this field.

funding - Our data model does not currently support more than two pieces of information related to funding. We can look into adding more in the future.

spatialCoverage - We find plotting just points for each site location in a dataset is most useful for our users. We will think about boxes.

unitText - Looking into fix.

Your comment - "gstime:geologicTimeUnitAbbreviation": { "@type": "xsd:string", "value": "BP" }
// seems like this is redundant, unnecessary

@smrgeoinfo I believe you asked for these abbreviations to be added and we did the work to do that. Do you remember why that was the case?
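Returning to the funding item above: the shape being asked for (the funding agency nested in a funder property of the grant) can be built from just two values. The sketch below assumes, hypothetically, that the two pieces of funding information held in the data model are an award identifier and an agency name; `funding_entry` and both sample values are made up for illustration.

```python
import json


def funding_entry(award_id, agency_name):
    """Build a schema.org MonetaryGrant with the agency in `funder`."""
    return {
        "@type": "MonetaryGrant",
        "identifier": award_id,
        "funder": {"@type": "Organization", "name": agency_name},
    }


# Hypothetical values, not taken from the MagIC record.
entry = funding_entry("EAR-0000000", "Example Science Foundation")
print(json.dumps({"funding": [entry]}, indent=2))
```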

dr-shorthair commented 1 year ago

The documentation on schema:sameAs does not draw attention to what I think is the key point, from owl:sameAs.

If two resources are related by sameAs then the properties of each apply to the other - the nodes can be merged in the graph. So if there are any properties associated with one node that really don't make sense associated with the other node, then don't say they are sameAs.
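A rough way to see the consequence is to merge two nodes the way a consumer treating them as the same resource might. This is a simplified sketch, not how any particular JSON-LD or RDF library works; `merge_same_as` and both sample nodes are hypothetical.

```python
def merge_same_as(node_a, node_b):
    """Union the properties of two nodes asserted to be the same.

    Every property ends up as a list holding the distinct values
    contributed by either node.
    """
    merged = {}
    for node in (node_a, node_b):
        for key, value in node.items():
            values = value if isinstance(value, list) else [value]
            bucket = merged.setdefault(key, [])
            for v in values:
                if v not in bucket:
                    bucket.append(v)
    return merged


# Hypothetical nodes: a dataset and the paper that describes it.
dataset = {"@type": "Dataset", "name": "Site-level measurements"}
article = {"@type": "ScholarlyArticle", "name": "A paper describing the measurements"}

merged = merge_same_as(dataset, article)
# The merged node now claims to be both a Dataset and a
# ScholarlyArticle and carries both names, which is the kind of
# nonsensical result that signals sameAs was the wrong link.
```

If the merged node still makes sense (for example, two landing pages for the same dataset), sameAs is a reasonable choice.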

smrgeoinfo commented 1 year ago

Nick-- I misunderstood https://earthref.org/MagIC/doi/10.1029/2008GC002067 and resolved the doi part, but I see that it resolves to the dataset landing page, so I think you are correct that 'sameAs' is appropriate.

smrgeoinfo commented 1 year ago

I don't recall the discussion around the time unit abbreviation; having the gstime:geologicTimeUnitAbbreviation property doesn't cause any problem, but it does seem that the UOM should be part of the TRS definition.