cznethub / dsp

CZNet Hub Data Submission Portal
BSD 3-Clause "New" or "Revised" License

Correctly handle EarthChem Library spatial coverage information #102

Closed horsburgh closed 1 year ago

horsburgh commented 1 year ago

Describe the feature you'd like and what it will do

Metadata API responses from the EarthChem Library do not currently include spatial coverage information. That information needs to be included in the API response and then added to the catalog database.

Why is this feature important?

Datasets in EarthChem are usually associated with spatial coverage information including the locations of samples. That information can be shown on the spatial coverage map snippet on the search results page and can be used to enable geospatial search/filtering of results based on location.

Is your feature request related to a problem? Please describe.

Yes - EarthChem Library (ECL) API responses need to include spatial coverage information to enable this feature.

Additional context

NA

sblack-usu commented 1 year ago

I sent an email to Peng and received a response that he is looking into the issue. I also asked for Peng's GitHub user ID so we can move the conversation here.

sblack-usu commented 1 year ago

Peng and I discussed this last week and intended to raise it on the CZNet call this morning, but we ran out of time.

The spatialCoverage in the JSON-LD is different from the spatialcoverage (enum) field in the API. Since the submission portal isn't actually collecting spatial coverage information, we could omit it from discovery. From what I understand, the spatial coverage in the JSON-LD is derived from the datasets during the publishing process.

I could scrape the JSON-LD from the public page (#11) instead of building it from the API metadata. I can keep the implementation easy to swap, so we can change where the JSON-LD comes from if we decide differently later on.
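A minimal sketch of the scraping approach described above, assuming the landing page embeds its Schema.org metadata in a `<script type="application/ld+json">` block. The class and function names and the sample HTML are hypothetical, written for illustration only; this is not the portal's actual implementation, and the real EarthChem landing pages have not been checked against it.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD blocks


def extract_jsonld(html):
    """Return every JSON-LD document found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


def spatial_coverage_from_landing_page(html):
    """Return the first spatialCoverage value found, or None if there is none."""
    for doc in extract_jsonld(html):
        if isinstance(doc, dict) and "spatialCoverage" in doc:
            return doc["spatialCoverage"]
    return None


# Hypothetical landing page content, not taken from a real repository.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset", "name": "Example dataset",
 "spatialCoverage": {"@type": "Place",
   "geo": {"@type": "GeoCoordinates", "latitude": 41.7, "longitude": -111.8}}}
</script>
</head><body></body></html>
"""

coverage = spatial_coverage_from_landing_page(SAMPLE_HTML)
```

Because the caller only depends on `spatial_coverage_from_landing_page` returning a Schema.org value or `None`, the same call site could later be pointed at a function that builds the value from API metadata instead.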

horsburgh commented 1 year ago

@sblack-usu - I see that this is a little tricky. I think we do want to be able to discover EarthChem resources using their spatial coverage information. We've still got a map interface for the catalog/discover system planned, so if none of the EarthChem resources have spatial coverage information they just won't show up.

I think we probably need to scrape at least the spatial coverage information for EarthChem resources.

We can discuss more if needed because this does open the door to inconsistency across repositories. We could say that we are going to scrape Schema.org JSON-LD from the landing pages for all resources. But then what happens if someone registers something from a repository that doesn't provide Schema.org JSON-LD in its landing pages? I think HydroShare's metadata mapping to Schema.org is pretty good and pretty complete, although I know it's a little lossy (but perhaps not in any ways that matter for data discovery). I haven't studied EarthChem's implementation of Schema.org.

Maybe, to standardize on Schema.org JSON-LD, we use whatever JSON-LD HydroShare, EarthChem, and Zenodo provide in the public landing page, and then just map our third-party repository metadata schema to Schema.org and deposit that in the catalog for any resources registered from an external repository. Then everything in the catalog is consistent with Schema.org.
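As a rough illustration of the mapping idea above, the sketch below converts a metadata record for an externally registered resource into a Schema.org `Dataset` document. The input field names (`title`, `url`, `description`, `bbox`) are assumptions made for this example and do not reflect the portal's actual third-party repository schema. The `box` string uses the "south west north east" ordering commonly used for Schema.org `GeoShape`, which should be confirmed against whatever the catalog expects.

```python
def to_schema_org_dataset(record):
    """Map a (hypothetical) external-repository record to Schema.org JSON-LD."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": record["title"],
        "url": record["url"],
    }
    if record.get("description"):
        doc["description"] = record["description"]

    # Assumed input shape: bbox = {"south": .., "west": .., "north": .., "east": ..}
    bbox = record.get("bbox")
    if bbox:
        box = f'{bbox["south"]} {bbox["west"]} {bbox["north"]} {bbox["east"]}'
        doc["spatialCoverage"] = {
            "@type": "Place",
            "geo": {"@type": "GeoShape", "box": box},
        }
    return doc


# Hypothetical record for a resource registered from an external repository.
example = to_schema_org_dataset(
    {
        "title": "Example soil chemistry samples",
        "url": "https://example.org/dataset/1",
        "bbox": {"south": 41.5, "west": -112.0, "north": 42.0, "east": -111.5},
    }
)
```

Records without a bounding box simply come out with no `spatialCoverage` key, which matches the concern raised earlier in the thread: such resources would not appear in a map-based search.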