USEPA / EPA_Environmental_Dataset_Gateway

U.S. EPA’s Metadata Catalog
https://edg.epa.gov

Elements: Language, Spatial, Data Quality, Theme, isPartOf #78

Closed torrin47 closed 5 years ago

torrin47 commented 5 years ago

These are all optional elements that are pretty low on the priority list, but we should include them for completeness.

We've used a short picklist for language (en-US, en-CA, es-MX, es-PR, ch, sm), but we probably need a better option for extramural contributors. The old lookup that Project Open Data referenced is no longer online. Is there anything new?

Data Quality is binary (true/false) and as such, pretty functionally useless - is anybody really going to list their data with "false" for data quality? I guess we should give them the option?

Theme is the same as ISO Keyword, but split out into a less flexible and ignored element. Maybe we can tie this to the selection in the ISO Keyword dropdown and make it transparent to the end user.
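A minimal sketch of what "tie this to the ISO Keyword dropdown" could look like, assuming the record is held as a plain dict with a Project Open Data style `theme` array. The function name `sync_theme` and the dict shape are hypothetical and not taken from the editor's code.

```python
# Hypothetical sketch: derive `theme` from the ISO Keyword selection so the
# end user never edits it directly. `sync_theme` is an illustrative name.

def sync_theme(record: dict, iso_keywords: list[str]) -> dict:
    """Return a copy of the record whose theme mirrors the ISO keyword picks."""
    updated = dict(record)
    # Strip whitespace, drop blanks, and deduplicate while keeping pick order.
    themes = list(dict.fromkeys(k.strip() for k in iso_keywords if k.strip()))
    if themes:
        updated["theme"] = themes
    else:
        # theme is optional, so drop it rather than emit an empty list.
        updated.pop("theme", None)
    return updated
```

Called on every change to the keyword dropdown, this would keep `theme` in step without exposing a second control.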

Spatial isn't really relevant in a non-geo editor, but we have offered users some pre-populated extents of CONUS or the EPA regions. Maybe again we should link this element to whatever keywords are chosen in the Place dropdown and also hide it from the end user (using the words and no extents).

I would be really surprised if anyone wants to take advantage of isPartOf - but we should leave it open as free text to include a unique identifier of a parent record if someone wants to reference one.

jzichichi commented 5 years ago

@torrin47 - just a note on isPartOf - there was a period of time where OLEM was doing metadata collection designations using isPartOf. I think that enthusiasm may have died down, but I know that a number of the records in the OLEM group still contain the isPartOf element from the Michael Alford metadata collection curation days, so keeping it in there as a free text seems like a good idea.

aergul commented 5 years ago

@torrin47 I think the lookup you were looking for now lives here:

https://github.com/r12a/r12a.github.io/blob/master/apps/subtags/languages.js

torrin47 commented 5 years ago

@jzichichi You're right, I probably shouldn't be editorializing in the requirements! If there was a lot of usage, I could see spending some effort on some lookup functionality that would verify parent identifiers. As it is, since the initial OLEM interest, we've had no new collections, and a few have even been decommissioned. Instead, the big recent push is to integrate other unique identifiers (DOIs, ORCIDs, PMCIDs) that don't have a great corresponding element in projectOpenData. Sigh.

torrin47 commented 5 years ago

@aergul Awesome! Except that it's still overkill for our purposes, just like the previous one that's no longer available. It would be really snazzy if the language codes were somehow weighted by number of speakers, so we could preload the top 50 and leave the rest for a deeper query. We might want to run that by the subgroup - do they want to pick a shortlist of languages, or use the full universe, or what?

aergul commented 5 years ago

Like this? https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers

I suspect top 25 is plenty.

Or complete as you type...
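A rough sketch of the "shortlist plus complete as you type" idea, assuming a small preloaded list is shown when the box is empty and the full list is searched by prefix otherwise. `ALL_LANGUAGES`, `PRELOADED`, and `suggest` are hypothetical stand-ins, not the editor's actual data or code.

```python
# Hypothetical data: a tiny stand-in for the full subtag list, plus the
# shortlist that would be preloaded in the dropdown.
ALL_LANGUAGES = {
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "ar": "Arabic",
    "zh": "Chinese",
    "sm": "Samoan",
    "so": "Somali",
}
PRELOADED = ["en", "es", "fr", "zh", "ar"]


def suggest(query: str, limit: int = 25) -> list[tuple[str, str]]:
    """Return (code, name) pairs for the dropdown.

    Empty query: the preloaded shortlist, in its configured order.
    Otherwise: prefix matches on name or code across the full list.
    """
    q = query.strip().lower()
    if not q:
        return [(code, ALL_LANGUAGES[code]) for code in PRELOADED][:limit]
    hits = [
        (code, name)
        for code, name in ALL_LANGUAGES.items()
        if name.lower().startswith(q) or code.startswith(q)
    ]
    # Alphabetical by display name keeps results stable as the user types.
    return sorted(hits, key=lambda pair: pair[1])[:limit]
```

With this shape, the shortlist size (25, 50, or whatever the subgroup picks) is just the length of `PRELOADED`.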

torrin47 commented 5 years ago

We are going to leave Spatial out of the MVP, and in the long term, populate it using "extent" rather than spatial keyword.
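For the long-term plan, a minimal sketch of turning an extent into a `spatial` value. It assumes the Project Open Data convention of writing a bounding box as minimum longitude, minimum latitude, maximum longitude, maximum latitude in decimal degrees; the function name and the sample coordinates are illustrative only.

```python
def extent_to_spatial(west: float, south: float, east: float, north: float) -> str:
    """Format a bounding box as 'min lon,min lat,max lon,max lat'."""
    for lon in (west, east):
        if not -180 <= lon <= 180:
            raise ValueError(f"longitude out of range: {lon}")
    if not -90 <= south <= north <= 90:
        raise ValueError(f"invalid latitudes: {south}, {north}")
    # west > east is deliberately allowed so boxes crossing the
    # antimeridian are not rejected.
    return f"{west},{south},{east},{north}"


# Rough, illustrative box around the contiguous United States.
conus = extent_to_spatial(-125.0, 24.0, -66.0, 50.0)
```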

torrin47 commented 5 years ago

Because of the technical challenges for isPartOf collections, we're going to omit it from the MVP and address it with a lookup service as an enhancement down the road.

aergul commented 5 years ago

Elements implemented except those omitted.

jzichichi commented 5 years ago

@aergul - I like the way the language list and filter works, but it feels weird to me that we can't see English, but we can see over 10 variations of Arabic. Can we show the short list of entries nearest English, since I suspect that will be 99% of the user selections?

aergul commented 5 years ago

@jzichichi English/Spanish/French floated to top per our discussion
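One way the "floated to top" ordering could be expressed, as a sketch only: pin a few primary language subtags and sort everything else alphabetically by label. `PINNED` and `order_languages` are hypothetical names, and the sample labels are illustrative rather than copied from the editor.

```python
PINNED = ["en", "es", "fr"]  # assumption: primary subtags to float to the top


def order_languages(entries: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Sort (tag, label) pairs: pinned languages first, then A-Z by label."""

    def sort_key(entry: tuple[str, str]) -> tuple[int, str]:
        tag, label = entry
        primary = tag.split("-")[0].lower()  # "en-US" -> "en"
        rank = PINNED.index(primary) if primary in PINNED else len(PINNED)
        return (rank, label.lower())

    return sorted(entries, key=sort_key)


sample = [
    ("ar-EG", "Arabic (Egypt)"),
    ("fr", "French (Standard)"),
    ("es-MX", "Spanish (Mexico)"),
    ("en", "English"),
    ("en-US", "English (United States)"),
    ("ar", "Arabic"),
]
ordered = order_languages(sample)
```

Regional variants stay grouped with their pinned language because the rank is taken from the primary subtag, not the full tag.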

jzichichi commented 5 years ago

@aergul - better. I like how Data Quality looks as well. Moving to EPA column for @torrin47 to review.

torrin47 commented 5 years ago

I find it odd that English has an option with no associated country, and French has (Standard), which seems rather official, but Spanish only has versions with countries. I'm just not sure what I'd tell someone to use for Spanish-language data produced and consumed in the United States. (Puerto Rico)? I guess Spain ends up the default.

torrin47 commented 5 years ago

This issue was moved to USEPA/EPA_Non-geo_Metadata_Editor#9