ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

geography low-level chronic emergency #2416

Closed dustymc closed 2 years ago

dustymc commented 4 years ago

Refs: https://github.com/ArctosDB/arctos/issues/2346, https://github.com/ArctosDB/arctos/issues/2326

http://arctos.database.museum/geography.cfm?FEATURE=Polynesia was created.

No other data - http://arctos.database.museum/geography.cfm?FEATURE=Hawaii%20Volcanoes%20National%20Park for example - were updated to use this new term. We've just made our data MUCH more difficult to explore by introducing inconsistencies.

Perhaps this is a direction in which we want to go for some reason that doesn't make sense to me - https://github.com/ArctosDB/arctos/issues/1366 remains unanswered.

If that is as undesirable to the community as it seems to me, I suggest we urgently proceed with finding a better system for vetting authority values, or defining geography and clarifying the rules, or locking down geography access like we did after the last major cleanup, or SOMETHING. We are actively un-doing a tremendous amount of work with our inaction in these things.

Jegelewicz commented 4 years ago

I'm not sure I would call this an emergency. I completely understand, and I could certainly take these features down if the community wishes. We have things that are from "Polynesia", which is a well-recognized area (with a WKT and everything) in the Pacific Ocean. I can certainly just use Pacific Ocean for these and then use "Polynesia" as specific locality if that seems best. In the example provided above, there would be no way to do this since "Volcanoes National Park" and "Polynesia" are both features - I can only select one or the other. Are you suggesting that "Polynesia" should NEVER be used in Higher Geog or that it should be something other than a feature? Copying others who probably really care at this point. @PaulaBarteau @sharpphyl @mkoo

dustymc commented 4 years ago

not sure I would call this an emergency.

I think we're introducing data that we may (or not) decide to clean up, and that almost always turns into a huge amount of work and frustration. (See: Nature of ID.) There seems to be some unity in being very cautious about changing existing values; I'm not sure that's practical without a similar amount of caution in introducing new values.

Are you suggesting that "Polynesia" should NEVER be used in Higher Geog or that it should be something other than a feature?

I'm suggesting we should be consistent. "Bla" should find everything or nothing from some certain place. A user should not have to search "Polynesia" and "Hawaii" to get things from a particular island; very few will dig deep enough to discover they've only found part of what they might want.

I'm not sure how that lines up with what the community wants, and I don't have strong feelings about the actual values - I'm just trying to make sure we don't reduce the overall discoverability of the data in Arctos. If we are going to be inconsistent, then perhaps we can develop a new "rule" for http://handbook.arctosdb.org/documentation/higher-geography.html#guidelines-for-geographic-terms-in-arctos and at least be consistently inconsistent. (E.g., "regions/areas/whatever this is may be entered in {field} when {whatever}, but never {something else}", which might give users some chance of finding what they're looking for.)

If Polynesia is a feature then there's no place for eg Hawaii Volcanoes National Park while maintaining consistency. If we do want "region" data (Polynesia, Patagonia, Dixie) and we do want to maintain consistency (at least within "hierarchies") then perhaps we need a new concept.

(And we seem to be talking about adding a lot of stuff to geography lately. Is there some end to that in sight? If not, perhaps we should have a more flexible model that doesn't require modifying a table and a bunch of forms to do so.)

mkoo commented 4 years ago

This is the result of us not sitting down and dealing with the non-terrestrial geography too! I think we need to have a separate discussion about that. My initial digging around is that there is not one standard for GIS boundaries. We just need to adopt one and make our own WKT but I feel it should be as a HG not features. What were the specimen localities that initiated this?

dustymc commented 4 years ago

It's not just non-terrestrial - Patagonia (or something like that....) came up recently.

WKT is all kinds of awesome, but it doesn't help users who are searching by this stuff:

[Screenshot: Screen Shot 2019-12-16 at 10 47 08 AM]

Jegelewicz commented 4 years ago

We just need to adopt one and make our own WKT but I feel it should be as a HG not features.

What @dustymc said above.

What were the specimen localities that initiated this?

New stuff coming in from NMMNH mollusk collection - has not been entered yet, we were just adding higher geography before bulkload.

sharpphyl commented 4 years ago

We have a lot of specimens from countries within "Polynesia" but no need for it as a term in Higher Geography. If we have Polynesia as a "feature," why not Micronesia and Melanesia, all subsections of Oceania? I think using these terms would complicate our higher geography and searches for specimens.

I would prefer to use the country name within the Pacific Ocean as I think that is a more likely search term. If we had specimens from just "Polynesia" we would use that as the specific locality within the Pacific Ocean.

Also, "Polynesia," in our limited experience, has actually referred to "French Polynesia," not the larger geographic term.

Wikipedia refers to Polynesia as a "cultural term" even though it has geographical boundaries. https://en.wikipedia.org/wiki/Polynesia

Jegelewicz commented 4 years ago

Polynesia as a "feature," why not Micronesia and Melanesia, all subsections of Oceania?

All three were created.

Jegelewicz commented 4 years ago

I would prefer to use the country name within the Pacific Ocean as I think that is a more likely search term. If we had specimens from just "Polynesia" we would use that as the specific locality within the Pacific Ocean.

That seems to be the consensus - I will remove these features and make that recommendation to NMMNH:Inv.

dustymc commented 4 years ago

why not Micronesia and Melanesia

Why not "The lowlands of south-south-eastern Mono County, California"?

I maintain we need some sort of policy, whatever that policy is. I'm reopening, but perhaps this can be deprioritized.

From a different Issue

Is any group dealing with marine geography? It seems it's been an issue for years.

I don't think this is a marine problem; marine stuff probably needs more work, might need new fields/concepts, but I don't think "when exceptionally wet, ..." is useful policy.

This is why I have suggested we look at things like Getty Thesaurus - however, it changes too so it wouldn't really solve the problem of updates and change.

There are different ways to implement. One would be to store only their identifier, which would require a great deal of trust but would solve this.

Until someone creates the ultimate geography thesaurus, this is always going to be an issue.

There are lots of alternatives. I mostly care about this stuff because we search by asserted terms; if we remove ourselves from that model (eg by pulling terms from coordinates) then I think the landscape would look very different, for example. It's worth examining how we got to this model and where else we might be before we lock ourselves into any particular viewpoints.

mkoo commented 4 years ago

I am wading into this because it will be on the Issues Agenda for Feb 6.

To me this issue, and several before it, boils down to having clear criteria for Geographic Features, Higher Geography, and entries for each with spatial parameters.

Let's start with: What's Higher Geography? We tried to set out an ideal vision for it (not saying it's what we're doing currently), but in a nutshell we could say HG is an administrative boundary, recognized by various global authorities (eg GADM, ISO) and its subunits.

So Features are not? Can Features traverse administrative boundaries? Are they recognized by some political/admin authorities (like protected areas)? Can we use marine descriptors here (like Bay of Bengal, Sea of Cortez, Pacific Ocean)? Ideally, I don't think Features should be something we substitute in the HG string. They should be at best an emergent property from a record's spatial data. Where does that leave our favorite Feature, the topoquad? Well, we can make exceptions and elevate them to HG, since they are substitutes for HG in AK at least. For those we can create an HG entry and assign WKTs, etc.

Features should make it easier to find stuff so we still want to make them part of the spatial data we store and use as bounding objects.

So it seems we agree that something like "French Polynesia" is a HG but "Polynesia" is not; whether we need an oceanic Feature for this region so that these records can be found would be another discussion. Let's discuss!

dustymc commented 4 years ago

administrative boundary, recognized by various global authorities (eg GADM, ISO) and its subunits.

I like it, with some limitations - Arctos only(ish) deals with three levels, country/state/county, where some countries have a bunch more "sub-state" things.

Can Features traverse administrative boundaries?

The existence of Yellowstone National Park suggests they must.

Bay of Bengal,

I think I'm OK with that IF and only if there's a boundary (eg, WKT). We should be able to say "this is [not] in Bay of Bengal" - it shouldn't sort of fade out somewhere before Antarctica.

Sea of Cortez

That's a Sea.

Pacific Ocean

That's Ocean

topoquad

That's Quad.

"French Polynesia" is a HG

Probably has to be since it's a political entity, but it's still ugly...

[Screenshot: Screen Shot 2020-01-30 at 3 43 45 PM]

not "Polynesia"

I agree, I don't like vague regions mixed in what should be more or less "formal" data.

emergent property from a record's spatial data

One possible model would be to have two sets of geography - one asserted by the collections (which might have relaxed rules) and another pulled from the spatial data (which would probably find ways to be weird, but it would also be consistent and therefore searchable).

Jegelewicz commented 4 years ago

It seems to me that if every HG had a polygon AND all of our localities had coordinates, the problem would be solved (whatever polygon you search selects all records with even part of an error inside the polygon, even if they have a different HG). Easy-peasy! Now who is going to create all HG polygons, georeference all localities and do we have the computing power to facilitate those searches?

Or am I nuts and this wouldn't work at all?
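
The search described above (a polygon selects every record whose error circle falls even partly inside it) can be sketched roughly as follows. This is only an illustration under loud assumptions: coordinates are treated as planar x/y rather than geodesic lat/long, the error is a plain radius in the same units, and every name here (`search_by_polygon`, `SQUARE`, `LOCALITIES`, the dictionary keys) is hypothetical rather than anything in Arctos. A real implementation would presumably lean on a spatial database instead.

```python
import math


def point_in_polygon(pt, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_segment(pt, a, b):
    """Shortest planar distance from pt to the segment a-b."""
    px, py = pt
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection so it stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def error_circle_touches(center, radius, polygon):
    """True if any part of the error circle lies inside the polygon."""
    if point_in_polygon(center, polygon):
        return True
    n = len(polygon)
    return any(
        distance_to_segment(center, polygon[i], polygon[(i + 1) % n]) <= radius
        for i in range(n)
    )


def search_by_polygon(localities, polygon):
    """Return IDs of localities whose error circle touches the polygon."""
    return [
        loc["locality_id"]
        for loc in localities
        if error_circle_touches((loc["x"], loc["y"]), loc["error"], polygon)
    ]


# Hypothetical data: a square "HG polygon" and four made-up localities.
SQUARE = [(0, 0), (10, 0), (10, 10), (0, 10)]
LOCALITIES = [
    {"locality_id": 1, "x": 5, "y": 5, "error": 0},    # inside
    {"locality_id": 2, "x": 11, "y": 5, "error": 2},   # outside, error overlaps
    {"locality_id": 3, "x": 20, "y": 20, "error": 1},  # far away
    {"locality_id": 4, "x": 12, "y": 5, "error": 1},   # outside, error too small
]

print(search_by_polygon(LOCALITIES, SQUARE))
```

Note that locality 2 is returned even though its point lies outside the polygon, which is exactly the "even part of an error inside" behavior; whether that is desirable for very large error radii is a separate question.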

dustymc commented 4 years ago

every HG had a polygon

That would be AMAZING.

all of our localities had coordinates

Settle for "almost"?

UAM@ARCTOS> select count(*) from locality;

  COUNT(*)
----------
    602014

1 row selected.

Elapsed: 00:00:00.50
UAM@ARCTOS> select count(*) from locality where dec_lat is null;

  COUNT(*)
----------
    155502

1 row selected.

Elapsed: 00:00:00.39
UAM@ARCTOS> select count(*) from locality where s$dec_lat is null;

  COUNT(*)
----------
     14947
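
For scale, the three counts above can be turned into percentages. The snippet below only redoes the arithmetic on the numbers pasted from the session; the variable names are made up, and nothing is assumed about what `s$dec_lat` holds beyond its being a second coordinate column in `locality`.

```python
# Counts copied from the query output above.
total_localities = 602014
missing_dec_lat = 155502
missing_s_dec_lat = 14947


def percent(part, whole):
    """Share of `whole` represented by `part`, as a percentage."""
    return 100.0 * part / whole


pct_missing_dec_lat = percent(missing_dec_lat, total_localities)
pct_missing_s_dec_lat = percent(missing_s_dec_lat, total_localities)

print(f"dec_lat is null:   {pct_missing_dec_lat:.1f}% of localities")
print(f"s$dec_lat is null: {pct_missing_s_dec_lat:.1f}% of localities")
```

So roughly a quarter of localities have no `dec_lat`, while only about 2.5% have no `s$dec_lat`.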

search

I think it's possible to do more than intersects - within, xx% within, etc.
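
As a rough sketch of what "intersects", "within", and "xx% within" could mean, the example below compares an error footprint against a region and reports what fraction of the footprint falls inside. To keep the arithmetic exact it assumes both shapes are axis-aligned boxes `(min_x, min_y, max_x, max_y)`, which is a stand-in for real polygons and error circles; the function names and the `mode`/`min_fraction` parameters are hypothetical.

```python
def fraction_within(error_box, region_box):
    """Fraction of error_box's area that lies inside region_box (0.0 to 1.0)."""
    ex0, ey0, ex1, ey1 = error_box
    rx0, ry0, rx1, ry1 = region_box
    overlap_w = max(0.0, min(ex1, rx1) - max(ex0, rx0))
    overlap_h = max(0.0, min(ey1, ry1) - max(ey0, ry0))
    area = (ex1 - ex0) * (ey1 - ey0)
    if area == 0:
        # Degenerate footprint (a point with no error): it is either in or out.
        inside = rx0 <= ex0 <= rx1 and ry0 <= ey0 <= ry1
        return 1.0 if inside else 0.0
    return overlap_w * overlap_h / area


def matches(error_box, region_box, mode="intersects", min_fraction=0.5):
    """Apply one of three hypothetical search modes."""
    frac = fraction_within(error_box, region_box)
    if mode == "intersects":
        # Any positive overlap counts; footprints that only share an edge do not.
        return frac > 0.0
    if mode == "within":
        return frac == 1.0
    if mode == "fraction":
        return frac >= min_fraction
    raise ValueError(f"unknown mode: {mode}")


REGION = (0, 0, 10, 10)
fully_inside = (2, 2, 4, 4)
straddling = (8, 8, 12, 12)   # one quarter of its area is inside REGION
far_away = (20, 20, 22, 22)

for box in (fully_inside, straddling, far_away):
    print(box, fraction_within(box, REGION),
          matches(box, REGION, "intersects"),
          matches(box, REGION, "within"),
          matches(box, REGION, "fraction", min_fraction=0.2))
```

The same three modes would apply to true polygons; only the overlap calculation changes.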

create all HG polygons

GADM (and if you can't easily find a polygon, it ain't geography?!)?

georeference most (not all) localities

Done.

computing power

I don't know, but it seems realistic under PG at TACC.

mkoo commented 4 years ago

Still wouldn't solve the random (or very specific) user who wants to search "Polynesia" for any gastropods or everything from Kruger National Park - Features would be good to implement in some way, just not the way we're doing it now.

This is a little akin to the wishlist item for better polygon searching in Berkeley Mapper or somewhere on Arctos.

Other metrics of spatially defined HG and everything georeferenced are what we're all about, so not a long shot at all! (In fact, that's what we've been doing for 20 years!) OK, more discussion next week.

dustymc commented 4 years ago

search "Polynesia"

I'm not sure that's ever come up, and even our super-primitive spatial search can get close-ish at that scale.

Kruger

I see no reason that wouldn't be a Feature.

Jegelewicz commented 4 years ago

if you can't easily find a polygon, it ain't geography?

Agree

Jegelewicz commented 4 years ago

"Polynesia" for any gastropods or everything from Kruger National Park

Both of these could have polygons, so I see no reason they couldn't be HG - in my scenario HG isn't applied to localities - just coordinates and error. HG are the search terms only.

Jegelewicz commented 4 years ago

Also, in my scenario, there is no such thing as "feature", just some set of terms and a polygon = HG.

dustymc commented 4 years ago

I see no reason they couldn't be HG

In the current model, there's no place for that which doesn't introduce inconsistencies.

just some set of terms and a polygon = HG

You can do that in Locality now - why have higher geography at all?

Jegelewicz commented 4 years ago

In the current model,

Agree, but I'm not talking about the current model.

why have higher geography at all?

To facilitate search. You search by HG terms. If I search "France", Arctos searches all HG that include the term "France" and returns all records with coordinates inside any of those polygons. Even better, Arctos asks me if I want France the country or France the province, in which case we will still need the terms as we have now, with some adjustment. I suggest we use more generic terms ("Administrative division 1") so that we can get more detailed than quad in the search terms if we want. (I search "Albuquerque" the city and get everything collected in the city limits.)
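
A minimal sketch of that flow, assuming HG is nothing more than a name, a generic rank, and a shape, and that localities carry only coordinates. Everything here is hypothetical: the class and function names, the rank labels, and especially the bounding boxes, which are crude illustrative rectangles standing in for real polygons and are not authoritative boundaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HigherGeog:
    name: str
    rank: str     # generic label, e.g. "country", "administrative division 1"
    bbox: tuple   # (min_lon, min_lat, max_lon, max_lat), stand-in for a polygon


@dataclass(frozen=True)
class Locality:
    locality_id: int
    lon: float
    lat: float


def candidates(term, geogs):
    """All HG entries whose name contains the search term (case-insensitive)."""
    term = term.lower()
    return [g for g in geogs if term in g.name.lower()]


def records_in(geog, localities):
    """IDs of localities whose coordinates fall inside the HG shape."""
    min_lon, min_lat, max_lon, max_lat = geog.bbox
    return [
        loc.locality_id
        for loc in localities
        if min_lon <= loc.lon <= max_lon and min_lat <= loc.lat <= max_lat
    ]


def search(term, geogs, localities, rank=None):
    """Return matching locality IDs, or the choices if the term is ambiguous."""
    hits = candidates(term, geogs)
    if rank is not None:
        hits = [g for g in hits if g.rank == rank]
    if len(hits) > 1:
        # Ask the user which one they meant instead of guessing.
        return {"choices": [(g.name, g.rank) for g in hits]}
    ids = records_in(hits[0], localities) if hits else []
    return {"locality_ids": ids}


# Illustrative data only; the boxes are rough and the localities are made up.
GEOGS = [
    HigherGeog("France", "country", (-5.0, 42.0, 8.0, 51.0)),
    HigherGeog("Ile-de-France", "administrative division 1", (1.4, 48.1, 3.6, 49.3)),
]
LOCALITIES = [
    Locality(1, 2.35, 48.86),
    Locality(2, 5.37, 43.30),
    Locality(3, -106.65, 35.08),
]

print(search("France", GEOGS, LOCALITIES))
print(search("France", GEOGS, LOCALITIES, rank="country"))
print(search("France", GEOGS, LOCALITIES, rank="administrative division 1"))
```

In this sketch no HG value is stored on the locality at all; the terms exist only to pick a shape, which is the key difference from the current model.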

dustymc commented 4 years ago

facilitate search.

You can do that now without leaving locality as well - it's why I'm so reluctant to give up the 'any' search fields.

 select locality.locality_id from locality,geog_auth_rec where locality.geog_auth_rec_id=geog_auth_rec.geog_auth_rec_id and
 higher_geog not like '%Yosemite%' and upper(S$GEOGRAPHY) like '%YOSEMITE%';

...
1324 rows selected.

https://arctos.database.museum/SpecimenResults.cfm?locality_id=10436323

more generic terms "Administrative division 1"

yes, somehow avoid the whole "but lower sloblovia calls THESE counties!" thing.

more detailed than quad in the search terms if we want. (I search "Albuquerque" the city and get everything collected in the city limits.)

I'm again not sure I'm seeing the separation between formal and informal data in what you're saying - it's (potentially) formal all the way down, why stuff that in a model which structurally demands a separation?

I definitely don't see any reason to build a shapefile-metadata model (I think that's what you're proposing?) that's anything other than KVP, which I suppose doesn't precisely allow, but also can't exclude, "city of ABQ" (or "The lowlands of south-south-eastern Mono County, California").

I've been doing most of this for some time, the tools largely exist, I use it a lot for my purposes, it's more or less technically-viable, it probably comes with political sensitivities.

Jegelewicz commented 4 years ago

'any' search fields

Always times out so isn't useful at all.