ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Geography request for BELL Museum #5118

Closed Jegelewicz closed 1 year ago

Jegelewicz commented 1 year ago
    @Jegelewicz here is a CSV of our suggested additions to geography. I'm not sure how Arctos is handling Antarctica, but the islands we are adding don't pop up on a search. You might need to adjust if the geography tables aren't recognizing territorial claims. There are also a couple of other notes I included you might take a quick read through. Thanks!

geography_additions.csv

Originally posted by @barke042 in https://github.com/ArctosDB/data-migration/issues/1138#issuecomment-1262523704

Jegelewicz commented 1 year ago

A lot of the above are islands - what's the current status of islands?

dustymc commented 1 year ago

I've got data for ~15K islands, but it's not mixed up with all that other stuff (and won't be, if I have any say in the matter) - it's just island and plate (which I might semi-reluctantly agree to map to continent, if I have to).

In general, I think at least big isolated islands have to exist in some manner - Saint Lawrence for example is county-sized and just doesn't make sense under any geography other than its own. Islands encompassed by some state-level political division (I suppose this one is too, but it's a big state and the island isn't very close to anything else...) I'd prefer to push to GADM. I'm sure there's lots in the middle; a Community decision is needed.

"Indonesia alone" is Asia, Indonesia, Indonesian Archipelago (and another great example of why I want https://github.com/ArctosDB/arctos/issues/5076, "Asia" and "Indonesian Archipelago" are just distractions that decrease discoverability).

IDK where Europe, Greece, Peloponnese, Messenia came from but that's not how GADM sees things - yet another instance where https://github.com/ArctosDB/arctos/issues/5076 would make an impossible problem (we have lots of things that don't follow GADM, why not one more?) not a problem at all (and anyone can check GADM and KNOW what we would or would not accept, which is pretty amazing too).

The idea emerging in https://github.com/ArctosDB/arctos/issues/5109 would make that even simpler - the formal bits would stop at Peloponnese, Western Greece and the Ionian Islands and everyone would know that ahead of time.

Europe, Russia, Siberian Federal District is all kinds of fun - from the supplied link "The entire federal district lies within the continent of Asia." - I'd rather not have continents at all - and that's not how GADM sees the world.

Europe, Russia, Chukotka Autonomous Okrug, Chukotsky District lines up with https://gadm.org/maps/RUS/chukot/chukotskiyrayon.html well enough, but our vocabulary isn't "gadmish" and there's still a continent (and the wrong one) and maybe we're going to drop county-level altogether and I'd rather not make anyone (including me!!) do this multiple times.

I don't know how to adequately respond to any of this, and I hate that. I feel like The Community should be treating this much more like an emergency; that's certainly how I see it. I hope I'm going to get help with this in the very near future, but if that doesn't happen then the least-evil thing is probably to just bypass my safeties, create these as provided, somehow flag them as needing priority cleanup, and hope The Community can make a decision on a less-urgent schedule.

@mkoo @ArctosDB/arctos-working-group-officers PLEASE HELP!

Jegelewicz commented 1 year ago

Islands

I like the idea of treating island as a separate thing. Saint Lawrence Island belongs in Alaska, Nome Census Area. Sure, sometimes an island may be equal to the "administrative higher geography" and sometimes it may not, but the intersection could tell us more than the weird mash up. Darwin Core has a place for this - island. I know that creating these higher geography strings was meant to ensure that one couldn't mash up an island in a nonsensical way with the incorrect "country, state, county", but it's not working anymore.

I really believe that our first step should be removing island, island group and feature from "higher geography" and we should focus "higher geography" on administrative boundaries (so that means removing continent too). We can think about a hierarchy for island groups and islands (ugh) later if we want.

dustymc commented 1 year ago

Saint Lawrence Island belongs in Alaska, Nome Census Area

I'm never going to have data which puts those things together. Unless Step One is "hire 3 full-time GIS pros" then that just doesn't seem compatible with anything. I do have (or can get, probably) census areas, I do have islands, we can have both, they might intersect in some way, but calculating intersections (or dealing with the infinite number of things that might create) seems well beyond our current means.

(Or maybe you meant move island to locality attributes or something, which I'm mostly fine with, but that still leaves a big gap in geography - the census area is about the size of South Carolina, and only because the vast area between the island and mainland isn't considered. I still think islands as geography make sense in certain situations - certainly including many Antarctic islands.)

Darwin Core

...does not see the world spatially. (But that doesn't necessarily stop me from filling in some blanks, asserted or otherwise.) I don't think Darwin Core needs to influence this discussion in any way, but maybe not getting flagged by GBIF is in some way more important than creating functional data - and if that's the case then I'm back to "moving "curatorially-asserted geography" much more verbatim-ward."

removing

I don't want to get any horses ahead of carts, just agreeing on how to use (or not use) the existing structure seems sufficient for the moment.

focus "higher geography" on administrative boundaries

That's basically https://github.com/ArctosDB/arctos/issues/5076, but it also allows WHATEVER boundaries (IHO and SeaVox are immediately available and not covered by things like GADM, for example).

Jegelewicz commented 1 year ago

I'm never going to have data which puts those things together.

And you shouldn't need to! Collections will pick them separately - so yes perhaps island is a locality attribute, but one that can have spatial stuff too?

I still think islands as geography make sense in certain situations - certainly including many Antarctic islands.)

As long as we can get shapes for them, they ALWAYS do! BUT we should be able to pick the administrative shape AND the island shape - their overlap narrows the space, and the terms help humans locate stuff without the shapes.
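A minimal sketch of the "overlap narrows the space" idea, assuming both shapes are correct and using axis-aligned bounding boxes as a crude stand-in for real polygons. The `BBox` type, the `overlap` helper, and the extents are hypothetical illustrations, not Arctos code or real coordinates, and the sketch ignores boxes that cross the antimeridian.

```python
from typing import NamedTuple, Optional


class BBox(NamedTuple):
    """Axis-aligned box in decimal degrees: a crude stand-in for a real polygon."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def area(self) -> float:
        # Area in square degrees; only useful for comparing boxes to each other.
        return (self.max_lon - self.min_lon) * (self.max_lat - self.min_lat)


def overlap(a: BBox, b: BBox) -> Optional[BBox]:
    """Return the box shared by a and b, or None if they do not overlap."""
    min_lon = max(a.min_lon, b.min_lon)
    min_lat = max(a.min_lat, b.min_lat)
    max_lon = min(a.max_lon, b.max_lon)
    max_lat = min(a.max_lat, b.max_lat)
    if min_lon >= max_lon or min_lat >= max_lat:
        return None
    return BBox(min_lon, min_lat, max_lon, max_lat)


# Hypothetical, made-up extents; not authoritative coordinates for any place.
admin_area = BBox(-172.0, 62.0, -160.0, 66.0)
island = BBox(-171.9, 62.9, -168.6, 63.8)

# The shared region is never larger than either input.
shared = overlap(admin_area, island)
```

When the island sits entirely inside the administrative shape, the overlap is just the island; when the two shapes disagree, `overlap` returns `None`, which is itself a useful signal that one of the picks is wrong.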

dustymc commented 1 year ago

locality attribute, but one that can have spatial stuff too?

That's not compatible with the structure.

we should be able to pick the administrative shape AND the island shape

Sure, you can assert as many locality stacks as you want, and each of them can have two shapes (geography+locality).

So - maybe we need some prepackaged locality shapes - "Saint Lawrence Island" and "UC Berkeley Campus" and such? I've got something like that (the cache I pull gadm and such into before using it), making it (reusing my thing or something new, doesn't matter to me) selectable would not be difficult, and adding to it would not necessarily be something that needs to go through an "authority" discussion (so no real limits - if you think it belongs then we do too). The only real cost is that it's a lot of storage, but it's not really "data" so I don't think it needs to be considered 'core' so even that can probably be ignored. (Things would get copied to locality.locality_footprint - and maybe modified and such - not referenced.)

Actually that might be functionally equivalent to building and naming some localities which could be copied around as necessary, which could have been done years ago/needs no development at all (other than probably some documentation). In any case, whatever the details and mechanics, we do have a second "non-authority" place for spatial data, and copying spatial data from one place to another in some way isn't difficult.
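A rough sketch of "copied, not referenced", assuming shapes are stored as WKT strings. `SHAPE_CACHE`, `copy_shape_to_locality`, and the `spec_locality` key are hypothetical names for illustration; only `locality_footprint` comes from the discussion above, and the WKT is a made-up unit square.

```python
import copy

# Hypothetical cache of prepackaged shapes, keyed by name. The WKT is a made-up
# unit square, not a real footprint.
SHAPE_CACHE = {
    "Example Island": {"wkt": "POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))"},
}


def copy_shape_to_locality(locality: dict, shape_name: str, cache: dict = SHAPE_CACHE) -> dict:
    """Return a new locality dict whose footprint is a copy of the cached shape."""
    shape = copy.deepcopy(cache[shape_name])
    updated = dict(locality)
    updated["locality_footprint"] = shape["wkt"]
    return updated


locality = copy_shape_to_locality({"spec_locality": "north shore"}, "Example Island")

# Editing the cache afterwards does not touch the locality's copy.
SHAPE_CACHE["Example Island"]["wkt"] = "POLYGON EMPTY"
```

Because the locality holds its own copy, the cache can be corrected, extended, or thrown away without changing anything a collection has already asserted.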

their overlap narrows the space

If that was true then https://github.com/ArctosDB/arctos/issues/4289 could not possibly be controversial. I think this is part of the problem - we (or me, at least) keep bouncing around to whatever viewpoint is convenient at the moment without ever considering the whole problem in detail.

AJLinn commented 1 year ago

FWIW, the preferred higher geographies for the UAM:EH users will always be the geo-political boundaries that real people use - e.g., census areas and boroughs for most of Alaska. This is the stuff our users are going to be most familiar with, and therefore the most useful terms for us to use:

[Image: Screen Shot 2022-09-29 at 1 33 01 PM]

dustymc commented 1 year ago

FWIW

Quite a bit I think - I'm not gonna fix this without knowing what y'all are thinking!

I took a peek at your data, there are lots of quads - is that not on purpose? (I'm becoming convinced that users often don't pick what they intend - maybe because there are an overwhelming number of options, or putting all the intersections together in their head is impossible, or nobody really cares, or ?????????? Whatever the cause, one option would be to just let computers worry about it.)

census areas and boroughs

That lines up nicely with gadm and therefore https://github.com/ArctosDB/arctos/issues/5076.

krgomez commented 1 year ago

FWIW, the preferred higher geographies for the UAM:EH users will always be the geo-political boundaries that real people use - e.g., census areas and boroughs for most of Alaska.

The same holds true for the UAM:Art collection.

dustymc commented 1 year ago

@krgomez thanks and same question - are all those quads in your data intentional?

AJLinn commented 1 year ago

I took a peek at your data, there are lots of quads - is that not on purpose? (I'm becoming convinced that users often don't pick what they intend - maybe because there are an overwhelming number of options,

It's true, we have probably selected quads when that seems more precise or when Arctos tells us with a little error reporter that people aren't finding our records because there's a more appropriate higher geography available. We've been experimenting with these as we've become more comfortable with the various tools and what makes most sense for our collections and users.

The overwhelming number of options is certainly one of them, but more often I think it's the public shaming by Arctos telling us our data is not up to its standards ("HAL: I know I've made some very poor decisions recently, but I can give you my complete assurance that my work will be back to normal. I've still got the greatest enthusiasm and confidence in the mission. And I want to help you.")

dustymc commented 1 year ago

Thanks, helpful.

public shaming by Arctos

I hope that one result of the geography rethink and cleanup will be much better results there and in similar operations.

In https://github.com/ArctosDB/arctos/issues/5076#issuecomment-1253049379 I suggest that we probably need a formal source for geography. If we end up with multiple competing views of the planet (and it's hard to see how we might avoid that, although doing so would be awesome) that might be linked to collections, so your collection could prefer, as suggestions or https://github.com/ArctosDB/arctos/issues/4785, [thing1, then thing2, then, if we must, the layer with quads] or something.
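One way the per-collection ordering could look, as a sketch only: walk the collection's preferred sources in order and take the first one that has a match for the locality. `pick_geography`, the source labels, and the place names are hypothetical placeholders, not an existing Arctos function or real data.

```python
from typing import Mapping, Optional, Sequence, Tuple


def pick_geography(
    preferred_sources: Sequence[str],
    matches: Mapping[str, str],
) -> Optional[Tuple[str, str]]:
    """Return (source, name) for the first preferred source that has a match.

    preferred_sources: the collection's ordering, most preferred first.
    matches: source label -> geography name found for this locality.
    """
    for source in preferred_sources:
        name = matches.get(source)
        if name is not None:
            return source, name
    return None


# Hypothetical collection preference: thing1, then thing2, then the quad layer.
collection_preference = ["thing1", "thing2", "quads"]

# Hypothetical matches for one locality; thing1 has nothing to offer here.
found = {"thing2": "Example Borough", "quads": "Example Quad"}

suggestion = pick_geography(collection_preference, found)
```

Here `suggestion` falls through to `thing2`, and the quad layer is only reached when nothing more preferred matches.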

And yea, we've done this before, lots of times. We have what looks like decent data, we get some new technology that lets us see that data in some new light, and we realize our data wasn't so great after all. The result is almost always truly better data capable of DOING more, but it's not always smooth getting there.

krgomez commented 1 year ago

They must be, as I added all of the locality data for everything in the art collection. I thought I used mostly boroughs, municipalities, national parks, and census areas when adding creation events for artworks in the collection. I know there were cases where I used USGS quads, and I think it was when an artwork was created in a more remote area where I wasn't sure what else to use.

dustymc commented 1 year ago

Merge-->https://github.com/ArctosDB/arctos/issues/5138