ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum
Apache License 2.0

Europe, Azerbaijan higher geography #3065

Closed campmlc closed 2 years ago

campmlc commented 3 years ago

Azerbaijan, Georgia, and Armenia are placed in Europe in higher geography, while Turkey is in Asia. Is this valid?

tucotuco commented 3 years ago

Some say, "No".

https://en.wikipedia.org/wiki/Boundaries_between_the_continents_of_Earth#Asia_and_North_America
https://en.wikipedia.org/wiki/List_of_transcontinental_countries

dustymc commented 3 years ago

https://github.com/ArctosDB/arctos/issues/1291

Jegelewicz commented 3 years ago

HMMMM In keeping with our "Wikipedia" source in HG, the links provided by @tucotuco seem especially helpful.

Let's look at https://en.wikipedia.org/wiki/Kazakhstan

If we follow the Wiki article, then most of the country will be in Asia with 2 regions, West Kazakhstan and Atyrau extending into Europe.

Right now, all of Kazakhstan in Arctos is in the Continent of "Eurasia", so people searching Europe and Asia will find none of it. If we follow the Wikipedia rule, I think our specimens in Kazakhstan will be much more discoverable, especially since this is the method the aggregators are using. Also, when searching for stuff in Europe, the results would be consistent with the wiki definition of Europe.

Also, if I can figure this out, I could get the shape files: https://gadm.org/download_country_v3.html
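A minimal sketch of the discoverability problem described in this comment, using made-up records rather than the Arctos schema. The field names (`continent`, `country`, `state_prov`) and the helpers `reassign` and `search_by_continent` are hypothetical. Treating West Kazakhstan and Atyrau as wholly European is a simplification of the Wikipedia reading, which only says they extend into Europe.

```python
# Toy higher-geography rows; field names are illustrative, not the Arctos schema.
current = [
    {"continent": "Eurasia", "country": "Kazakhstan", "state_prov": "West Kazakhstan"},
    {"continent": "Eurasia", "country": "Kazakhstan", "state_prov": "Atyrau"},
    {"continent": "Eurasia", "country": "Kazakhstan", "state_prov": "Almaty"},
]

# Simplification: regions named in the comment as reaching into Europe.
EUROPEAN_REGIONS = {"West Kazakhstan", "Atyrau"}


def reassign(rows):
    """Return copies of rows with continent set per region instead of 'Eurasia'."""
    out = []
    for row in rows:
        continent = "Europe" if row["state_prov"] in EUROPEAN_REGIONS else "Asia"
        out.append({**row, "continent": continent})
    return out


def search_by_continent(rows, continent):
    """Return the regions a user would find when searching one continent."""
    return [r["state_prov"] for r in rows if r["continent"] == continent]


proposed = reassign(current)
print(search_by_continent(current, "Europe"))   # empty while everything is "Eurasia"
print(search_by_continent(proposed, "Europe"))
print(search_by_continent(proposed, "Asia"))
```

A record attached only to the country, with no region, would still need a continent decision of its own.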

mkoo commented 3 years ago

We have the shp files; since we reference Wikipedia, I agree with @Jegelewicz. I'll add this to @bovinealex's to-do list: Kazakhstan up to primary admin boundaries (regions: https://en.wikipedia.org/wiki/Regions_of_Kazakhstan)

bovinealex commented 3 years ago

should I add wkts for the regions, or just change the continent?

Jegelewicz commented 3 years ago

should I add wkts for the regions

I vote yes!

dustymc commented 3 years ago

Always enthusiastic "yes" from me - https://github.com/ArctosDB/arctos/issues/1795

campmlc commented 3 years ago

Yes, thanks!!

bovinealex commented 3 years ago

Added WKTs for all the regions that currently have specimens attached to them:
https://arctos.database.museum/Locality.cfm?Action=editGeog&geog_auth_rec_id=10004581
https://arctos.database.museum/Locality.cfm?Action=editGeog&geog_auth_rec_id=10010144
https://arctos.database.museum/Locality.cfm?Action=editGeog&geog_auth_rec_id=10004582
https://arctos.database.museum/Locality.cfm?Action=editGeog&geog_auth_rec_id=10015461

Jegelewicz commented 3 years ago

save for the country itself which I left in Eurasia

This means that anything assigned to the country will not show up in a search of either Europe or Asia. @dustymc any wild ideas about how to treat this?

dustymc commented 3 years ago

wild ideas

Move it to North America, with Hawaii and Greenland?

https://github.com/ArctosDB/arctos/issues/1291#issuecomment-678341940

Jegelewicz commented 3 years ago

https://github.com/ArctosDB/arctos/issues/1291#issuecomment-685029601

tucotuco commented 3 years ago

Hawaii in North America? {face plant}

Jegelewicz commented 3 years ago

Hawaii in North America? {face plant}

So, what SHOULD continent be for Hawaii? NULL?

tucotuco commented 3 years ago

Oceania without a doubt.

sharpphyl commented 3 years ago

Oceania without a doubt.

Probably. Even TGN isn't consistent.

Here's Hawaiian Islands in Oceania:

Screen Shot 2020-09-02 at 9 24 57 AM

Here's the state of Hawaii in North and Central America:

Screen Shot 2020-09-02 at 9 24 29 AM

Is that a political vs. geographical distinction?

Wikipedia excludes Hawaii from Oceania (https://en.wikipedia.org/wiki/Oceania). GBIF puts it in North America. iDigBio seems to accept any continent or ocean you choose.

Oceania or North America would at least put it in a continent.

Screen Shot 2020-09-02 at 9 29 52 AM

dustymc commented 3 years ago

Central America

War it is then. Unless they'll take Greenland off our hands....

political vs. geographical distinction

Maybe, but it just looks like denormalization from here.

Wikipedia excludes Hawaii

That's not how I read this:

Definitions of Oceania vary; however, the islands at the geographic extremes of Oceania are generally considered to be the Bonin Islands, a politically integral part of Japan; Hawaii, a state of the United States; Clipperton Island, a possession of France; the Juan Fernández Islands, belonging to Chile; and Macquarie Island, belonging to Australia.[citation needed] (The United Nations has its own geopolitical definition of Oceania, but this consists of discrete political entities, and so excludes the Bonin Islands, Hawaii, Clipperton Island, and the Juan Fernández Islands, along with Easter Island.)[17]

GBIF which puts it in North America.

Then GBIF is obviously broken. Their API does not make such assertions - https://api.gbif-uat.org/v1/geocode/reverse?lat=19.824266&lng=-155.460067

sharpphyl commented 3 years ago

Then GBIF is obviously broken. Their API does not make such assertions - https://api.gbif-uat.org/v1/geocode/reverse?lat=19.824266&lng=-155.460067

I'm still learning how to read these things but I don't see an ocean or continent in this. Just a political entity (US). Can you help me understand what I'm looking at?

That's not how I read this: Definitions of Oceania vary...

Yep, I missed that. So are you suggesting we put Hawaii in Oceania?

dustymc commented 3 years ago

understand what I'm looking at.

I can't find the documentation, if it exists, so I'm not sure either, but it looks like a GBIF "get geography for point" API. They seem to be returning no continent-level information at all for Hawaii, and their UI doesn't seem to line up with the data here, so who knows....

suggesting we put Hawaii in Oceania

I'm not making any specific suggestions. I'd like to suggest "do what everyone else does" but there seems to be no "everyone." Hawaii-->Stuff nowhere near Hawaii is just wrong and I'll continue to oppose that. Hawaii-->Oceania seems defensible; I could live with it.

tucotuco commented 3 years ago

Then GBIF is obviously broken. Their API does not make such assertions - https://api.gbif-uat.org/v1/geocode/reverse?lat=19.824266&lng=-155.460067

I'm still learning how to read these things but I don't see an ocean or continent in this. Just a political entity (US). Can you help me understand what I'm looking at.

That API is not officially in production (hence the URL) and not documented yet. GBIF uses a few geographic authorities (GADM, Natural Earth, Marine Regions SeaVox). They do not use an authority to interpret continent yet. That is one of the reasons for this issue. You will not get continents back from the reverse geocoder (yet). The API, and the VertNet geography principles, return the geography of the point (or textual geography) supplied. As such, the example is on land, not in an ocean, so an ocean will not be in the results. The Hawaiian Islands are in an ocean, but that is not what our geography is about, it's about the location of biodiversity-related events, where ocean or not usually makes a big difference. Try a point in the ocean near a shore and you should get ocean and EEZ entries in the results.
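For readers who want to try the request being discussed, here is a small sketch that only rebuilds the URL quoted in this thread from a latitude and longitude. The endpoint is the undocumented UAT one linked above and may change or disappear. The helper name `reverse_geocode_url` is made up for illustration, and nothing here assumes anything about the shape of the response.

```python
from urllib.parse import urlencode

# Base URL copied from the link in this thread (a UAT endpoint, not production).
BASE_URL = "https://api.gbif-uat.org/v1/geocode/reverse"


def reverse_geocode_url(lat, lng):
    """Build the 'geography for point' request URL for a coordinate pair."""
    return f"{BASE_URL}?{urlencode({'lat': lat, 'lng': lng})}"


# The Hawaii point used in the discussion.
print(reverse_geocode_url(19.824266, -155.460067))
```

Swapping in a point just offshore is the quickest way to check the claim that ocean and EEZ entries appear only for points that are actually in the water.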

Jegelewicz commented 3 years ago

UGH. @tucotuco is backward engineering what we are trying to do the other way around. We just need some consistent way of doing things in Arctos so that whether we are right or wrong, we can at least define our process.

Right now, we are doing islands in multiple ways. Should islands be part of a continent or an ocean?

image

We are not even consistent within the UK

image

sharpphyl commented 3 years ago

Hawaii-->Oceania seems defensible; I could live with it.

I'm sure we need lots of people on board just to move Hawaii, but adding Oceania as a continent would also raise the question of moving everything (let's say whatever TGN or Wikipedia include) into Oceania such as Australia, NZ, etc. I'm very comfortable with that, but I'm looking at it mostly from a marine perspective. While the DMNS:Inv collection has the most records from the Hawaiian Islands, New Zealand and Fiji, we do not have the most from Australia, Samoa, etc.

Here are the biggest users of the Hawaiian Islands.

Screen Shot 2020-09-02 at 10 44 31 AM

Do we need to move this to a new issue so others find it since it's hiding inside Europe Azerbaijan higher geography?

dustymc commented 3 years ago

Should islands be part of a continent

Depends on the island. This looks pretty continental to me:

Screen Shot 2020-09-02 at 9 58 08 AM

This, not so much - depending on what exactly we think a continent is and precisely how far Oceania stretches, perhaps.

Screen Shot 2020-09-02 at 9 59 15 AM

or an ocean?

I believe @tucotuco was suggesting an "oceans are wet" approach somewhere that I can't find at the moment. Seems worth exploring, but I'm not at all convinced it'll hold up to reality. Under that, I believe Pitcairn itself would not be "Pacific," even though there's a lot of that around. Did I make that up or completely misunderstand something?

Oceania as a continent

That plus the "oceans are wet" model might mean the clams from near AU are not in Oceania.

new issue

I have no idea. We have a ton of geography issues, I'm not thrilled with having one more, the contents and subject of this departed ways a while back, ???????????????????

tucotuco commented 3 years ago

Right now, we are doing islands in multiple ways. Should islands be part of a continent or an ocean?

It looks like you are also taking the political rather than the (bio)geographic perspective. South Georgia as part of the United Kingdom? Do you really care more about who claims them than where they are? South Georgia and the South Sandwich Islands have their own ISO country code so that you don't have to do that sort of thing. And DON'T EVEN talk to Argentines about the Falklands - they aren't even called that! But they do have an ISO country code to avoid some of the issues. The only problematic island left from the ISO/geography perspective is Clipperton, which, poor thing, doesn't have a code.

dustymc commented 3 years ago

Do you really care more about who claims them

Some - the folks who issue permits, for example - probably do.

Maybe we need a drastically more complex geography model in which we somehow separate or categorize physical, political, and whatever continents and such might be, entities.

sharpphyl commented 3 years ago

Do we like this view of Oceania? Not sure anyone but Wikipedia uses it. They had to stretch to get Hawaii but just made it!

Screen Shot 2020-09-02 at 11 31 10 AM

tucotuco commented 3 years ago

That map is not consistent with what the article says. I would use the verbiage instead...

"Definitions of Oceania vary; however, the islands at the geographic extremes of Oceania are generally considered to be the Bonin Islands, a politically integral part of Japan; Hawaii, a state of the United States; Clipperton Island, a possession of France; the Juan Fernández Islands, belonging to Chile; and Macquarie Island, belonging to Australia.[citation needed] (The United Nations has its own geopolitical definition of Oceania, but this consists of discrete political entities, and so excludes the Bonin Islands, Hawaii, Clipperton Island, and the Juan Fernández Islands, along with Easter Island.)[17]"

tucotuco commented 3 years ago

I believe @tucotuco was suggesting an "oceans are wet" approach somewhere that I can't find at the moment. Seems worth exploring, but I'm not at all convinced it'll hold up to reality. Under that, I believe Pitcairn itself would not be "Pacific," even though there's a lot of that around. Did I make that up or completely misunderstand something?

Yes, that is what @tucotuco was suggesting. A locality on Pitcairn would not be "Pacific Ocean"; it would be Pitcairn Islands, Pitcairn (country code PN).

I suspect that one of the things you want to enable really has nothing to do with the issue of describing the place faithfully in terms of geography - to get everything that is on land (or marine) inside an area described by a polygon regardless of what anyone wants to call it. We could do that a lot better if we a) had good georeferences and b) used them to assign biomes (see https://github.com/tdwg/dwc/issues/38), where biomes followed ENVO so that we could be as specific, or not, as we want.
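The polygon idea can be illustrated with a plain ray-casting test. This is a generic sketch, not how Arctos or any aggregator does it: the rectangle is an invented box, not a real boundary, and a real implementation would work from proper WKT geometries with a spatial library or database.

```python
def point_in_polygon(lng, lat, polygon):
    """Ray-casting test: True if (lng, lat) falls inside the polygon.

    polygon is a list of (lng, lat) vertices; the last vertex connects
    back to the first. Points exactly on an edge are not handled specially.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside


# Invented rectangle loosely around the Hawaiian Islands (illustration only).
box = [(-161.0, 18.0), (-154.0, 18.0), (-154.0, 23.0), (-161.0, 23.0)]

print(point_in_polygon(-155.460067, 19.824266, box))  # Hawaii point from this thread
print(point_in_polygon(-122.0, 37.0, box))            # a point far outside the box
```

A test like this only answers "is the point in the area"; it says nothing about what the area should be called, which is the separation being argued for here.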

Jegelewicz commented 3 years ago

I suspect that one of the things you want to enable really has nothing to do with the issue of describing the place faithfully in terms of geography - to get everything that is on land (or marine) inside an area described by a polygon regardless of what anyone wants to call it.

I think you suspect correctly and even without points and polygons, not assigning stuff collected on land to an ocean "continent" would get us a bit closer to that than we are now. Rather than arguing over what "continent" Hawaii is on, maybe we should just not assign it one. Aggregators will assign it for us and will probably have an easier task of deciding whether the collected object was from a marine environment if we exclude an ocean name from stuff collected on land.

tucotuco commented 3 years ago

Rather than arguing over what "continent" Hawaii is on, maybe we should just not assign it one. Aggregators will assign it for us

Currently, no. They'll see the blank continent field and interpret it as null. They use a code table to look up values coming from the wild and try to assign them to one of the continents in the seven-continent model - based only on that wild value.

will probably have an easier task of deciding whether the collected object was from a marine environment if we exclude an ocean name from stuff collected on land.

I don't think that will have any effect when the time comes to try to interpret biome from the raw data. They might use the coordinates to do a coarse assignment, or they might do it by species, or they might only do it with values provided by the data publishers. Hard to know, all of those are fraught with issues.

Jegelewicz commented 3 years ago

Rather than arguing over what "continent" Hawaii is on, maybe we should just not assign it one. Aggregators will assign it for us

Currently, no. They'll see the blank continent field and interpret it as null. They use a code table to look up values coming from the wild and try to assign them to one of the continents in the seven-continent model - based only on that wild value.

Yes, and then our stuff collected on Hawaii will show up in searches on GBIF along with everything else on the continent GBIF has assigned Hawaii to - right now that doesn't happen.

will probably have an easier task of deciding whether the collected object was from a marine environment if we exclude an ocean name from stuff collected on land.

I don't think that will have any effect when the time comes to try to interpret biome from the raw data. They might use the coordinates to do a coarse assignment, or they might do it by species, or they might only do it with values provided by the data publishers. Hard to know, all of those are fraught with issues.

I am not talking about "them" here, I am talking about "Arctos". Right now in Arctos, it does appear as if anything collected on land in Hawaii was also collected in the "Pacific Ocean". While this might be somehow true, we (In Arctos) have no way of easily finding all marine collections for UNGEOREFERENCED stuff. I love that everyone wants to rely on coordinates, but we have an awful lot of stuff without them, including our higher geography.

tucotuco commented 3 years ago

Yes, and then our stuff collected on Hawaii will show up in searches on GBIF along with everything else on the continent GBIF has assigned Hawaii to - right now that doesn't happen.

No, it doesn't work that way. GBIF doesn't assign continents to countries (thankfully). They take the value of continent you give them and interpret that if they can. Thus, if you leave continent blank for Hawaiian locations, they will not show up in any continent search.
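A rough sketch of the behavior described here: the supplied continent string is looked up in a table, and anything blank or unrecognized comes back as null, so the record cannot match a continent search. The table contents, the record fields, and the function name `interpret_continent` are invented for illustration and are not GBIF's actual code table or parser.

```python
# Illustrative lookup table: verbatim value (lowercased) -> interpreted continent.
# These entries are made up; they are not GBIF's real code table.
CONTINENT_LOOKUP = {
    "africa": "AFRICA",
    "antarctica": "ANTARCTICA",
    "asia": "ASIA",
    "europe": "EUROPE",
    "north america": "NORTH_AMERICA",
    "oceania": "OCEANIA",
    "south america": "SOUTH_AMERICA",
}


def interpret_continent(verbatim):
    """Map a supplied continent string to one of seven values, or None."""
    if verbatim is None:
        return None
    return CONTINENT_LOOKUP.get(verbatim.strip().lower())


records = [
    {"id": 1, "continent": "Oceania", "place": "Hawaii"},
    {"id": 2, "continent": "", "place": "Hawaii"},            # continent left blank
    {"id": 3, "continent": "Eurasia", "place": "Kazakhstan"}, # value not in the table
]

# Only records whose continent was interpreted can match a continent search.
oceania_hits = [r["id"] for r in records
                if interpret_continent(r["continent"]) == "OCEANIA"]
uninterpreted = [r["id"] for r in records
                 if interpret_continent(r["continent"]) is None]
print(oceania_hits, uninterpreted)
```

Under this model the country or coordinates never enter into it, which is why a blank continent for Hawaii stays blank.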

While this might be somehow true, we (In Arctos) have no way of easily finding all marine collections for UNGEOREFERENCED stuff.

Then I recommend a biome field to make that explicit. I am pretty sure that should not be at the higher geography level. According to the thinking that brought the proposal for the term to Darwin Core, it should be at the Event level. It would probably mean that you have Events already that would have to be split based on this attribute. I don't know how much you could set semi-automatically, but I suspect a great deal.

Jegelewicz commented 3 years ago

No, it doesn't work that way. GBIF doesn't assign continents to countries (thankfully). They take the value of continent you give them and interpret that if they can. Thus, if you leave continent blank for Hawaiian locations, they will not show up in any continent search.

So, unless we use the 7 continents that GBIF approves of, our records will never show up appropriately in GBIF searches. That seems like a less than good way for GBIF to handle stuff.

Putting this link here, because it is related and it took me forever to uncover it. https://github.com/tdwg/dwc-qa/issues/128#issuecomment-661161433

tucotuco commented 3 years ago

So, unless we use the 7 continents that GBIF approves of, our records will never show up appropriately in GBIF searches. That seems like a less than good way for GBIF to handle stuff.

That is correct. That is why I submitted this to GBIF: https://github.com/gbif/parsers/issues/26.

dustymc commented 3 years ago

describing the place faithfully ... get everything that is

Yes, and those obviously aren't quite the same thing.

good georeferences ... not assigning stuff collected on land to an ocean "continent"

Good geography helps determine coordinates, those georeferences help find faulty geography assertions.

biomes

Sounds fun.

Aggregators will assign it for us

That's very optimistic of you!

blank continent field and interpret it as null.

And I think most services would see a blank continent and interpret it as "there's a huge obvious chunk of data missing, let's make some error-indicator a lot bigger." Just like any defensible service would likely interpret "North America, Hawaii" as "this can't possibly be anything but garbage, I'm not touching it, next please."

they might do it by species

I think that would be fatal; critters really do find themselves in strange places from time to time, and there's science in that. ('This parking lot is killing a LOT of salamanders...') A 'something weird' flag hinged on species would be cool though.

show up in searches on GBIF

Trying to game a system by manipulating data in indefensible ways is never going to end well.

have no way of easily finding all marine collections for UNGEOREFERENCED stuff.

And we never will; it's hard to overstate the importance of coordinates. Even if they're not very reliable - and I have those waiting in the background - they're a critical part in finding that the geography used to produce them is wonky.

awful lot of stuff without them

Awful lot of stuff with them....


arctosprod@arctos>> select (select count(*) from locality where s$dec_lat is not null)::decimal/(select count(*) from locality)::decimal;
        ?column?        
------------------------
 0.97542175460510998869
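The query above divides the number of localities with a non-null `s$dec_lat` by the total number of localities, giving roughly 97.5%. The same calculation on a throwaway in-memory table looks like this. The table contents and the `dec_lat` column name are stand-ins, not the real Arctos `locality` table, and SQLite is used here only so the sketch runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locality (locality_id INTEGER PRIMARY KEY, dec_lat REAL)")
# Three toy rows with coordinates and one without.
conn.executemany(
    "INSERT INTO locality (dec_lat) VALUES (?)",
    [(19.82,), (64.84,), (35.08,), (None,)],
)

# Same shape as the query above: localities with a latitude over all localities.
(ratio,) = conn.execute(
    "SELECT CAST((SELECT COUNT(*) FROM locality WHERE dec_lat IS NOT NULL) AS REAL)"
    " / (SELECT COUNT(*) FROM locality)"
).fetchone()
print(ratio)
conn.close()
```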

biome ... not be at the higher geography level.

Absolutely - collecting_event_attribute (place + time).

records will never show up appropriately in GBIF searches

I don't think GBIF believes they've solved everything and are at an end-point; they're struggling with the same problems we are, and this all gets easier for everyone if we can work together to develop a model that's suitable for everyone. The linked issue looks like a step in that direction.

dustymc commented 2 years ago

Tabling - these never get action; https://github.com/ArctosDB/arctos/issues/3272 would allow Arctos to standardize geography.