ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum
Apache License 2.0

iDigBio Flags on Continent #1291

Closed Jegelewicz closed 2 years ago

Jegelewicz commented 6 years ago

As my data was recently ingested by iDigBio, I received a huge list of specimens flagged for various corrections (sigh). I wanted to bring this one to the group to see if we should be paying more attention to Darwin Core, or if it is just something to let iDigBio keep "correcting" for.

Some of my specimens on islands in the Pacific are flagged by iDigBio with "dwc_continent_replaced | Darwin Core Continent Corrected." One example is here: https://www.idigbio.org/portal/records/89015b8e-d745-430c-b846-8b250b62afcb

Is Arctos not complying with Darwin Core or is this just an artifact of iDigBio? Do we need to do anything about it or do I just need to know that these flags are not a problem? My main concern is that users of iDigBio will view our data as less reliable with flags attached.

dustymc commented 6 years ago

@DerekSikes noticed something similar in GBIF regarding dates.

Darwin Core is an exchange standard; Arctos isn't "complying" with any data standards because none exist.

I agree with your assessment: users' initial reaction to the flag will be "Arctos is broken," which is absolutely not the case.

ekrimmel commented 6 years ago

We've done a bit of thinking about this internally. Right now there are some data quality flags from iDigBio that are useful because they correct objectively incorrect data, like a mismatch between coordinates and country due to a missing sign. Others, like your Pacific islands example, are subjective to how data are stored in Arctos vs. other models. Many of the objective DQ tests would flag errors that we don't have because Arctos/Dusty also catches them (e.g. "April 31st is a date that doesn't exist"). The subjective ones I don't think are worth our time to care about at this point, in particular because the DQ tests and methods iDigBio uses are in flux due to work being done in TDWG.

The TDWG Biodiversity Data Quality task group has a few factions working on different aspects. One is trying to define a framework for what we even mean when we talk about data quality as a collections community. Another is getting all the aggregators, including iDigBio, to settle on a set of the same data quality tests to run on provider data and return flags for.

I don't actually think the flags are visibly negative enough to make users think "Arctos is broken." I would hope (although I guess hope is the operative word here) that people who are running analyses on or otherwise using aggregator data for something beyond browsing would notice that the flags are doing more standardizing than correcting, and that obviously different collections/databases use different but equally correct ways to say the same thing...

Jegelewicz commented 5 years ago

I think I know what is going on here now and it would be a change to higher geography. While at SPNHC, Robert Mesibov offered to review some Arctos data for me. He downloaded the MSB fish data from iDigBio and reviewed the RAW file. One of the issues he found was that all of the stuff coming from oceans had no water body and instead the body of water was in the DWC_Continent field.

In Darwin Core, Atlantic Ocean is a body of water, not a continent.

I thought that it would make sense to call the tectonic plate the "continent", but that isn't how iDigBio does it. They use political boundaries for continent.

So DMNS:Bird:18967 in Arctos shows a continent of "Atlantic Ocean" and no associated water body.

and DMNS:Bird:18967 in iDigBio shows a continent of "Europe" and has the flag DWC Continent Replaced.

Strictly speaking, we are both wrong, but I doubt that anyone searching in iDigBio for Europe wants stuff from the South Georgia Islands. And when I search iDigBio for institution code "DMNS" plus water body "Atlantic Ocean" I get no results. At least anyone searching Arctos for stuff from the Continent/Ocean field for "Atlantic Ocean" will find this specimen (I tried it and it worked!).

All this being said, it seems to me that there needs to be a wider community discussion about Continent and Bodies of Water, but in the interest of making our stuff more searchable in iDigBio (and GBIF, I'm betting), I suggest that we add Water Body to higher geography and, for anything with a continent that is really a water body, add the correct name to the water body field. iDigBio will still replace our "Continent/Ocean" information, but the correct water body will get there, so people searching the oceans will find our stuff.
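A minimal sketch of that suggestion, in Python. The field names `continent_ocean` and `water_body` and the function `add_water_body` are assumptions for illustration, not the actual Arctos schema; the ocean names are the Continent/Ocean values Arctos uses, as listed later in this thread.

```python
# Hypothetical field names; this is not the actual Arctos schema.
OCEANS = {
    "Arctic Ocean", "Atlantic Ocean", "Indian Ocean", "North Atlantic Ocean",
    "North Pacific Ocean", "Pacific Ocean", "South Atlantic Ocean",
    "Southern Ocean", "South Pacific Ocean",
}


def add_water_body(record):
    """Return a copy of record with water_body filled in when the
    continent_ocean value is really an ocean.

    continent_ocean itself is left untouched, and an existing
    water_body value is never overwritten.
    """
    out = dict(record)
    if out.get("continent_ocean") in OCEANS and not out.get("water_body"):
        out["water_body"] = out["continent_ocean"]
    return out


example = {"guid": "DMNS:Bird:18967", "continent_ocean": "Atlantic Ocean"}
print(add_water_body(example))
```

Records whose Continent/Ocean is a land mass (for example "Europe") pass through unchanged.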

Jegelewicz commented 5 years ago

BTW, I added the whole continent/ocean issue to the TDWG data quality GitHub.

Darwin Core Continent and Water Body

dustymc commented 5 years ago

This is an aggregator doing something indefensible (which you've explicitly permitted by licensing your data CC0). This isn't an Arctos issue (there is no standard of which I'm aware), and it's not a DWC issue (the data are being properly transported to the aggregators).

There's been a "community discussion" going on for 32 years (this is what TDWG was formed to do) with no resolution. What we NEED is a usable authority. Arctos could become that or plug into something else; both are technically trivial. (What's Kurator using?)

I dislike waterbody. I fail to see how the few miles of sometimes-wet sorta-ditch behind the farm (it's in Getty) is the same sort of data as states and counties.

dustymc commented 5 years ago

woops

Jegelewicz commented 5 years ago

@ArctosDB/geo-group, please read John W's response.

dustymc commented 5 years ago

We could (theoretically - it may push this into 'infrastructure-limited' territory) use a non-DWC vocab and translate. E.g., if y'all really like 'Central America' as a continent then we could push it and North America to 'North and Central America' on export. (Or maybe that's a horrible idea which just ensures that someone finding something in iDigBio can't find it in Arctos and vice versa.)
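A rough Python sketch of that export-time translation. The table `EXPORT_CONTINENT` and the function `continent_for_export` are hypothetical names; only the 'North and Central America' example comes from the comment above.

```python
# Hypothetical export-time translation table; anything not listed is
# passed through unchanged.
EXPORT_CONTINENT = {
    "Central America": "North and Central America",
    "North America": "North and Central America",
}


def continent_for_export(arctos_value):
    """Translate an Arctos continent value to the exported vocabulary."""
    return EXPORT_CONTINENT.get(arctos_value, arctos_value)


print(continent_for_export("Central America"))
print(continent_for_export("Asia"))
```

The mapping is many-to-one, so the exported value cannot be turned back into the Arctos value, which is the findability concern raised in the parenthetical.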

if the location itself is not in the water, dwc:waterbody should be left empty, otherwise we end up with some incongruent assertions some day when the semantics become rigorously important.

https://github.com/ArctosDB/arctos/issues/1107 - we regularly violate this principle and seem resistant to stopping that.

Continent: ...suggest The Getty Thesaurus of Geographic Names (TGN) as the source...Oceania...does not include the oceans.

Maybe that's correct and Oceania only refers to the dirt-parts??

dwc:waterbody is a lot more broad than dwc:continent, as it can include everything from a pond to an ocean. Some use it for drainage basin systems

I'd say that's just wrong (and that's why we've added "drainage" and not "waterbody" to the geography table). There's a LOT of stuff in "Cimarron River Drainage" which isn't anywhere near the Cimarron River (or any other water!).

And https://github.com/ArctosDB/arctos/issues/1366 is still unanswered, but I don't think a pond is included within what we generally see as geography. Maybe that's an indication that trying to draw a line between geography and locality is not a useful thing to do.

And I'd like to amend my assertion above: what we NEED is a lookup service which turns shapes into whatever sort of text string anyone might want. (We already have that, but it's not very good, not very structured, and not very exposed - it just supports "any geog" queries, and it does so from points. We also have services to turn strings into coordinates, but that quickly becomes circular - at least sometimes, I'm inclined to support our current model which treats those coordinates as suggestions and relies on a person to accept them as "data.")

Jegelewicz commented 5 years ago

See also https://github.com/tdwg/bdq/issues/172

After looking into this - I have to agree that our current "Higher Geography" is misleading in searches.

DMNS:Bird:18967 provides a good example. Its higher geography is: Atlantic Ocean, United Kingdom, South Georgia & South Sandwich Islands, South Georgia Islands, South Georgia

As John W. points out, an island is not part of the ocean (a water body). iDigBio moves this specimen to: Europe, United Kingdom, South Georgia & South Sandwich Islands, South Georgia Islands, South Georgia because the United Kingdom is in Europe.

If we were following the ISO 3166 codes, we would have a higher geography of: AN GS SGS 239 South Georgia and the South Sandwich Islands (dependent state)

AN = Antarctica
GS = South Georgia and the South Sandwich Islands
SGS = South Georgia and the South Sandwich Islands
239 = South Georgia and the South Sandwich Islands

Which makes sense if you are searching by continent or country.

ISO 3166 would be far more stable than Wikipedia and we would stop the madness of finding Magellanic Penguins in the United Kingdom (which most certainly happens in Arctos).

dustymc commented 5 years ago

Here's your link - click "requery" on the "show/hide" widget to get a URL. http://arctos.database.museum/SpecimenResults.cfm?scientific_name=Spheniscus%20magellanicus&scientific_name_scope=currentID&scientific_name_match_type=startswith&country=United%20Kingdom

I don't really have a problem with those data - the UK is a political entity, not a place. More on that below...

I dislike ISO codes as they line up with our data; the intent/meaning is drastically different. We record (sometimes...) what was there when the specimen was collected (or georeferenced, or when the label was printed, or ...), ISO codes refer to something else, those don't always have much to do with each other, and we don't have the resources to update our data when something changes. "Yugoslavia" could refer to lots of shapes (https://www.youtube.com/watch?v=Ic5tBXESxl8) while ISO 3166-1:890 is 1) just https://en.wikipedia.org/wiki/Socialist_Federal_Republic_of_Yugoslavia#/media/File:Yugoslavia_1956-1990.svg, and 2) a withdrawn code.

because the United Kingdom is in Europe

One problem is that we (and GBIF, apparently) have a crazy mix of geography and politics in the data, and often no way to tell them apart. The UK is most certainly not (entirely) in Europe, nor does the name have any sort of spatiotemporal stability.

an island is not part of the ocean

That brings up the question of where exactly the island ends and the ocean begins. Mean high tide, the exclusive economic zone (for island nations), some arbitrary point established by some historical event, the place where the collector felt they were no longer close enough to the island to record that, ... ?

I'm not sure there's a One True Method for any of that which involves strings. It's all fairly trivial with georeferences - just ask some service capable of responding with the data you want. Theoretically anyway - hard to say what might happen with this input:

[Screenshot: 2018-09-26 at 9:52:08 am]

Jegelewicz commented 3 years ago

See https://github.com/tdwg/dwc-qa/issues/128#issuecomment-661161433

Jegelewicz commented 3 years ago

Taxonomy Committee had a brief discussion about this. People searching at VertNet, GBIF and iDigBio will not find some Arctos records due to mismatches between the continents we use in Arctos and those they use (apparently a standard set); see https://github.com/tdwg/dwc-qa/issues/128#issuecomment-661161433.

Although it would be a lot of work, I think we need to review all higher geography that uses an ocean as the "continent". As John W. pointed out, Hawaii is not part of the Pacific Ocean (it is not water), and if we are sticking with political divisions for higher geography, then Hawaii should be part of North America. See also https://github.com/ArctosDB/arctos/issues/1291#issuecomment-424778196.

I also think we should consider how our continents map to those used by the aggregators:

Arctos               | Aggregators
---------------------+---------------
Africa               | Africa
Americas             |
Antarctica           | Antarctica
Arctic Ocean         |
Asia                 | Asia
Atlantic Ocean       |
Australia            | Oceania
Central America      |
Eurasia              |
Europe               | Europe
Indian Ocean         |
North America        | North America
North Atlantic Ocean |
North Pacific Ocean  |
Pacific Ocean        |
South America        | South America
South Atlantic Ocean |
Southern Ocean       |
South Pacific Ocean  |
West Indies          |

Everything that we have in any of the oceans is likely lost in many searches of aggregators and that could be a lot of things.
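To make the gap concrete, the list above can be written as a lookup, sketched here in Python. The dictionary name `ARCTOS_TO_AGGREGATOR` is an assumption for illustration; `None` marks an Arctos value with no aggregator counterpart in the list.

```python
# Arctos Continent/Ocean value -> aggregator continent, copied from the
# list above. None means the list shows no aggregator equivalent.
ARCTOS_TO_AGGREGATOR = {
    "Africa": "Africa",
    "Americas": None,
    "Antarctica": "Antarctica",
    "Arctic Ocean": None,
    "Asia": "Asia",
    "Atlantic Ocean": None,
    "Australia": "Oceania",
    "Central America": None,
    "Eurasia": None,
    "Europe": "Europe",
    "Indian Ocean": None,
    "North America": "North America",
    "North Atlantic Ocean": None,
    "North Pacific Ocean": None,
    "Pacific Ocean": None,
    "South America": "South America",
    "South Atlantic Ocean": None,
    "Southern Ocean": None,
    "South Pacific Ocean": None,
    "West Indies": None,
}

# Arctos values that have no aggregator continent to land in.
unmatched = sorted(k for k, v in ARCTOS_TO_AGGREGATOR.items() if v is None)
print(unmatched)
```

Of the 20 Arctos values, 13 have no match; 9 of those are oceans and the other 4 are Americas, Central America, Eurasia and West Indies.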

Actually, I find our continent/ocean list a bit perplexing...why did we decide to make the West Indies a continent?

The West Indies is a subregion of North America - https://en.wikipedia.org/wiki/West_Indies

How is that any different from "Patagonia"?

dustymc commented 3 years ago

giant mess

Yep, we should fix

Hawaii ... North America.

No.

West Indies

Wat?!

Jegelewicz commented 3 years ago

Hawaii ... North America.

No.

Political divisions be damned?

Then why are we going with "Europe, Iceland" and not "North Atlantic, Iceland", or "South America, United Kingdom, Falkland Islands, Falkland Islands" instead of "South Atlantic Ocean, United Kingdom, Falkland Islands, Falkland Islands"?

If we want terrestrial things to be found at the aggregator level, terra firma needs to be associated with a continent. If we don't care that anything on an island in the ocean ever gets discovered at the aggregators, then we should just proceed as usual. @ArctosDB/geo-group

mkoo commented 3 years ago

Wow, continents and oceans. Why can't it be just based on plates? Let me see if we can get an @ArctosDB/geo-group meeting in a Thursday 10:30-12 opening, or we prioritize this for our Issues meeting in a few weeks.

I am still looking for good marine polys so part of the same discussion.

Jegelewicz commented 3 years ago

Why can't it be just based on plates?

That's what I said! See https://github.com/ArctosDB/arctos/issues/1291#issuecomment-423747804

Plus - there are a lot more plates than we want to keep track of....

dustymc commented 3 years ago

Political divisions

North America is a hunk of dirt.

Hawaii is a hunk of dirt (or parts of it are).

There's a noticeable lack of accessible dirt in between them.

aggregator level

I'm not at all convinced that they have anything figured out. (And that's absolutely not an argument that we have anything figured out either!)

plates

Well Siberia is closer to NA than HI is.....

Jegelewicz commented 3 years ago

Europe is a hunk of dirt and Iceland is a hunk of dirt with a noticeable lack of accessible dirt between them.....

tucotuco commented 3 years ago

Similarly with Greenland and North America, 'cept Denmark (mostly in Europe) apparently still owns that despite Trump's best efforts.

dustymc commented 3 years ago

Greenland and North America

Say it ain't so!

arctosprod@arctos>> select continent_ocean,country from geog_auth_rec group by continent_ocean,country order by continent_ocean,country;
       continent_ocean        |                   country                    
------------------------------+----------------------------------------------
 Africa                       | Algeria
 Africa                       | Angola
 Africa                       | Benin
 Africa                       | Botswana
 Africa                       | Burkina Faso
 Africa                       | Burundi
 Africa                       | Cameroon
 Africa                       | Central African Republic
 Africa                       | Comoros
 Africa                       | Democratic Republic of the Congo
 Africa                       | Djibouti
 Africa                       | Egypt
 Africa                       | Equatorial Guinea
 Africa                       | Eritrea
 Africa                       | Ethiopia
 Africa                       | Gabon
 Africa                       | Gambia
 Africa                       | Ghana
 Africa                       | Guinea
 Africa                       | Guinea-Bissau
 Africa                       | Ivory Coast
 Africa                       | Kenya
 Africa                       | Liberia
 Africa                       | Libya
 Africa                       | Madagascar
 Africa                       | Malacco
 Africa                       | Malawi
 Africa                       | Mali
 Africa                       | Mauritania
 Africa                       | Morocco
 Africa                       | Mozambique
 Africa                       | Namibia
 Africa                       | Niger
 Africa                       | Nigeria
 Africa                       | Republic of the Congo
 Africa                       | Rhodesia
 Africa                       | Rwanda
 Africa                       | Sao Tome and Principe
 Africa                       | Senegal
 Africa                       | Senegambia
 Africa                       | Seychelles
 Africa                       | Sierra Leone
 Africa                       | Somalia
 Africa                       | South Africa
 Africa                       | Spain
 Africa                       | Sudan
 Africa                       | Swaziland
 Africa                       | Tanganyika
 Africa                       | Tanzania
 Africa                       | Togo
 Africa                       | Tunisia
 Africa                       | Uganda
 Africa                       | Western Sahara
 Africa                       | Zaire
 Africa                       | Zambia
 Africa                       | Zimbabwe
 Africa                       | 
 Americas                     | 
 Antarctica                   | France
 Antarctica                   | New Zealand
 Antarctica                   | United Kingdom
 Antarctica                   | 
 Arctic Ocean                 | Canada
 Arctic Ocean                 | 
 Asia                         | Afghanistan
 Asia                         | Bahrain
 Asia                         | Bangladesh
 Asia                         | Bhutan
 Asia                         | Borneo
 Asia                         | Brunei
 Asia                         | Cambodia
 Asia                         | China
 Asia                         | Cyprus
 Asia                         | India
 Asia                         | Indonesia
 Asia                         | Iran
 Asia                         | Iraq
 Asia                         | Israel
 Asia                         | Japan
 Asia                         | Jordan
 Asia                         | Korea
 Asia                         | Kuwait
 Asia                         | Kyrgyzstan
 Asia                         | Laos
 Asia                         | Lebanon
 Asia                         | Malaysia
 Asia                         | Mongolia
 Asia                         | Myanmar
 Asia                         | Nepal
 Asia                         | North Korea
 Asia                         | Oman
 Asia                         | Pakistan
 Asia                         | Palestine
 Asia                         | Philippines
 Asia                         | Qatar
 Asia                         | Russia
 Asia                         | Saudi Arabia
 Asia                         | Singapore
 Asia                         | South Korea
 Asia                         | Soviet Union
 Asia                         | Sri Lanka
 Asia                         | Syria
 Asia                         | Taiwan
 Asia                         | Tajikistan
 Asia                         | Thailand
 Asia                         | Turkey
 Asia                         | Turkmenistan
 Asia                         | United Arab Emirates
 Asia                         | Uzbekistan
 Asia                         | Vietnam
 Asia                         | West Bank
 Asia                         | Yemen
 Asia                         | 
 Atlantic Ocean               | Cape Verde
 Atlantic Ocean               | Italy
 Atlantic Ocean               | Portugal
 Atlantic Ocean               | Spain
 Atlantic Ocean               | United Kingdom
 Atlantic Ocean               | 
 Australia                    | Australia
 Central America              | Belize
 Central America              | Costa Rica
 Central America              | El Salvador
 Central America              | Guatemala
 Central America              | Honduras
 Central America              | Nicaragua
 Central America              | Panama
 Central America              | 
 Eurasia                      | Kazakhstan
 Eurasia                      | Russia
 Eurasia                      | Soviet Union
 Eurasia                      | 
 Europe                       | Abkhazia
 Europe                       | Albania
 Europe                       | Andorra
 Europe                       | Armenia
 Europe                       | Austria
 Europe                       | Azerbaijan
 Europe                       | Belarus
 Europe                       | Belgium
 Europe                       | Bosnia and Herzegovina
 Europe                       | Bulgaria
 Europe                       | Croatia
 Europe                       | Czech Republic
 Europe                       | Denmark
 Europe                       | Estonia
 Europe                       | Finland
 Europe                       | France
 Europe                       | Georgia
 Europe                       | Germany
 Europe                       | Greece
 Europe                       | Holland
 Europe                       | Hungary
 Europe                       | Iceland
 Europe                       | Ireland
 Europe                       | Italy
 Europe                       | Luxembourg
 Europe                       | Macedonia
 Europe                       | Malta
 Europe                       | Moldova
 Europe                       | Monaco
 Europe                       | Montenegro
 Europe                       | Netherlands
 Europe                       | Northern Ireland
 Europe                       | North Macedonia
 Europe                       | Norway
 Europe                       | Poland
 Europe                       | Portugal
 Europe                       | Republic of Cyprus
 Europe                       | Romania
 Europe                       | Russia
 Europe                       | Slovakia
 Europe                       | Slovenia
 Europe                       | Soviet Union
 Europe                       | Spain
 Europe                       | Sweden
 Europe                       | Switzerland
 Europe                       | Ukraine
 Europe                       | United Kingdom
 Europe                       | Yugoslavia
 Europe                       | 
 Indian Ocean                 | Africa
 Indian Ocean                 | Australia
 Indian Ocean                 | Eritrea
 Indian Ocean                 | France
 Indian Ocean                 | India
 Indian Ocean                 | Maldives
 Indian Ocean                 | Mauritius
 Indian Ocean                 | 
 no higher geography recorded | 
 North America                | Canada
 North America                | Greenland
 North America                | Mexico
 North America                | United States
 North America                | 
 North Atlantic Ocean         | 
 North Pacific Ocean          | United States
 North Pacific Ocean          | 
 Pacific Ocean                | Commonwealth of the Northern Mariana Islands
 Pacific Ocean                | Ecuador
 Pacific Ocean                | Federated States of Micronesia
 Pacific Ocean                | Fiji
 Pacific Ocean                | France
 Pacific Ocean                | Kiribati
 Pacific Ocean                | Nauru
 Pacific Ocean                | New Zealand
 Pacific Ocean                | Niue
 Pacific Ocean                | Panama
 Pacific Ocean                | Papua New Guinea
 Pacific Ocean                | Republic of Palau
 Pacific Ocean                | Republic of the Marshall Islands
 Pacific Ocean                | Samoa
 Pacific Ocean                | Solomon Islands
 Pacific Ocean                | Tonga
 Pacific Ocean                | Tuvalu
 Pacific Ocean                | United Kingdom
 Pacific Ocean                | United States
 Pacific Ocean                | United States Minor Outlying Islands
 Pacific Ocean                | U.S. Trust Territory of the Pacific
 Pacific Ocean                | Vanuatu
 Pacific Ocean                | 
 South America                | Argentina
 South America                | Bolivia
 South America                | Brazil
 South America                | British Guiana
 South America                | Chile
 South America                | Colombia
 South America                | Ecuador
 South America                | France
 South America                | French Guiana
 South America                | Guiana
 South America                | Guyana
 South America                | Paraguay
 South America                | Peru
 South America                | Suriname
 South America                | United Kingdom
 South America                | Uruguay
 South America                | Venezuela
 South America                | 
 South Atlantic Ocean         | United Kingdom
 South Atlantic Ocean         | 
 Southern Ocean               | 
 South Pacific Ocean          | Australia
 South Pacific Ocean          | Chile
 South Pacific Ocean          | Tasmania
 South Pacific Ocean          | 
 West Indies                  | Antigua and Barbuda
 West Indies                  | Bahamas
 West Indies                  | Barbados
 West Indies                  | Cuba
 West Indies                  | Dominica
 West Indies                  | Dominican Republic
 West Indies                  | France
 West Indies                  | Grenada
 West Indies                  | Haiti
 West Indies                  | Jamaica
 West Indies                  | Netherlands
 West Indies                  | Saint Kitts and Nevis
 West Indies                  | Saint Lucia
 West Indies                  | Saint Vincent and the Grenadines
 West Indies                  | Trinidad and Tobago
 West Indies                  | United Kingdom
 West Indies                  | United States
 West Indies                  | Venezuela
 West Indies                  | 
                              | Singapore
                              | United States

dustymc commented 3 years ago

@Jegelewicz I deleted Americas - you created it, not used, low-hanging fruit and all.

@mkoo I deleted United States, California, Pinnacles National Park - also not used, fairly sure it's close enough to NA.

", Singapore, North West Community Development Council, Singapore" was created by alexandraperkins - also deleted. We should consider treating geography more like all other code tables and limiting access to active AWG members.

If Kazakhstan is in Eurasia, then so should be everything else Eurasian. (Eurasia was created for "Russia, but it's big and the data are flaky" - like Americas, it should be eliminated from that role.)
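Russia is one of several countries that the query output above lists under more than one Continent/Ocean value. A small Python sketch for finding them; the rows are a handful copied from that output, and `countries_in_multiple` is a hypothetical helper, not something in Arctos.

```python
from collections import defaultdict

# A few (continent_ocean, country) rows copied from the query output above.
ROWS = [
    ("Asia", "Russia"),
    ("Eurasia", "Russia"),
    ("Europe", "Russia"),
    ("Eurasia", "Kazakhstan"),
    ("Africa", "Spain"),
    ("Atlantic Ocean", "Spain"),
    ("Europe", "Spain"),
    ("North America", "Greenland"),
]


def countries_in_multiple(rows):
    """Map each country to its continent_ocean values, keeping only
    countries that appear under more than one value."""
    by_country = defaultdict(set)
    for continent_ocean, country in rows:
        by_country[country].add(continent_ocean)
    return {c: sorted(v) for c, v in by_country.items() if len(v) > 1}


print(countries_in_multiple(ROWS))
```

Run against the full table, the same grouping would give a worklist of countries to review before merging or splitting anything.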

I played with GBIF a bit, hoping there'd be some consistency we might somehow tap into. I can't find it, but it's possible they have something which would be exposed with a "has no geography issues" 'anti-flag' option.

[Screenshot: 2020-08-21 at 6:09:56 AM]

Jegelewicz commented 3 years ago

I deleted Americas - you created it, not used, low-hanging fruit and all.

Dang - that must have been back at the beginning? Or I was just in a daze.

I agree with Eurasia - we should move all that to either Europe or Asia. Maybe we could use GBIF as a cue for how to treat Russia (all Asia, all Europe, a little of both?)

We should consider treating geography more like all other code tables and limiting access to active AWG members.

This could bog down a bunch of projects. I think we can manage now by monitoring the code table change emails. However, there are a few things we could do to help keep things from going nuts. @sharpphyl suggested that we make a code table to limit options for "Continent/Ocean" since there are really so few and we specify them in our documentation. I also think we need to have a real discussion about Island Groups, Islands, Quads, and Features. Those things create a lot of chaos and I'm pretty sure we could handle them better. Not saying I have the answer, just that if we do a little noodling together maybe we could create something better.

dustymc commented 3 years ago

A code table is pretty easy. I'm still semi-inclined to go the other way, treat the whole shebang as "authority" and trust the folks we trust with authorities not to muck it up, but I'm up for anything.

Islands

https://github.com/ArctosDB/arctos/issues/1278 still seems vaguely like a potential start to me.

move all that to either Europe or Asia

Russia really is big! Merging those to Eurasia is trivial. Splitting Russia is not. NULL continent may be less-evil than merges (or not, IDK)

GBIF

GBIF seems to have Russia entirely within Europe, which ends at Alaska....

Jegelewicz commented 3 years ago

move all that to either Europe or Asia

Russia really is big! Merging those to Eurasia is trivial. Splitting Russia is not. NULL continent may be less-evil than merges (or not, IDK)

Yeah but who searches for "Eurasia"? And if Europe includes South Georgia & South Sandwich Islands why not go all the way to Alaska?

dustymc commented 3 years ago

who searches for "Eurasia"?

Depends on the data. If Russia spans three continents, maybe nobody. If there's no Europe/Asia options, maybe anyone wanting stuff from Eurasia.

South Georgia

I don't see GBIF being broken as a reason to break Arctos!

sharpphyl commented 3 years ago

Just to keep this conversation complicated, let's be sure to also discuss how we treat water bodies if we separate them out from continents. There is a distinction between Hawaii (in North America) with a water body of the Pacific Ocean. That means the (marine) specimen was found in the water. If there is no water body, it's a terrestrial snail found on Oahu. Right now, we have Hawaii in the Pacific Ocean which implies all our specimens are marine but they are not.

dustymc commented 3 years ago

implies all our specimens are marine

That sort of confounded assumption can be nothing but a recipe for bad inferences.

sharpphyl commented 3 years ago

I'm taking this from https://github.com/VertNet/DwCVocabs

The principles that govern the standardization of waterbody are 1) locations not in water should not include the waterbody, 2) locations in water are expected to provide the waterbody in the original data, 3) the standardized waterbody should be the most specific waterbody that applies.
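A hedged Python sketch of how the first two principles could be checked on a record. The `in_water` flag and the function `waterbody_issues` are assumptions for illustration; neither Arctos nor Darwin Core defines them. Principle 3 is not checked, since it would need a hierarchy of water bodies.

```python
def waterbody_issues(in_water, waterbody):
    """Check a record against principles 1 and 2 above.

    in_water  -- hypothetical flag: True if the location itself is in water
    waterbody -- the waterbody string, or None/empty if not given
    Returns a list of problems found (empty if none).
    """
    issues = []
    has_value = bool(waterbody and waterbody.strip())
    if not in_water and has_value:
        # Principle 1: locations not in water should not include a waterbody.
        issues.append("location not in water but waterbody is set")
    if in_water and not has_value:
        # Principle 2: locations in water are expected to provide one.
        issues.append("location in water but waterbody is missing")
    return issues


# A terrestrial snail on Oahu should carry no waterbody.
print(waterbody_issues(False, "Pacific Ocean"))
# A marine specimen collected off Oahu should carry one.
print(waterbody_issues(True, "Pacific Ocean"))
```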

tucotuco commented 3 years ago

Hi folks, rather than rehash what I think are the issues with how GBIF interprets continent, I urge you to read the issue I presented to them, as it will explain a lot about why you see what you see in GBIF.

tucotuco commented 3 years ago

implies all our specimens are marine

That sort of confounded assumption can be nothing but a recipe for bad inferences.

Careful, everyone. The VertNet principle of best practice suggests how to do it; it does not say that everyone has done it, or that an assumption to that effect is sage or safe.

dustymc commented 3 years ago

how to do it

I think that's our primary question here.

  1. Given a blank slate, what's our geography model look like? (Actually not that radical of an idea - geography is just a foreign key from most of Arctos.)
  2. Given the model we should have and a dot on the map, how do we select appropriate geography?

Second is how aggregators and other not-us users interpret those data. The easy solution to that is to just share a model.

tucotuco commented 3 years ago

To me it needs two parts, the shapes and the thesaurus that connects to it. One could approach geography from the spatio-temporal perspective or from the names perspective. You could do things like:

reverse geocoding: Tell me the standard administrative region names for this point (at this time). Here is an example that uses GADM - https://api.gbif-uat.org/v1/geocode/reverse?lat=48.17156&lng=1.18177.

get preferred name - I wanna search on the name of a place as I know it and let something translate that into the preferred name used in an index so I get everything I am looking for. This would take a combination of something like TGN (http://www.getty.edu/vow/TGNServlet?english=Y&find=Sudamerica&place=&page=1&nation=), which does have web services now, and an index that actually is standardized against the preferred names.
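
The "get preferred name" idea can be sketched as a thesaurus lookup sitting in front of the index. Everything here is hypothetical: the `THESAURUS` entries are hand-typed stand-ins for what a service such as TGN would supply, and `preferred_name` is not an Arctos or TGN function.

```python
from typing import Dict, Optional

# Hypothetical variant -> preferred-name table; a real one would be built
# from a gazetteer rather than typed by hand.
THESAURUS: Dict[str, str] = {
    "sudamerica": "South America",
    "south america": "South America",
    "america del sur": "South America",
}


def preferred_name(term: str) -> Optional[str]:
    """Translate a place name as the user knows it into the indexed name."""
    # Normalize case and collapse runs of whitespace before looking up.
    key = " ".join(term.casefold().split())
    return THESAURUS.get(key)
```

A search for "Sudamerica" would then be rewritten to the preferred "South America" before hitting the index, and an unknown term returns `None` so the caller can decide what to do with it.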

dustymc commented 3 years ago

Hey, that's pretty cool, thanks! I'll add it to my scripts.

Interesting that marineregions.org doesn't seem to have great offshore vocabulary - I'm coming to the idea that there's just no such thing, and trying to fake it (eg, by referring to something dry and far away) only adds to the confusion.

@sharpphyl

https://www.google.com/maps/place/38%C2%B005'11.8%22N+122%C2%B023'41.8%22W/@38.0865572,-122.3972392,16.94z/data=!4m6!3m5!1s0x0:0x0!7e2!8m2!3d38.086621!4d-122.3949554

https://api.gbif-uat.org/v1/geocode/reverse?lat=38.086621&lng=-122.394955

https://www.google.com/maps/place/37%C2%B045'39.9%22N+122%C2%B048'05.6%22W/@37.761062,-122.8030996,16.85z/data=!4m6!3m5!1s0x0:0x0!7e2!8m2!3d37.7610772!4d-122.8015434

https://api.gbif-uat.org/v1/geocode/reverse?lat=37.761077&lng=-122.801543

https://www.google.com/maps/place/37%C2%B022'57.5%22N+123%C2%B025'08.9%22W/@37.382637,-123.4213307,17z/data=!3m1!4b1!4m6!3m5!1s0x0:0x0!7e2!8m2!3d37.3826372!4d-123.419142

https://api.gbif-uat.org/v1/geocode/reverse?lat=37.382637&lng=-123.419142

https://www.google.com/maps/place/29%C2%B031'38.7%22N+138%C2%B031'58.6%22W/@29.527412,-138.5351287,17z/data=!3m1!4b1!4m15!1m8!3m7!1s0x0:0x0!2zMzjCsDA1JzExLjgiTiAxMjLCsDIzJzQxLjgiVw!3b1!7e2!8m2!3d38.086621!4d-122.3949554!3m5!1s0x0:0x0!7e2!8m2!3d29.5274125!4d-138.5329402

https://api.gbif-uat.org/v1/geocode/reverse?lat=29.527412&lng=-138.532940
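
For anyone wanting to repeat the comparison, here is a sketch that rebuilds the four reverse-geocoding requests above from their coordinates. The endpoint and the `lat`/`lng` parameters are taken from the links; the function names are made up, and `fetch` only assumes the service answers with JSON, without assuming anything about its structure.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "https://api.gbif-uat.org/v1/geocode/reverse"

# The four points from the links above, in order. Kept as strings so the
# coordinates are sent exactly as written (trailing zeros included).
POINTS = [
    ("38.086621", "-122.394955"),
    ("37.761077", "-122.801543"),
    ("37.382637", "-123.419142"),
    ("29.527412", "-138.532940"),
]


def reverse_geocode_url(lat: str, lng: str) -> str:
    """Build the request URL for one point."""
    return f"{ENDPOINT}?{urlencode({'lat': lat, 'lng': lng})}"


def fetch(lat: str, lng: str, timeout: float = 30.0):
    """Call the service and return the parsed JSON (needs network access)."""
    with urlopen(reverse_geocode_url(lat, lng), timeout=timeout) as response:
        return json.load(response)


URLS = [reverse_geocode_url(lat, lng) for lat, lng in POINTS]
```

Calling `fetch(*point)` for each entry in `POINTS` and comparing the results side by side is one way to see where the service stops reporting a given region.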

sharpphyl commented 3 years ago

@dustymc Let's see if I understand the above links. These are reverse geocoding of coordinates moving from a point within the US boundary out into the US Exclusive Economic Zone and beyond into the Pacific Ocean. Would this add the EEZs as part of higher geography and thus tie both to the political entity that controls the EEZ and the ocean it is in? That certainly has promise and I don't immediately see an issue. Would it improve how GBIF interprets our data? I think @mkoo has suggested using EEZs before.

dustymc commented 3 years ago

add the EEZs as part of higher geography

That's a possibility. I was thinking more radically, but I'm not sure how realistic anything is.

If we do something, we'd need to do something consistent. It looks like they end 'continent' right about the Golden Gate Bridge - you OK with that?

The Farallons are part of SF County; adopting enough of this would leave us with a transcontinental county, which doesn't seem ideal.

and thus tie both to the political entity that controls the EEZ and the ocean it is in?

Seems a bit optimistic, but maybe. Would be useful to see their basemap rather than trying to reverse engineer it.

Would it improve how GBIF interprets our data?

It might - presumably they built this for their own use.

Jegelewicz commented 3 years ago

move all that to either Europe or Asia

Russia really is big! Merging those to Eurasia is trivial. Splitting Russia is not. NULL continent may be less-evil than merges (or not, IDK)

There are only 3 HG entries with Eurasia, Russia

campmlc commented 3 years ago

Create an Uber-geog level above continent just for Eurasia?

tucotuco commented 3 years ago

That won't save you from all the other trans-continental country problems. See https://github.com/VertNet/DwCVocabs/issues/56.



dustymc commented 3 years ago

only 3 HG

I don't understand why that matters. The most precise information we have doesn't fit into the normal "hierarchy" (it's not, because the world isn't).

Jegelewicz commented 3 years ago

We could accept that continent-->country is two different kinds of THINGs and should not be expected to be consistent. This to me looks like the reality we should embrace.

I agree that this is what we should be doing. The only issue arises when we have a locality = "Russia" (or does it?). In this case, I would suggest that HG = no higher geography and that "Russia" be included in Specific Locality, OR there should be two localities provided, one with HG = Asia, Russia and one with HG = Europe, Russia.

Jegelewicz commented 3 years ago

Also, I can figure out the 3 Russia HG in Eurasia and put them on the appropriate continent.

dustymc commented 3 years ago

HG = no higher geography and that "Russia" be included in Specific Locality

I think that's in my "evil" category - it's purposefully "demoting" data to meet our unrealistic expectations.

two localities

That works for search, might not be evil, still seems pretty janky to me.

figure out the 3 Russia HG in Eurasia

That does not seem possible.

One is a country that spans both.

One is a former, bigger, country that spans both.

One has this:

[Screenshot: Screen Shot 2020-09-02 at 9 16 53 AM]

Jegelewicz commented 3 years ago

HG = no higher geography and that "Russia" be included in Specific Locality

I think that's in my "evil" category - it's purposefully "demoting" data to meet our unrealistic expectations.

I think that using Eurasia is every bit as evil.

two localities

That works for search, might not be evil, still seems pretty janky to me.

Janky, maybe, but it gets the job done (IMO - could be completely wrong).

figure out the 3 Russia HG in Eurasia

That does not seem possible.

One is a country that spans both.

See first comment above. We have "Asia, Russia" and "Europe, Russia". Assign two events with both localities to the records that use "Eurasia, Russia". BTW, I think some of these could have more appropriate HG.


One is a former, bigger, country that spans both.

Aren't we supposed to be using "current" HG? Some of these could be made better, and for the rest "no higher geography" with Soviet Union in the spec loc seems not so evil, since they are just that vague anyway.


One has this:

See fix as applied to "Russia". Also, pretty sure these could be sorted onto the correct continent, since they have coordinates...
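
As a rough illustration of sorting records by coordinates, here is a sketch that splits Russian points at a single meridian. The 60°E cutoff (roughly the Urals) and the 5° uncertainty band are assumptions, and crude ones: the real boundary is not a straight line, so anything near it, and any record without coordinates, is left undecided. A real fix would test points against continent polygons.

```python
from typing import Optional

URAL_MERIDIAN_E = 60.0    # assumed stand-in for the Europe/Asia boundary
UNCERTAIN_BAND_DEG = 5.0  # assumed margin where this shortcut is not trusted


def continent_for_russian_point(lat: Optional[float],
                                lng: Optional[float]) -> Optional[str]:
    """Guess the continent for a point already known to be in Russia."""
    # No coordinates: nothing to sort on, so leave the record undecided.
    if lat is None or lng is None:
        return None
    # Far-eastern Russia crosses the antimeridian, so negative longitudes
    # are treated as Asia.
    if lng < 0:
        return "Asia"
    # Too close to the assumed boundary to call with a straight line.
    if abs(lng - URAL_MERIDIAN_E) <= UNCERTAIN_BAND_DEG:
        return None
    return "Europe" if lng < URAL_MERIDIAN_E else "Asia"
```

Under these assumptions Moscow (55.75, 37.62) lands in Europe and Vladivostok (43.12, 131.89) in Asia, while Yekaterinburg (56.84, 60.60) stays undecided, which is the Sverdlovsk-style case that would still need a human or a proper polygon.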


sharpphyl commented 3 years ago

It looks like they end 'continent' right about the Golden Gate Bridge - you OK with that?

It would be nice to have a bit of wiggle room so our coordinates could be 100' off shore and not create an out-of-bounds, but if we had EEZs to work with right off the bridge, it would probably be ok.

This issue has gained a lot of "Where's Russia?" influence, so maybe the rest of this comment belongs elsewhere, but it's related to the question of how to deal with offshore locations.

A consortium of museums (I don't think any are in Arctos) recently received a grant (https://www.nsf.gov/awardsearch/showAward?AWD_ID=2001510&HistoricalAwards=false) that is focused on geolocating specimens on the US eastern seaboard. Here is part of their proposal:

This project will generate reliable geo-coordinate data for all covered specimen lots using a collaborative georeferencing project in GeoLocate. GeoLocate will add layers for bathymetric data, benthic habitat, and marine conservation areas. Incorporating bathymetry into GeoLocate to determine the extent of locations will also provide that capability for complex elevational data for terrestrial species.... The data will be shared through public data repositories, including iDigBio, GBIF, OBIS, and the InvertEBase Symbiota portal.

I asked Dr. José Leal at the National Shell Museum, one of the participants, if, in addition to geolocating specimens more precisely, the project would result in a marine locality structure that could be used by other museums with specimens from similar locations. His reply:

Yes, that is the idea. We have Nelson Rios from Geolocate as a PI in the grant, so some of the more technical questions will be resolved by him on this. For marine localities we'll be adding station coordinates (which is nothing new), but still need to resolve how to handle "stations" without coordinates ("off Cape Sable," etc.).

Not sure there's anything in the work they are doing that will be helpful for us, but I thought I'd add it to the stew just in case.

dustymc commented 3 years ago

Assign two events

Taken to extremes, would that require a "France, 1800" record to have about 80 determinations?

supposed to be using "current" HG?

That idea died an agonizing death under the pressure of reality; it's a nice ideal, but it would require a tremendous amount of work every time someone moves a border.

vague anyway

It's less vague than the alternatives.

Eurasia is every bit as evil.

It does not involve discarding data, so I have to disagree. Splitting Sverdlovsk Oblast or San Francisco County across two made-up pigeonholes doesn't seem terribly conducive to discovery, nor does dumping Norway and India into one made-up pigeonhole. I have no idea what we should do, but I do not think it will involve removing precision at any scale.

Jegelewicz commented 2 years ago

Closing as we are not addressing the original issue.