Closed Jegelewicz closed 2 years ago
@DerekSikes noticed something similar in GBIF regarding dates.
Darwin Core is an exchange standard; Arctos isn't "complying" with any data standards because none exist.
I agree with your assessment: a user's initial reaction to the flag will be "Arctos is broken," which is absolutely not the case.
We've done a bit of thinking about this internally. Right now there are some data quality flags from iDigBio that are useful because they correct objectively incorrect data, like a mismatch between coordinates and country due to a missing sign. Others, like your Pacific islands example, are subjective to how data are stored in Arctos vs. other models. Many of the objective DQ tests would flag errors that we don't have because Arctos/Dusty also catches them (e.g. "April 31st is a date that doesn't exist"). The subjective ones I don't think are worth our time to care about at this point, in particular because the DQ tests and methods iDigBio uses are in flux due to work being done in TDWG.
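For illustration only, the two "objective" tests mentioned above (a date that doesn't exist, and coordinates that miss their country because of a dropped sign) might look roughly like this in Python. This is a sketch, not iDigBio's or Arctos's actual code: the function names and the crude bounding box `US_BOX` are hypothetical, and a real test would use country polygons rather than a box.

```python
from datetime import date

def is_real_date(year: int, month: int, day: int) -> bool:
    """Return True only if the calendar date exists (April 31st does not)."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False

# Hypothetical, very rough bounding box for the contiguous United States:
# (min_lat, max_lat, min_lng, max_lng). Illustrative values only.
US_BOX = (24.0, 50.0, -125.0, -66.0)

def in_box(lat: float, lng: float, box) -> bool:
    """True if the point lies inside the bounding box."""
    min_lat, max_lat, min_lng, max_lng = box
    return min_lat <= lat <= max_lat and min_lng <= lng <= max_lng

def missing_sign_suspected(lat: float, lng: float, box) -> bool:
    """True if the point is outside the box but a sign-flipped variant is inside."""
    if in_box(lat, lng, box):
        return False
    variants = [(lat, -lng), (-lat, lng), (-lat, -lng)]
    return any(in_box(v_lat, v_lng, box) for v_lat, v_lng in variants)
```

A record stated to be in the United States with coordinates 35.0, 106.6 would be flagged by `missing_sign_suspected`, because flipping the longitude sign puts it inside the box.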
The TDWG Biodiversity Data Quality task group has a few factions working on different aspects. One is trying to define a framework for what we even mean when we talk about data quality as a collections community. Another is getting all the aggregators, including iDigBio, to settle on a set of the same data quality tests to run on provider data and return flags for.
I don't actually think the flags are visibly negative enough to make users think "Arctos is broken." I would hope (although I guess hope is the operative word here), that people who are running analyses on or otherwise using aggregator data for something beyond browsing would notice that the flags are doing more standardizing than correcting, and that obviously different collections/databases use different but equally correct ways to say the same thing...
I think I know what is going on here now and it would be a change to higher geography. While at SPNHC, Robert Mesibov offered to review some Arctos data for me. He downloaded the MSB fish data from iDigBio and reviewed the RAW file. One of the issues he found was that all of the stuff coming from oceans had no water body and instead the body of water was in the DWC_Continent field.
In Darwin Core, Atlantic Ocean is a body of water, not a continent.
I thought that it would make sense to call the tectonic plate the "continent", but that isn't how iDigBio does it. They use political boundaries for continent.
So DMNS:Bird:18967 in Arctos shows a continent of "Atlantic Ocean" and no associated water body
and DMNS:Bird:18967 in iDigBio shows a continent of "Europe" and has the flag DWC Continent Replaced.
Strictly speaking, we are both wrong, but I doubt that anyone searching in iDigBio for Europe wants stuff from the South Georgia Islands. And when I search iDigBio for institution code "DMNS" plus water body "Atlantic Ocean" I get no results. At least anyone searching the Arctos Continent/Ocean field for "Atlantic Ocean" will find this specimen (I tried it and it worked!).
All this being said, it seems to me that there needs to be a wider community discussion about Continent and Bodies of Water. But in the interest of making our stuff more searchable in iDigBio (and GBIF, I'm betting), I suggest that we add Water Body to higher geography and, for anything with a continent that is really a water body, add the correct name to the water body field. iDigBio will still replace our "Continent/Ocean" information, but the correct water body will get there, so people searching the oceans will find our stuff.
BTW, I added the whole continent/ocean issue to the TDWG data quality GitHub.
This is an aggregator doing something indefensible (which you've explicitly permitted by licensing your data CC0). This isn't an Arctos issue (there is no standard of which I'm aware), and it's not a DWC issue (the data are being properly transported to the aggregators).
There's been a "community discussion" going on for 32 years (this is what TDWG was formed to do) with no resolution. What we NEED is a usable authority. Arctos could become that or plug into something else; both are technically trivial. (What's Kurator using?)
I dislike waterbody. I fail to see how the few miles of sometimes-wet sorta-ditch behind the farm (it's in Getty) is the same sort of data as states and counties.
woops
@ArctosDB/geo-group , please read John W's response.
We could (theoretically - it may push this into 'infrastructure-limited' territory) use a non-DWC vocab and translate. E.g., if y'all really like 'Central America' as a continent then we could push it and North America to 'North and Central America' on export. (Or maybe that's a horrible idea which just ensures that someone finding something in iDigBio can't find it in Arctos and vice versa.)
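A minimal Python sketch of that translate-on-export idea, assuming a simple lookup table. The names `EXPORT_CONTINENT`, `continent_for_export` and `local_candidates` are hypothetical; nothing like this is known to exist in Arctos.

```python
# Hypothetical export-time translation: local continent value -> exported value.
EXPORT_CONTINENT = {
    "Central America": "North and Central America",
    "North America": "North and Central America",
}

def continent_for_export(local_value: str) -> str:
    """Translate a local continent on export; unknown values pass through unchanged."""
    return EXPORT_CONTINENT.get(local_value, local_value)

def local_candidates(exported_value: str) -> list:
    """List every local value that could have produced an exported value."""
    matches = [k for k, v in EXPORT_CONTINENT.items() if v == exported_value]
    return sorted(matches) if matches else [exported_value]
```

Because the mapping is many-to-one, `local_candidates("North and Central America")` returns two local values, which is exactly the "can't find it in Arctos" worry: the exported term cannot be turned back into a single local search term.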
if the location itself is not in the water, dwc:waterbody should be left empty, otherwise we end up with some incongruent assertions some day when the semantics become rigorously important.
https://github.com/ArctosDB/arctos/issues/1107 - we regularly violate this principle and seem resistant to stopping that.
Continent: ...suggest The Getty Thesaurus of Geographic Names (TGN) as the source...Oceania...does not include the oceans.
Maybe that's correct and Oceania only refers to the dirt-parts??
dwc:waterbody is a lot more broad than dwc:continent, as it can include everything from a pond to an ocean. Some use it for drainage basin systems
I'd say that's just wrong (and that's why we've added "drainage" and not "waterbody" to the geography table). There's a LOT of stuff in "Cimarron River Drainage" which isn't anywhere near the Cimarron River (or any other water!).
And https://github.com/ArctosDB/arctos/issues/1366 is still unanswered, but I don't think a pond is included within what we generally see as geography. Maybe that's an indication that trying to draw a line between geography and locality is not a useful thing to do.
And I'd like to amend my assertion above: what we NEED is a lookup service which turns shapes into whatever sort of text string anyone might want. (We already have that, but it's not very good, not very structured, and not very exposed - it just supports "any geog" queries, and it does so from points. We also have services to turn strings into coordinates, but that quickly becomes circular - at least sometimes, I'm inclined to support our current model which treats those coordinates as suggestions and relies on a person to accept them as "data.")
See also https://github.com/tdwg/bdq/issues/172
After looking into this - I have to agree that our current "Higher Geography" is misleading in searches.
DMNS:Bird:18967 provides a good example. Its higher geography is: Atlantic Ocean, United Kingdom, South Georgia & South Sandwich Islands, South Georgia Islands, South Georgia
As John W. points out, an island is not part of the ocean (a water body). iDigBio moves this specimen to: Europe, United Kingdom, South Georgia & South Sandwich Islands, South Georgia Islands, South Georgia because the United Kingdom is in Europe.
If we were following the ISO 3166 codes, we would have a higher geography of:
AN | GS | SGS | 239 | South Georgia and the South Sandwich Islands (dependent state) |
---|---|---|---|---|

AN = Antarctica
GS = South Georgia and the South Sandwich Islands
SGS = South Georgia and the South Sandwich Islands
239 = South Georgia and the South Sandwich Islands
Which makes sense if you are searching by continent or country.
ISO 3166 would be far more stable than Wikipedia and we would stop the madness of finding Magellanic Penguins in the United Kingdom (which most certainly happens in Arctos).
Here's your link - click "requery" on the "show/hide" widget to get a URL. http://arctos.database.museum/SpecimenResults.cfm?scientific_name=Spheniscus%20magellanicus&scientific_name_scope=currentID&scientific_name_match_type=startswith&country=United%20Kingdom
I don't really have a problem with those data - the UK is a political entity, not a place. More on that below...
I dislike ISO codes as they line up with our data; the intent/meaning is drastically different. We record (sometimes...) what was there when the specimen was collected (or georeferenced, or when the label was printed, or ...), ISO codes refer to something else, those don't always have much to do with each other, and we don't have the resources to update our data when something changes. "Yugoslavia" could refer to lots of shapes (https://www.youtube.com/watch?v=Ic5tBXESxl8) while ISO 3166-1:890 is 1) just https://en.wikipedia.org/wiki/Socialist_Federal_Republic_of_Yugoslavia#/media/File:Yugoslavia_1956-1990.svg, and 2) a withdrawn code.
because the United Kingdom is in Europe
One problem is that we (and GBIF, apparently) have a crazy mix of geography and politics in the data, and often no way to tell them apart. The UK is most certainly not (entirely) in Europe, nor does the name have any sort of spatiotemporal stability.
an island is not part of the ocean
That brings up the question of where exactly the island ends and the ocean begins. Mean high tide, the exclusive economic zone (for island nations), some arbitrary point established by some historical event, the place where the collector felt they were no longer close enough to the island to record that, ... ?
I'm not sure there's a One True Method for any of that which involves strings. It's all fairly trivial with georeferences - just ask some service capable of responding with the data you want. Theoretically anyway - hard to say what might happen with this input:
Taxonomy Committee had a brief discussion about this. People searching at VertNet, GBIF and iDigBio will not find some Arctos records due to mismatches between the continents we use in Arctos and those they use (apparently a standard set); see https://github.com/tdwg/dwc-qa/issues/128#issuecomment-661161433.
Although it would be a lot of work, I think we need to review all higher geography that uses an ocean as the "continent". As John W. pointed out, Hawaii is not part of the Pacific Ocean (it is not water) and if we are sticking with political divisions for higher geography, then Hawaii should be part of North America. see also https://github.com/ArctosDB/arctos/issues/1291#issuecomment-424778196.
I also think we should consider how our continents map to those used by the aggregators:
Arctos | Aggregators |
---|---|
Africa | Africa |
Americas | |
Antarctica | Antarctica |
Arctic Ocean | |
Asia | Asia |
Atlantic Ocean | |
Australia | Oceania |
Central America | |
Eurasia | |
Europe | Europe |
Indian Ocean | |
North America | North America |
North Atlantic Ocean | |
North Pacific Ocean | |
Pacific Ocean | |
South America | South America |
South Atlantic Ocean | |
Southern Ocean | |
South Pacific Ocean | |
West Indies | |
Everything that we have in any of the oceans is likely lost in many searches of aggregators and that could be a lot of things.
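The table above can be written down as a lookup to see how much falls through. This is only a sketch: the dictionary name `ARCTOS_TO_AGGREGATOR` and the helper `unmapped` are hypothetical, and the values are copied straight from the table (`None` means the aggregator column is empty).

```python
# Arctos continent/ocean value -> aggregator continent, per the table above.
ARCTOS_TO_AGGREGATOR = {
    "Africa": "Africa",
    "Americas": None,
    "Antarctica": "Antarctica",
    "Arctic Ocean": None,
    "Asia": "Asia",
    "Atlantic Ocean": None,
    "Australia": "Oceania",
    "Central America": None,
    "Eurasia": None,
    "Europe": "Europe",
    "Indian Ocean": None,
    "North America": "North America",
    "North Atlantic Ocean": None,
    "North Pacific Ocean": None,
    "Pacific Ocean": None,
    "South America": "South America",
    "South Atlantic Ocean": None,
    "Southern Ocean": None,
    "South Pacific Ocean": None,
    "West Indies": None,
}

def unmapped(mapping: dict) -> list:
    """Return the local values that have no aggregator equivalent."""
    return sorted(k for k, v in mapping.items() if v is None)
```

By this table, 13 of the 20 Arctos values have no aggregator counterpart, and 9 of those 13 are oceans.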
Actually, I find our continent/ocean list a bit perplexing...why did we decide to make the West Indies a continent?
The West Indies is a subregion of North America - https://en.wikipedia.org/wiki/West_Indies
How is that any different from "Patagonia"?
giant mess
Yep, we should fix
Hawaii ... North America.
No.
West Indies
Wat?!
Hawaii ... North America.
No.
Political divisions be damned?
Then why are we going with "Europe, Iceland" and not "North Atlantic, Iceland", or "South America, United Kingdom, Falkland Islands, Falkland Islands" instead of "South Atlantic Ocean, United Kingdom, Falkland Islands, Falkland Islands"?
If we want terrestrial things to be found at the aggregator level, terra firma needs to be associated with a continent. If we don't care that anything on an island in the ocean ever gets discovered at the aggregators, then we should just proceed as usual. @ArctosDB/geo-group
Wow, continents and oceans. Why can't it be just based on plates? Let me see if we can get a meeting @ArctosDB/geo-group on a Thursday 1030-12 opening or we prioritize for our Issues meeting in a few weeks.
I am still looking for good marine polys so part of the same discussion.
Why can't it be just based on plates?
That's what I said! See https://github.com/ArctosDB/arctos/issues/1291#issuecomment-423747804
Plus - there are a lot more plates than we want to keep track of....
Political divisions
North America is a hunk of dirt.
Hawaii is a hunk of dirt (or parts of it are).
There's a noticeable lack of accessible dirt in between them.
aggregator level
I'm not at all convinced that they have anything figured out. (And that's absolutely not an argument that we have anything figured out either!)
plates
Well Siberia is closer to NA than HI is.....
Europe is a hunk of dirt and Iceland is a hunk of dirt with a noticeable lack of accessible dirt between them.....
Similarly with Greenland and North America, 'cept Denmark (mostly in Europe) apparently still owns that despite Trump's best efforts.
Greenland and North America
Say it ain't so!
arctosprod@arctos>> select continent_ocean,country from geog_auth_rec group by continent_ocean,country order by continent_ocean,country;
continent_ocean | country
------------------------------+----------------------------------------------
Africa | Algeria
Africa | Angola
Africa | Benin
Africa | Botswana
Africa | Burkina Faso
Africa | Burundi
Africa | Cameroon
Africa | Central African Republic
Africa | Comoros
Africa | Democratic Republic of the Congo
Africa | Djibouti
Africa | Egypt
Africa | Equatorial Guinea
Africa | Eritrea
Africa | Ethiopia
Africa | Gabon
Africa | Gambia
Africa | Ghana
Africa | Guinea
Africa | Guinea-Bissau
Africa | Ivory Coast
Africa | Kenya
Africa | Liberia
Africa | Libya
Africa | Madagascar
Africa | Malacco
Africa | Malawi
Africa | Mali
Africa | Mauritania
Africa | Morocco
Africa | Mozambique
Africa | Namibia
Africa | Niger
Africa | Nigeria
Africa | Republic of the Congo
Africa | Rhodesia
Africa | Rwanda
Africa | Sao Tome and Principe
Africa | Senegal
Africa | Senegambia
Africa | Seychelles
Africa | Sierra Leone
Africa | Somalia
Africa | South Africa
Africa | Spain
Africa | Sudan
Africa | Swaziland
Africa | Tanganyika
Africa | Tanzania
Africa | Togo
Africa | Tunisia
Africa | Uganda
Africa | Western Sahara
Africa | Zaire
Africa | Zambia
Africa | Zimbabwe
Africa |
Americas |
Antarctica | France
Antarctica | New Zealand
Antarctica | United Kingdom
Antarctica |
Arctic Ocean | Canada
Arctic Ocean |
Asia | Afghanistan
Asia | Bahrain
Asia | Bangladesh
Asia | Bhutan
Asia | Borneo
Asia | Brunei
Asia | Cambodia
Asia | China
Asia | Cyprus
Asia | India
Asia | Indonesia
Asia | Iran
Asia | Iraq
Asia | Israel
Asia | Japan
Asia | Jordan
Asia | Korea
Asia | Kuwait
Asia | Kyrgyzstan
Asia | Laos
Asia | Lebanon
Asia | Malaysia
Asia | Mongolia
Asia | Myanmar
Asia | Nepal
Asia | North Korea
Asia | Oman
Asia | Pakistan
Asia | Palestine
Asia | Philippines
Asia | Qatar
Asia | Russia
Asia | Saudi Arabia
Asia | Singapore
Asia | South Korea
Asia | Soviet Union
Asia | Sri Lanka
Asia | Syria
Asia | Taiwan
Asia | Tajikistan
Asia | Thailand
Asia | Turkey
Asia | Turkmenistan
Asia | United Arab Emirates
Asia | Uzbekistan
Asia | Vietnam
Asia | West Bank
Asia | Yemen
Asia |
Atlantic Ocean | Cape Verde
Atlantic Ocean | Italy
Atlantic Ocean | Portugal
Atlantic Ocean | Spain
Atlantic Ocean | United Kingdom
Atlantic Ocean |
Australia | Australia
Central America | Belize
Central America | Costa Rica
Central America | El Salvador
Central America | Guatemala
Central America | Honduras
Central America | Nicaragua
Central America | Panama
Central America |
Eurasia | Kazakhstan
Eurasia | Russia
Eurasia | Soviet Union
Eurasia |
Europe | Abkhazia
Europe | Albania
Europe | Andorra
Europe | Armenia
Europe | Austria
Europe | Azerbaijan
Europe | Belarus
Europe | Belgium
Europe | Bosnia and Herzegovina
Europe | Bulgaria
Europe | Croatia
Europe | Czech Republic
Europe | Denmark
Europe | Estonia
Europe | Finland
Europe | France
Europe | Georgia
Europe | Germany
Europe | Greece
Europe | Holland
Europe | Hungary
Europe | Iceland
Europe | Ireland
Europe | Italy
Europe | Luxembourg
Europe | Macedonia
Europe | Malta
Europe | Moldova
Europe | Monaco
Europe | Montenegro
Europe | Netherlands
Europe | Northern Ireland
Europe | North Macedonia
Europe | Norway
Europe | Poland
Europe | Portugal
Europe | Republic of Cyprus
Europe | Romania
Europe | Russia
Europe | Slovakia
Europe | Slovenia
Europe | Soviet Union
Europe | Spain
Europe | Sweden
Europe | Switzerland
Europe | Ukraine
Europe | United Kingdom
Europe | Yugoslavia
Europe |
Indian Ocean | Africa
Indian Ocean | Australia
Indian Ocean | Eritrea
Indian Ocean | France
Indian Ocean | India
Indian Ocean | Maldives
Indian Ocean | Mauritius
Indian Ocean |
no higher geography recorded |
North America | Canada
North America | Greenland
North America | Mexico
North America | United States
North America |
North Atlantic Ocean |
North Pacific Ocean | United States
North Pacific Ocean |
Pacific Ocean | Commonwealth of the Northern Mariana Islands
Pacific Ocean | Ecuador
Pacific Ocean | Federated States of Micronesia
Pacific Ocean | Fiji
Pacific Ocean | France
Pacific Ocean | Kiribati
Pacific Ocean | Nauru
Pacific Ocean | New Zealand
Pacific Ocean | Niue
Pacific Ocean | Panama
Pacific Ocean | Papua New Guinea
Pacific Ocean | Republic of Palau
Pacific Ocean | Republic of the Marshall Islands
Pacific Ocean | Samoa
Pacific Ocean | Solomon Islands
Pacific Ocean | Tonga
Pacific Ocean | Tuvalu
Pacific Ocean | United Kingdom
Pacific Ocean | United States
Pacific Ocean | United States Minor Outlying Islands
Pacific Ocean | U.S. Trust Territory of the Pacific
Pacific Ocean | Vanuatu
Pacific Ocean |
South America | Argentina
South America | Bolivia
South America | Brazil
South America | British Guiana
South America | Chile
South America | Colombia
South America | Ecuador
South America | France
South America | French Guiana
South America | Guiana
South America | Guyana
South America | Paraguay
South America | Peru
South America | Suriname
South America | United Kingdom
South America | Uruguay
South America | Venezuela
South America |
South Atlantic Ocean | United Kingdom
South Atlantic Ocean |
Southern Ocean |
South Pacific Ocean | Australia
South Pacific Ocean | Chile
South Pacific Ocean | Tasmania
South Pacific Ocean |
West Indies | Antigua and Barbuda
West Indies | Bahamas
West Indies | Barbados
West Indies | Cuba
West Indies | Dominica
West Indies | Dominican Republic
West Indies | France
West Indies | Grenada
West Indies | Haiti
West Indies | Jamaica
West Indies | Netherlands
West Indies | Saint Kitts and Nevis
West Indies | Saint Lucia
West Indies | Saint Vincent and the Grenadines
West Indies | Trinidad and Tobago
West Indies | United Kingdom
West Indies | United States
West Indies | Venezuela
West Indies |
| Singapore
| United States
@Jegelewicz I deleted Americas - you created it, not used, low-hanging fruit and all.
@mkoo I deleted United States, California, Pinnacles National Park - also not used, fairly sure it's close enough to NA.
", Singapore, North West Community Development Council, Singapore" was created by alexandraperkins - also deleted. We should consider treating geography more like all other code tables and limiting access to active AWG members.
If Kazakhstan is in Eurasia, then so should be everything else Eurasian. (Eurasia was created for "Russia, but it's big and the data are flaky" - like Americas, it should be eliminated from that role.)
I played with GBIF a bit, hoping there'd be some consistency we might somehow tap into. I can't find it, but it's possible they have something which would be exposed with a "has no geography issues" 'anti-flag' option.
I deleted Americas - you created it, not used, low-hanging fruit and all.
Dang - that must have been back at the beginning? Or I was just in a daze.
I agree with Eurasia - we should move all that to either Europe or Asia. Maybe we could use GBIF as a cue for how to treat Russia (all Asia, all Europe, a little of both?)
We should consider treating geography more like all other code tables and limiting access to active AWG members.
This could bog down a bunch of projects. I think we can manage now by monitoring the code table change emails. However, there are a few things we could do to help keep things from going nuts. @sharpphyl suggested that we make a code table to limit options for "Continent/Ocean" since there are really so few and we specify them in our documentation. I also think we need to have a real discussion about Island Groups, Islands, Quads, and Features. Those things create a lot of chaos and I'm pretty sure we could handle them better. Not saying I have the answer, just that if we do a little noodling together maybe we could create something better.
A code table is pretty easy. I'm still semi-inclined to go the other way, treat the whole shebang as "authority" and trust the folks we trust with authorities not to muck it up, but I'm up for anything.
Islands
https://github.com/ArctosDB/arctos/issues/1278 still seems vaguely like a potential start to me.
move all that to either Europe or Asia
Russia really is big! Merging those to Eurasia is trivial. Splitting Russia is not. NULL continent may be less-evil than merges (or not, IDK)
GBIF
GBIF seems to have Russia entirely within Europe, which ends at Alaska....
move all that to either Europe or Asia
Russia really is big! Merging those to Eurasia is trivial. Splitting Russia is not. NULL continent may be less-evil than merges (or not, IDK)
Yeah but who searches for "Eurasia"? And if Europe includes South Georgia & South Sandwich Islands why not go all the way to Alaska?
who searches for "Eurasia"?
Depends on the data. If Russia spans three continents, maybe nobody. If there's no Europe/Asia options, maybe anyone wanting stuff from Eurasia.
South Georgia
I don't see GBIF being broken as a reason to break Arctos!
Just to keep this conversation complicated, let's be sure to also discuss how we treat water bodies if we separate them out from continents. There is a distinction between Hawaii (in North America) with a water body of the Pacific Ocean. That means the (marine) specimen was found in the water. If there is no water body, it's a terrestrial snail found on Oahu. Right now, we have Hawaii in the Pacific Ocean which implies all our specimens are marine but they are not.
implies all our specimens are marine
That sort of confounded assumption can be nothing but a recipe for bad inferences.
I'm taking this from https://github.com/VertNet/DwCVocabs
The principles that govern the standardization of waterbody are 1) locations not in water should not include the waterbody, 2) locations in water are expected to provide the waterbody in the original data, 3) the standardized waterbody should be the most specific waterbody that applies.
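As a rough sketch, principles 1 and 2 could be checked per record like this in Python. The function name `waterbody_issues` and the `in_water` flag are hypothetical: the check assumes something else has already decided whether the location is in water, which is the hard part. Principle 3 (most specific waterbody) is not checked here.

```python
from typing import List, Optional

def waterbody_issues(in_water: bool, waterbody: Optional[str]) -> List[str]:
    """Check one record against principles 1 and 2 quoted above."""
    # Treat None, empty, and whitespace-only values as "no waterbody given".
    has_waterbody = bool(waterbody and waterbody.strip())
    issues = []
    if not in_water and has_waterbody:
        issues.append("location is not in water but waterbody is filled in")
    if in_water and not has_waterbody:
        issues.append("location is in water but waterbody is missing")
    return issues
```

Under these rules a terrestrial snail from Oahu with waterbody "Pacific Ocean" gets one issue, and a marine specimen with no waterbody gets the other.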
Hi folks, rather than rehash what I think are the issues with how GBIF interprets continent, I urge you to read the issue I presented to them, as it will explain a lot about why you see what you see in GBIF.
implies all our specimens are marine
That sort of confounded assumption can be nothing but a recipe for bad inferences.
Careful everyone. The VertNet principle of best practice suggests how to do it, it does not say that everyone has done it, or that an assumption to that effect is sage or safe.
how to do it
I think that's our primary question here.
Second is how aggregators and other not-us users interpret those data. The easy solution to that is to just share a model.
To me it needs two parts, the shapes and the thesaurus that connects to it. One could approach geography from the spatio-temporal perspective or from the names perspective. You could do things like:
reverse geocoding: Tell me the standard administrative region names for this point (at this time). Here is an example that uses GADM - https://api.gbif-uat.org/v1/geocode/reverse?lat=48.17156&lng=1.18177.
get preferred name - I wanna search on the name of a place as I know it and let something translate that into the preferred name used in an index so I get everything I am looking for. This would take a combination of something like TGN (http://www.getty.edu/vow/TGNServlet?english=Y&find=Sudamerica&place=&page=1&nation=), which does have web services now, and an index that actually is standardized against the preferred names.
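For scripting the reverse-geocoding lookup, a small Python helper that only builds the request URL might look like this. The endpoint and the `lat`/`lng` parameter names are taken from the example link above; the function name `reverse_geocode_url` is hypothetical, and the sketch makes no network call and assumes nothing about the response format.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the example link above (GBIF UAT environment).
GEOCODE_BASE = "https://api.gbif-uat.org/v1/geocode/reverse"

def reverse_geocode_url(lat: float, lng: float) -> str:
    """Build a reverse-geocoding request URL for one point."""
    if not -90 <= lat <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= lng <= 180:
        raise ValueError("longitude must be between -180 and 180")
    return GEOCODE_BASE + "?" + urlencode({"lat": lat, "lng": lng})
```

Calling `reverse_geocode_url(48.17156, 1.18177)` reproduces the example link; fetching it and interpreting the result is left to whatever HTTP client the script already uses.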
Hey, that's pretty cool, thanks! I'll add it to my scripts.
Interesting that marineregions.org doesn't seem to have great offshore vocabulary - I'm coming to the idea that there's just no such thing, and trying to fake it (eg, by referring to something dry and far away) only adds to the confusion.
@sharpphyl
https://api.gbif-uat.org/v1/geocode/reverse?lat=38.086621&lng=-122.394955
https://api.gbif-uat.org/v1/geocode/reverse?lat=37.761077&lng=-122.801543
https://api.gbif-uat.org/v1/geocode/reverse?lat=37.382637&lng=-123.419142
https://api.gbif-uat.org/v1/geocode/reverse?lat=29.527412&lng=-138.532940
@dustymc Let's see if I understand the above links. These are reverse geocoding of coordinates moving from a point within the US boundary out into the US Exclusive Economic Zone and beyond into the Pacific Ocean. Would this add the EEZs as part of higher geography and thus tie both to the political entity that controls the EEZ and the ocean it is in? That certainly has promise and I don't immediately see an issue. Would it improve how GBIF interprets our data? I think @mkoo has suggested using EEZs before.
add the EEZs as part of higher geography
That's a possibility. I was thinking more radically, but I'm not sure how realistic anything is.
If we do something, we'd need to do something consistent. It looks like they end 'continent' right about the golden gate bridge - you OK with that?
The Farallons are part of SF County; adopting enough of this would leave us with a transcontinental county, which doesn't seem ideal.
and thus tie both to the political entity that controls the EEZ and the ocean it is in?
Seems a bit optimistic, but maybe. Would be useful to see their basemap rather than trying to reverse engineer it.
Would it improve how GBIF interprets our data?
It might - presumably they built this for their own use.
move all that to either Europe or Asia
Russia really is big! Merging those to Eurasia is trivial. Splitting Russia is not. NULL continent may be less-evil than merges (or not, IDK)
There are only 3 HG entries with Eurasia, Russia
Create an Uber-geog level above continent just for Eurasia?
That won't save you from all the other trans-continental country problems. See https://github.com/VertNet/DwCVocabs/issues/56.
only 3 HG
I don't understand why that matters. The most precise information we have doesn't fit into the normal "hierarchy" (it's not, because the world isn't).
We could accept that continent-->country is two different kinds of THINGs and should not be expected to be consistent. This to me looks like the reality we should embrace.
We could lose the precision altogether and dump everything into Eurasia. That will toss out some data for "Russia, that's all we know" records, and won't do anything for Hawaii being inconveniently located.
We could do something truly evil - reject records which don't meet our expectations of how the world should have been put together or something.
We could accept that continent-->country is two different kinds of THINGs and should not be expected to be consistent. This to me looks like the reality we should embrace.
I agree that this is what we should be doing. The only issue arises when we have a locality = "Russia" (or does it?). In this case, I would suggest that HG = no higher geography and that "Russia" be included in Specific Locality, OR there should be two localities provided, one with HG = Asia, Russia and one with HG = Europe, Russia.
Also, I can figure out the 3 Russia HG in Eurasia and put them on the appropriate continent.
HG = no higher geography and that "Russia" be included in Specific Locality
I think that's in my "evil" category - it's purposefully "demoting" data to meet our unrealistic expectations.
two localities
That works for search, might not be evil, still seems pretty janky to me.
figure out the 3 Russia HG in Eurasia
That does not seem possible.
One is a country that spans both.
One is a former, bigger, country that spans both.
One has this:
HG = no higher geography and that "Russia" be included in Specific Locality
I think that's in my "evil" category - it's purposefully "demoting" data to meet our unrealistic expectations.
I think that using Eurasia is every bit as evil.
two localities
That works for search, might not be evil, still seems pretty janky to me.
Janky, maybe, but it gets the job done (IMO - could be completely wrong).
figure out the 3 Russia HG in Eurasia
That does not seem possible.
One is a country that spans both.
See first comment above. We have "Asia, Russia" and "Europe, Russia". Assign two events with both localities to the records that use "Eurasia, Russia". BTW, I think some of these could have more appropriate HG
One is a former, bigger, country that spans both.
Aren't we supposed to be using "current" HG? Some of these could be made better, and for the rest "no higher geography" with Soviet Union in the spec loc seems not so evil, since they are just that vague anyway.
One has this:
See fix as applied to "Russia". Also, pretty sure these could be sorted onto the correct continent, since they have coordinates...
It looks like they end 'continent' right about the golden gate bridge - you OK with that?
It would be nice to have a bit of wiggle room so our coordinates could be 100' off shore and not create an out-of-bounds, but if we had EEZs to work with right off the bridge, it would probably be ok.
This issue has gained a lot of Where's Russia? influence so maybe the rest of this comment belongs elsewhere, but it's related to the question of how to deal with offshore locations.
A consortium of Museums (I don't think any are in Arctos) recently received a grant https://www.nsf.gov/awardsearch/showAward?AWD_ID=2001510&HistoricalAwards=false that is focused on geolocating specimens on the US eastern seaboard. Here is part of their proposal: This project will generate reliable geo-coordinate data for all covered specimen lots using a collaborative georeferencing project in GeoLocate. GeoLocate will add layers for bathymetric data, benthic habitat, and marine conservation areas. Incorporating bathymetry into GeoLocate to determine the extent of locations will also provide that capability for complex elevational data for terrestrial species....The data will be shared through public data repositories, including iDigBio, GBIF, OBIS, and the InvertEBase Symbiota portal.
I asked Dr. José Leal at the National Shell Museum, one of the participants, if, in addition to geolocating specimens more precisely, the project would result in a marine locality structure that could be used by other museums with specimens from similar locations. His reply: Yes, that is the idea. We have Nelson Rios from Geolocate as a PI in the grant, so some of the more technical questions will be resolved by him on this. For marine localities we'll be adding station coordinates (which is nothing new), but still need to resolve how to handle "stations" without coordinates ("off Cape Sable," etc.)
Not sure there's anything in the work they are doing that will be helpful for us, but I thought I'd add it to the stew just in case.
Assign two events
Taken to extremes, would that require a "France, 1800" record to have about 80 determinations?
supposed to be using "current" HG?
That idea died an agonizing death under the pressure of reality; it's a nice ideal, but it would require a tremendous amount of work every time someone moves a border.
vague anyway
It's less vague than the alternatives.
Eurasia is every bit as evil.
It does not involve discarding data, so I have to disagree. Splitting Sverdlovsk Oblast or San Francisco County across two made-up pigeonholes doesn't seem terribly conducive to discovery, nor does dumping Norway and India into one made-up pigeonhole. I have no idea what we should do, but I do not think it will involve removing precision at any scale.
Closing as we are not addressing the original issue.
As my data was recently ingested by iDigBio, I received a huge list of specimens flagged for various corrections (sigh). I wanted to bring this one to the group to see if we should be paying more attention to Darwin Core, or if it is just something to let iDigBio keep "correcting" for.
Some of my specimens on islands in the Pacific are flagged by iDigBio with "dwc_continent_replaced | Darwin Core Continent Corrected." One example is here: https://www.idigbio.org/portal/records/89015b8e-d745-430c-b846-8b250b62afcb
Is Arctos not complying with Darwin Core or is this just an artifact of iDigBio? Do we need to do anything about it or do I just need to know that these flags are not a problem? My main concern is that users of iDigBio will view our data as less reliable with flags attached.