Open MattBlissett opened 3 years ago
Here is how dwc:waterBody is used in practice. Replacing a publisher's dwc:waterBody value with one derived from a sea polygon doesn't seem right to me, so I would lean towards using a custom term, as we do for GADM.
waterbody | count |
---|---|
null | 1.88E+09 |
Gulf of Mexico | 7565698 |
North Pacific Ocean | 1693380 |
Pacific Ocean | 1638520 |
north; west; offshore; European; Atlantic; Iris... | 1122886 |
North Atlantic Ocean | 1054138 |
South Pacific Ocean | 921469 |
Northwest Atlantic | 841704 |
Southern Ocean | 797383 |
Northeastern Pacific Ocean | 696623 |
northeast Pacific | 471986 |
Sleipner area | 468729 |
Atlantic Ocean | 387188 |
Statfjord | 384972 |
Trondelag area | 332001 |
Oseberg area | 281674 |
Ebro Basin | 266402 |
Pacific | 249816 |
Ekofisk area | 224578 |
not applicable | 221456 |
North Atlantic,New York Bight | 214476 |
Northwest Atlantic,Atlantic Canada,Maritimes (C... | 209039 |
Oir | 206247 |
Mobile Bay | 195453 |
Mar Caribe | 191757 |
Finnmark | 182944 |
North Atlantic Ocean, Gulf of Mexico | 180812 |
Eastern North Pacific Ocean | 163273 |
Scorff | 162867 |
Southern Pacific | 156350 |
Indian Ocean | 144883 |
Southern Eastern Pacific | 143568 |
Molene Archipelago,Iroise Sea,English Channel,I... | 138877 |
North Atlantic,North Carolina,South Carolina | 132081 |
Global | 129557 |
Barents Sea South | 119859 |
Patos Lagoon estuary and adjacent coast | 102333 |
Alaska EEZ | 101360 |
Caribbean Sea | 99852 |
Atlantic | 97143 |
Roche | 91222 |
Corpus Christi Bay | 90908 |
North Atlantic Ocean, Caribbean Sea | 90882 |
Biscayne Bay | 89678 |
Lake Superior | 89676 |
Florida Bay | 88501 |
Mississippi Sound | 84402 |
Aransas Bay | 76720 |
France,Northeast Atlantic | 75639 |
Arctic Ocean | 72757 |
Zeeschelde | 72741 |
San Antonio Bay | 71543 |
Lower Laguna Madre | 70968 |
Matagorda Bay | 64597 |
Little Lagoon | 61758 |
Lake Ontario | 61200 |
Baie de Somme | 61049 |
Perdido Bay | 58925 |
I will run a test interpretation and see what would happen.
I'm not proposing to change "Lake Ontario", but it would normalize "Caribbean Sea" and "North Atlantic Ocean, Caribbean Sea". The loss would be if "Perdido Bay" just becomes "Gulf of Mexico".
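As a rough illustration of the normalization described above (not the actual test interpretation), the rule could look like the sketch below. The function name `interpret_water_body` and the comma-splitting heuristic are assumptions made for this example: the verbatim value is only replaced when it is empty or already names the sea found by the polygon lookup, so more specific values such as "Perdido Bay" survive.

```python
def interpret_water_body(verbatim, sea_from_polygon):
    """Return an interpreted waterBody value for one occurrence.

    verbatim: the publisher's dwc:waterBody value, or None.
    sea_from_polygon: sea/ocean name from the coordinate lookup, or None
    when the point falls in no marine polygon. (Hypothetical inputs.)
    """
    # No marine polygon matched: leave lakes, rivers etc. untouched.
    if sea_from_polygon is None:
        return verbatim
    # Nothing supplied by the publisher: use the polygon's name.
    if verbatim is None or not verbatim.strip():
        return sea_from_polygon
    # Verbatim lists the same sea (possibly among others): normalize it.
    parts = [p.strip().casefold() for p in verbatim.split(",")]
    if sea_from_polygon.casefold() in parts:
        return sea_from_polygon
    # Otherwise keep the (possibly more specific) verbatim value.
    return verbatim


print(interpret_water_body("North Atlantic Ocean, Caribbean Sea", "Caribbean Sea"))
print(interpret_water_body("Perdido Bay", "Gulf of Mexico"))
print(interpret_water_body("Lake Ontario", None))
```

With this rule "North Atlantic Ocean, Caribbean Sea" collapses to "Caribbean Sea", while "Perdido Bay" and "Lake Ontario" are returned unchanged.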
We'll also need to take care of what happens around some islands, like Kwajalein Atoll, Marshall Islands:
Concerning (1) above (layers for marine areas): personally I would give more weight to being able to search for "everything marine" than to complementarity with EEZs, but this is probably a question best put to OBIS (@albenson-usgs, would you like to weigh in?). Is it more important to define marine areas politically, as "international waters" (excluding EEZs), or by habitat?
That leaves the question whether we would have to aim for having a seamless fit between continents and marine areas, or whether we would tolerate an overlap (continent as a sum of countries, incl. EEZ, + marine areas covering everything right up to the coast).
This is going to be tricky. @pieterprovoost is going to provide the best advice here. OBIS uses Marine Regions but as far as I know doesn't change any of the data and doesn't add any information to datasets. There's a separate way to filter the data by areas: https://api.obis.org/#/Area.
Since continents are landmasses, I was intending to use polygons that included only land, without coastal waters or EEZ etc.
This layer is 99% ready to serve as the continents, although the polygons I currently have cause some performance issues. (The black country borders can be ignored; they are there because the continent polygons have been assembled by joining and splitting country polygons.)
This is IHO,
and this is SeaVoX, though I think we'd exclude the North American / European / Asian "Mainland" polygons they have.
Either IHO or SeaVoX can answer "is_marine" within our usual tolerances for determining country.
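To make "is_marine within a tolerance" concrete, here is a minimal, self-contained sketch. It is not the geocoder's implementation: it treats coordinates as planar degrees, uses a single polygon ring with no holes, and the `tolerance` default of 0.05 degrees is an assumed value, not the one GBIF uses.

```python
from math import hypot


def point_in_ring(x, y, ring):
    """Ray-casting test: is (x, y) strictly inside the closed ring?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_segment(x, y, x1, y1, x2, y2):
    """Planar distance from (x, y) to the segment (x1, y1)-(x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return hypot(x - x1, y - y1)
    # Project the point onto the segment and clamp to its endpoints.
    t = ((x - x1) * dx + (y - y1) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return hypot(x - (x1 + t * dx), y - (y1 + t * dy))


def is_marine(x, y, sea_ring, tolerance=0.05):
    """True if the point is inside the sea ring or within `tolerance` of it."""
    if point_in_ring(x, y, sea_ring):
        return True
    n = len(sea_ring)
    return any(
        distance_to_segment(x, y, *sea_ring[i], *sea_ring[(i + 1) % n]) <= tolerance
        for i in range(n)
    )


# A made-up square "sea" for illustration only.
sea = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(is_marine(5, 5, sea), is_marine(10.03, 5, sea), is_marine(11, 5, sea))
```

A point just outside the polygon edge (10.03, 5) still counts as marine under the tolerance, which mirrors how coastal records are usually given some slack when determining country.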
The OBIS areas are a combination of full EEZs and intersections between EEZs and IHO areas (a user may want to filter on US North Pacific, for example). We also determine shore distance, and hence whether records are located on land, using the OpenStreetMap land polygons, as these seem to be more detailed than the typical EEZ / IHO layers.
While we are in here, should we also validate and nullify the altitude field for marine data?
The motivation for this is described here, with this example record.
Would it be possible to add Marine Regions in the same way as GADM, so that georeferences can be used to search against the oceans?
A while ago I prepared two layers for the geocoder cache: IHO and SeaVoX. Using them is waiting for a couple of decisions:
1) Which layer
The IHO areas are described here: https://www.marineregions.org/sources.php#iho (quick view: https://raw.githubusercontent.com/gbif/geocode/master/geocode-ws/src/main/resources/org/gbif/geocode/ws/layers/iho.png )
And SeaVoX here: https://www.marineregions.org/sources.php#seavox (quick view: https://raw.githubusercontent.com/gbif/geocode/master/geocode-ws/src/main/resources/org/gbif/geocode/ws/layers/seavox.png )
The IHO areas seem to fit exactly with the EEZ areas, so would be a better fit for complementing a continent layer. But a continent layer could also be calculated as the complement of the SeaVoX areas.
Legend: green dots = political, magenta horizontal / brown diagonal = EEZ/IHO (same polygon), dark brown diagonal = SeaVoX (different polygon). The map is at http://ws.gbif.org:9012/, but it's extremely slow (hence not public).
This is a decision for @ahahn-gbif.
2) Which terms
Taking into account the discussion in https://github.com/gbif/parsers/issues/26 , I suggest we populate dwc:continent for all terrestrial occurrences, and set dwc:waterBody to a sea/ocean for marine occurrences. I think this leaves dwc:waterBody as-is for rivers, lakes, ponds etc.
I'll specify the interpretation process in detail later, but it would be something like
A similar process can be used for dwc:continent, but the shapefiles for that aren't ready yet.
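Pending the detailed specification, the term assignment suggested in point 2 might be sketched as follows. This is only an illustration under assumptions: `assign_location_terms`, the plain-dict record, and the example names are hypothetical, and the marine/terrestrial decision and the polygon names are taken as given inputs.

```python
def assign_location_terms(record, is_marine, sea_name=None, continent_name=None):
    """Return a copy of `record` with continent / waterBody interpreted.

    record: dict of verbatim terms (hypothetical structure).
    is_marine: result of the marine-layer lookup for the coordinates.
    sea_name / continent_name: names from the matching polygon, if any.
    """
    out = dict(record)
    if is_marine:
        # Marine occurrence: waterBody becomes the sea/ocean polygon name.
        if sea_name is not None:
            out["waterBody"] = sea_name
    else:
        # Terrestrial occurrence: populate continent and leave waterBody
        # as supplied, so rivers, lakes and ponds are preserved.
        if continent_name is not None:
            out["continent"] = continent_name
    return out


print(assign_location_terms({"waterBody": "Lake Superior"}, False,
                            continent_name="North America"))
print(assign_location_terms({"waterBody": "Mar Caribe"}, True,
                            sea_name="Caribbean Sea"))
```

The first call keeps "Lake Superior" and adds a continent; the second replaces "Mar Caribe" with the polygon's name and sets no continent.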
Andrea, is using the dwc:waterBody term like this appropriate? I think it's similar to what we do with dwc:country/dwc:countryCode.
The other option is to make a new non-DWC term, and probably just populate it without reference to verbatim fields, as we do for GADM.