gbif / pipelines

Pipelines for data processing (GBIF and LivingAtlases)
Apache License 2.0

Allow filtering by sea/ocean/marine region #612

Open MattBlissett opened 3 years ago

MattBlissett commented 3 years ago

A while ago I prepared two layers for the geocoder cache: IHO and SeaVoX. Using them is waiting for a couple of decisions:

1) Which layer

The IHO areas are described here: https://www.marineregions.org/sources.php#iho (quick view: https://raw.githubusercontent.com/gbif/geocode/master/geocode-ws/src/main/resources/org/gbif/geocode/ws/layers/iho.png )

And SeaVoX here: https://www.marineregions.org/sources.php#seavox (quick view: https://raw.githubusercontent.com/gbif/geocode/master/geocode-ws/src/main/resources/org/gbif/geocode/ws/layers/seavox.png )

The IHO areas seem to fit exactly with the EEZ areas, so would be a better fit for complementing a continent layer. But a continent layer could also be calculated as the complement of the SeaVoX areas.

image

Green dots = political, magenta horizontal / brown diagonal = EEZ/IHO (same polygon), dark brown diagonal = SeaVoX (different polygon). The map is served at http://ws.gbif.org:9012/, but it's extremely slow (hence not public).

This is a decision for @ahahn-gbif.

2) Which terms

Taking into account the discussion in https://github.com/gbif/parsers/issues/26 , I suggest we populate dwc:continent for all terrestrial occurrences, and set dwc:waterBody to a sea/ocean for marine occurrences. I think this leaves dwc:waterBody as-is for rivers, lakes, ponds etc.

I'll specify the interpretation process in detail later, but it would be something like

A similar process can be used for dwc:continent, but the shapefiles for that aren't ready yet.

Andrea, is using the dwc:waterBody term like this appropriate? I think it's similar to what we do with dwc:country/dwc:countryCode.

The other option is to make a new non-DWC term, and probably just populate it without reference to verbatim fields, as we do for GADM.

jhnwllr commented 3 years ago

Here is how dwc:waterBody is used in practice. I feel that replacing someone's dwc:waterBody value with one from a sea polygon is not so nice, so I would lean towards using a custom term, as we do for GADM.

waterbody count
null 1.88E+09
Gulf of Mexico 7565698
North Pacific Ocean 1693380
Pacific Ocean 1638520
north; west; offshore; European; Atlantic; Iris... 1122886
North Atlantic Ocean 1054138
South Pacific Ocean 921469
Northwest Atlantic 841704
Southern Ocean 797383
Northeastern Pacific Ocean 696623
northeast Pacific 471986
Sleipner area 468729
Atlantic Ocean 387188
Statfjord 384972
Trondelag area 332001
Oseberg area 281674
Ebro Basin 266402
Pacific 249816
Ekofisk area 224578
not applicable 221456
North Atlantic,New York Bight 214476
Northwest Atlantic,Atlantic Canada,Maritimes (C... 209039
Oir 206247
Mobile Bay 195453
Mar Caribe 191757
Finnmark 182944
North Atlantic Ocean, Gulf of Mexico 180812
Eastern North Pacific Ocean 163273
Scorff 162867
Southern Pacific 156350
Indian Ocean 144883
Southern Eastern Pacific 143568
Molene Archipelago,Iroise Sea,English Channel,I... 138877
North Atlantic,North Carolina,South Carolina 132081
Global 129557
Barents Sea South 119859
Patos Lagoon estuary and adjacent coast 102333
Alaska EEZ 101360
Caribbean Sea 99852
Atlantic 97143
Roche 91222
Corpus Christi Bay 90908
North Atlantic Ocean, Caribbean Sea 90882
Biscayne Bay 89678
Lake Superior 89676
Florida Bay 88501
Mississippi Sound 84402
Aransas Bay 76720
France,Northeast Atlantic 75639
Arctic Ocean 72757
Zeeschelde 72741
San Antonio Bay 71543
Lower Laguna Madre 70968
Matagorda Bay 64597
Little Lagoon 61758
Lake Ontario 61200
Baie de Somme 61049
Perdido Bay 58925

MattBlissett commented 3 years ago

I will run a test interpretation and see what would happen.

I'm not proposing to change "Lake Ontario", but it would normalize "Caribbean Sea" and "North Atlantic Ocean, Caribbean Sea". The loss would be if "Perdido Bay" just becomes "Gulf of Mexico".
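To make the trade-off concrete, here is a minimal sketch of the rule discussed above; it is not the actual pipelines code. It assumes a geocoder lookup has already returned the name of the sea/ocean polygon containing the coordinate, or `None` when no marine polygon matches. The function name `interpret_water_body` and its arguments are hypothetical.

```python
from typing import Optional


def interpret_water_body(verbatim: Optional[str], sea_name: Optional[str]) -> Optional[str]:
    """Return the interpreted waterBody value for one occurrence.

    verbatim: the publisher-supplied dwc:waterBody (may be None).
    sea_name: name of the sea/ocean polygon containing the coordinate,
              or None if the point is not in any marine polygon (assumed input).
    """
    if sea_name is not None:
        # Marine occurrence: the polygon name replaces whatever was supplied.
        return sea_name
    # Rivers, lakes, ponds etc.: leave the supplied value as-is.
    return verbatim


examples = [
    ("Caribbean Sea", "Caribbean Sea"),
    ("North Atlantic Ocean, Caribbean Sea", "Caribbean Sea"),
    ("Perdido Bay", "Gulf of Mexico"),  # the finer-grained name is lost
    ("Lake Ontario", None),             # not in a marine polygon, unchanged
]
for verbatim, sea_name in examples:
    print(f"{verbatim!r} -> {interpret_water_body(verbatim, sea_name)!r}")
```

Under this rule the two Caribbean variants collapse to one value and "Lake Ontario" is untouched, but "Perdido Bay" is overwritten, which is the loss mentioned above.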

MattBlissett commented 3 years ago

We'll also need to take care of what happens around some islands, like Kwajalein Atoll, Marshall Islands:

image

ahahn-gbif commented 3 years ago

Concerning (1) above (layers for marine areas): personally I would weigh the function in searching for "everything marine" over the complementarity with EEZs, but this may be a question on which it is best to consult OBIS (@albenson-usgs, would you like to weigh in?). Is it more important to define marine areas politically as "international waters" (excl. EEZs), or habitat-based?

That leaves the question whether we would have to aim for having a seamless fit between continents and marine areas, or whether we would tolerate an overlap (continent as a sum of countries, incl. EEZ, + marine areas covering everything right up to the coast).

albenson-usgs commented 3 years ago

This is going to be tricky. @pieterprovoost is going to provide the best advice here. OBIS uses Marine Regions but as far as I know doesn't change any of the data and doesn't add any information to datasets. There's a separate way to filter the data by areas: https://api.obis.org/#/Area.

MattBlissett commented 3 years ago

Since continents are landmasses, I was intending to use polygons that included only land, without coastal waters or EEZ etc.

(image: continents) This is 99% ready to be the continents layer, although there are some performance issues around the current polygons I have. (The black country borders can be ignored; they appear because the continent polygons have been assembled by joining and splitting country polygons.)

(image: IHO) This is IHO,

(image: SeaVoX) and this is SeaVoX, though I think we'd exclude the North American / European / Asian "Mainland" polygons they have.

Either IHO or SeaVoX can answer "is_marine" within our usual tolerances for determining country.
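As an illustration of what an "is_marine" check with a tolerance could look like, here is a self-contained sketch. It is not the geocoder implementation: it works in planar degrees, ignores the antimeridian and spherical distances, and the names `is_marine`, `tolerance_deg` and the toy polygon are all assumptions made for the example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in degrees


def point_in_polygon(p: Point, poly: List[Point]) -> bool:
    """Ray-casting test: count edge crossings of a ray going east from p."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's latitude
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Planar distance (in degrees) from p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def is_marine(p: Point, sea_polygons: List[List[Point]], tolerance_deg: float = 0.05) -> bool:
    """True if p is inside a sea polygon or within tolerance_deg of its boundary."""
    for poly in sea_polygons:
        if point_in_polygon(p, poly):
            return True
        n = len(poly)
        if any(distance_to_segment(p, poly[i], poly[(i + 1) % n]) <= tolerance_deg
               for i in range(n)):
            return True
    return False


# Toy square "sea" polygon, purely illustrative.
TOY_SEA = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(is_marine((5.0, 5.0), [TOY_SEA]))    # inside the polygon
print(is_marine((-0.01, 5.0), [TOY_SEA]))  # just outside, within tolerance
print(is_marine((-1.0, 5.0), [TOY_SEA]))   # clearly outside
```

The same function would work with either layer's polygons; the choice between IHO and SeaVoX only changes which polygons are passed in.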

pieterprovoost commented 3 years ago

The OBIS areas are a combination of full EEZs and intersections between EEZs and IHO (a user may want to filter on US North Pacific for example). We also determine shore distance and hence if records are located on land or not using the OpenStreetMap land polygons as these seem to be more detailed than the typical EEZ / IHO layers.

timrobertson100 commented 3 years ago

While we are in here, should we also validate and nullify the altitude field for marine data?

The motivation for this is described here, with this example record.
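One possible reading of "validate and nullify" is sketched below, purely as an assumption: if a record is flagged as marine and carries a positive elevation, the elevation is cleared and an issue is recorded. The `Occurrence` class, the field names and the issue label `ELEVATION_NULLIFIED_FOR_MARINE` are hypothetical and do not come from the pipelines code base.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Occurrence:
    elevation_m: Optional[float]  # interpreted elevation in metres
    is_marine: bool               # result of a marine-layer lookup (assumed)
    issues: List[str] = field(default_factory=list)


def nullify_marine_elevation(rec: Occurrence) -> Occurrence:
    """Clear a positive elevation on a marine record and note that it was done."""
    if rec.is_marine and rec.elevation_m is not None and rec.elevation_m > 0:
        rec.elevation_m = None
        rec.issues.append("ELEVATION_NULLIFIED_FOR_MARINE")  # hypothetical issue name
    return rec


marine = nullify_marine_elevation(Occurrence(elevation_m=250.0, is_marine=True))
land = nullify_marine_elevation(Occurrence(elevation_m=250.0, is_marine=False))
print(marine)
print(land)
```

Whether the threshold should be zero, and how records near islands such as Kwajalein Atoll should be treated, are open questions this sketch does not answer.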

dagendresen commented 8 months ago

Would it be possible to add Marine Regions in the same way as GADM, so that the georeferences can be used to search against the oceans?

Screenshot 2024-02-08 at 16 54 18

See also https://github.com/gbif/pipelines/issues/524