gbif / portal-feedback

User feedback for the GBIF API, website and published data. You can ask questions here. 🗨❓

what does it mean when we search for country=[some country] in occurrence search? #4880

Open ymgan opened 1 year ago

ymgan commented 1 year ago

Hello,

I got this question from @Antonarctica when we are looking at the occurrences in GBIF map: https://www.gbif.org/occurrence/map?country=AQ&occurrence_status=present

At first we thought that this would filter for all occurrences with decimalLatitude <= -60, but looking at the map that does not seem to be the case. We are a bit confused about whether this filters on interpreted, inferred or verbatim values only. Hence the question: what does this filter mean?

Do I understand correctly that if the verbatim values of an occurrence fit any of the following criteria, it will appear in the search results when a user searches for country=AQ&occurrence_status=present? Did I miss anything?

| decimalLatitude | countryCode | country | flags | remarks |
|---|---|---|---|---|
| <= -60 | AQ | Antarctica | | |
| <= -60 | | | Country derived from coordinates | Is the lookup based on OpenStreetMap? |
| > -60 | AQ | | Country coordinate mismatch | |
| > -60 | | Antarctica | Country coordinate mismatch | |
| > -60 | | Antarctica | Presumed swapped coordinate | If decimalLongitude is between -60 and -90, are decimalLatitude and decimalLongitude swapped? https://www.gbif.org/occurrence/3674837494 |
| > -60 | AQ | | Presumed swapped coordinate | Same as above? |
| between 60 and 90 | | Antarctica | Country coordinate mismatch, Presumed negated latitude | |
| between 60 and 90 | AQ | | Country coordinate mismatch, Presumed negated latitude | |

Will occurrences without coordinates but with country = Antarctica or countryCode = AQ appear in the occurrence search results table?

Thanks a lot!!

MattBlissett commented 1 year ago

Hi,

This is a good summary, but I have simplified it as there are two steps: validating the countryCode/country, then checking that against coordinates (if provided).

We look at the countryCode and country[Name] first, so "Antarctica" as a countryCode also works, as does "ATA". Various languages are supported, but new translations are only added as we need them. In general, forms like "Antarctic Territory", "Antarctic Peninsula", "Western Antarctica" and so on are also supported, provided enough occurrences use these terms in the country field. For this reason we also have "Adelaide Is", "Brit Antar Ter", "Mer de Ross" and so on. Inconsistencies between countryCode and country (e.g. "GS" and "Antarctica") lead to a Country Mismatch issue.
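To make the first step concrete, here is a minimal Python sketch of it. The alias table, `normalize` and `interpret_country` are hypothetical illustrations built only from the forms mentioned above; GBIF's real country parser covers far more names and languages, and keeping the countryCode value when the two fields disagree is an assumption.

```python
from typing import List, Optional, Tuple

# Hypothetical alias table built only from the forms mentioned above.
ALIASES = {
    "aq": "AQ", "ata": "AQ", "antarctica": "AQ",
    "antarctic territory": "AQ", "antarctic peninsula": "AQ",
    "western antarctica": "AQ", "adelaide is": "AQ",
    "brit antar ter": "AQ", "mer de ross": "AQ",
    "gs": "GS",  # South Georgia and the South Sandwich Islands
}


def normalize(value: Optional[str]) -> Optional[str]:
    """Map a raw countryCode or country string to an ISO 3166-1 alpha-2 code."""
    if not value:
        return None
    return ALIASES.get(value.strip().lower())


def interpret_country(
    country_code: Optional[str], country: Optional[str]
) -> Tuple[Optional[str], List[str]]:
    """Step one: combine countryCode and country into a single code."""
    from_code = normalize(country_code)
    from_name = normalize(country)
    if from_code and from_name and from_code != from_name:
        # Assumption: keep the countryCode value and flag the disagreement.
        return from_code, ["COUNTRY_MISMATCH"]
    return from_code or from_name, []
```

For example, `interpret_country("GS", "Antarctica")` yields `("GS", ["COUNTRY_MISMATCH"])`, mirroring the mismatch case described above.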

We also use strictly less than -60° (i.e. strictly south of 60°S) for Antarctica; there are a few occurrences at exactly -60° that are in South Georgia and the South Sandwich Islands, and a few more (not near those islands) with no country set. I think that's correct according to the Antarctic Treaty ("The provisions of the present Treaty shall apply to the area south of 60º South Latitude").
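A tiny sketch of the strict boundary, assuming coordinates are already parsed to floats; `south_of_60` is a hypothetical helper name, not a GBIF function:

```python
def south_of_60(decimal_latitude: float) -> bool:
    """Strict test: only points south of 60°S count as Antarctica here."""
    return decimal_latitude < -60.0


# Boundary behaviour: exactly -60 is excluded.
for lat in (-60.0, -60.0001, -75.0, -59.9):
    print(lat, south_of_60(lat))
```

Points at exactly -60° therefore fall through to the coordinate lookup rather than being assigned to Antarctica.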

| decimalLatitude | interpreted countryCode | flags | remarks |
|---|---|---|---|
| < -60 | AQ | | |
| < -60 | | Country derived from coordinates | The < -60° test is separate and doesn't use a lookup. |
| ≥ -60 | AQ | Country coordinate mismatch | If decimalLatitude ≥ -60, the coordinate lookup is used. It is based on Marine Regions with some tweaks, e.g. removing the EEZ of Antarctica and (because of the previous check) ignoring the EEZ of SG&SSI below -60°. |
| ≥ -60 | AQ | Presumed swapped coordinate | If decimalLongitude is between -60 and -90, are decimalLatitude and decimalLongitude swapped? https://www.gbif.org/occurrence/3674837494 |
| between 60 and 90 | AQ | Country coordinate mismatch, Presumed negated latitude | |
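The table can be read as a decision procedure. The following Python sketch is a simplified, hypothetical model of those rows for Antarctica only: it skips the Marine Regions lookup entirely (assuming it never returns AQ north of -60°), ignores other countries, and uses GBIF issue names only as labels. It is not GBIF's implementation.

```python
from typing import List, Optional, Tuple


def check_coordinates(
    country_code: Optional[str], lat: Optional[float], lon: Optional[float]
) -> Tuple[Optional[str], List[str]]:
    """Hypothetical, simplified model of the table above (Antarctica only).

    country_code is the already-interpreted value (step one); lat/lon are
    verbatim decimal degrees. Returns (country code, list of issue labels).
    """
    if lat is None or lon is None:
        # No coordinates: nothing to check, keep the interpreted country.
        return country_code, []
    if country_code not in (None, "AQ"):
        # Other countries are out of scope for this sketch.
        return country_code, []
    if lat < -60.0:
        # Separate strict test, no lookup involved.
        if country_code is None:
            return "AQ", ["COUNTRY_DERIVED_FROM_COORDINATES"]
        return "AQ", []
    if country_code is None:
        # North of -60 a Marine Regions based lookup is used; not modelled.
        return None, []
    # Claimed AQ, but the point is at or north of -60.
    if 60.0 < lat <= 90.0:
        # Negating the latitude would put the point south of 60°S.
        return "AQ", ["COUNTRY_COORDINATE_MISMATCH", "PRESUMED_NEGATED_LATITUDE"]
    if -90.0 <= lon < -60.0:
        # Swapping lat/lon would put the point south of 60°S.
        return "AQ", ["PRESUMED_SWAPPED_COORDINATE"]
    return "AQ", ["COUNTRY_COORDINATE_MISMATCH"]
```

The `60.0 < lat` bound is chosen so that a negated latitude also passes the strict `< -60` test; a latitude of exactly 60 would negate to -60, which is not south of 60°S.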

You can exclude the potentially risky transformations/swaps, and other problem records, with hasGeospatialIssue=false, which is a suggested filter when using www.gbif.org with a location search. (Though I've spotted a bug with this; we are investigating.)
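The same filter can be applied through the occurrence search API. A minimal sketch that only builds the request URL, assuming the `/v1/occurrence/search` endpoint with its `country`, `occurrenceStatus`, `hasGeospatialIssue` and `limit` parameters; `occurrence_search_url` is a hypothetical helper:

```python
from urllib.parse import urlencode

API = "https://api.gbif.org/v1/occurrence/search"


def occurrence_search_url(
    country: str, exclude_geospatial_issues: bool = True, limit: int = 20
) -> str:
    """Build a search URL for present occurrences in one country."""
    params = {"country": country, "occurrenceStatus": "PRESENT", "limit": limit}
    if exclude_geospatial_issues:
        # Drop records with transformations/swaps and other geospatial issues.
        params["hasGeospatialIssue"] = "false"
    return f"{API}?{urlencode(params)}"


print(occurrence_search_url("AQ"))
print(occurrence_search_url("AQ", exclude_geospatial_issues=False))
```

Fetching these URLs (e.g. with `urllib.request.urlopen`) returns JSON whose `count` field gives the number of matching records, so the totals with and without the filter can be compared.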

Will occurrences without coordinates but with country = Antarctica or countryCode = AQ appear in the occurrence search results table?

Yes, see here for only these records.
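The link above hasn't survived, but an equivalent API query presumably combines the country filter with `hasCoordinate=false`. A sketch, assuming those parameter names:

```python
from urllib.parse import urlencode

# Assumed API parameters: records attributed to Antarctica that have no coordinates.
params = {"country": "AQ", "occurrenceStatus": "PRESENT", "hasCoordinate": "false"}
url = "https://api.gbif.org/v1/occurrence/search?" + urlencode(params)
print(url)
```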

(Colleagues: I've assigned this to myself as public documentation should be written.)