gbif / portal-feedback

User feedback for the GBIF API, website and published data. You can ask questions here. 🗨❓

How to get marine datasets only ? #5344

Open Kydae opened 4 weeks ago

Kydae commented 4 weeks ago

Hello,

I'm trying to get all the marine datasets for European waters with the API, but I couldn't find how to do it. I looked in the API results for a specific field that would indicate whether a dataset is marine, freshwater or terrestrial, but I didn't find anything relevant. I also tried using the geometry parameter with the occurrence search (plus a facet on datasetKey), but the WKT for European waters is too large for the API.

Did I miss a relevant parameter/field that could help filtering the marine datasets? Do you know an easy solution I didn't think of?

Kind regards

jhnwllr commented 3 weeks ago

@Kydae Currently there are no easy solutions for doing marine region downloads. If you have a large WKT polygon, you could try splitting it up into multiple downloads, or reduce the size of the polygon through buffering or something similar.

Keep in mind that many occurrences mediated by GBIF have a fairly large coordinate uncertainty attached to them, so a too fine-grained polygon might inadvertently exclude occurrences. Similarly, there might be terrestrial occurrences that end up in the water because of high uncertainty, so you will probably need to do some cleanup regardless.
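
As a rough illustration of that kind of cleanup, the sketch below splits records into those with a usable coordinate uncertainty and those that need a manual look. `coordinateUncertaintyInMeters` is the Darwin Core term used in GBIF downloads; the 10 km threshold, the sample rows and the helper name `split_by_uncertainty` are assumptions made up for this example.

```python
import csv
import io

# Hypothetical threshold: what counts as "too uncertain" depends on the study.
MAX_UNCERTAINTY_M = 10_000.0


def split_by_uncertainty(rows, max_uncertainty_m=MAX_UNCERTAINTY_M):
    """Return (kept, flagged): rows with a small, known uncertainty vs. the rest."""
    kept, flagged = [], []
    for row in rows:
        raw = (row.get("coordinateUncertaintyInMeters") or "").strip()
        try:
            uncertainty = float(raw)
        except ValueError:
            # Missing or non-numeric value: the publisher did not supply one.
            flagged.append(row)
            continue
        (kept if uncertainty <= max_uncertainty_m else flagged).append(row)
    return kept, flagged


# Made-up rows in a tab-separated layout similar to a simple GBIF download.
SAMPLE = (
    "gbifID\tdecimalLatitude\tdecimalLongitude\tcoordinateUncertaintyInMeters\n"
    "1\t54.2\t3.1\t250\n"
    "2\t54.9\t2.7\t\n"
    "3\t53.8\t4.0\t50000\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
kept, flagged = split_by_uncertainty(rows)
print(len(kept), "kept,", len(flagged), "flagged for manual review")
```

Flagging rather than deleting keeps the records with no stated uncertainty available for a later decision, since a missing value says nothing about how precise the coordinates are.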

Kydae commented 3 weeks ago

OK, thank you for your time and answer!

MortenHofft commented 3 weeks ago

It isn't ideal, but perhaps something like a negated gadm filter in combination with some appropriate bounding box could be used as a starting point?

https://hp-theme.gbif-staging.org/occurrence/search?filter=eyJtdXN0Ijp7Imdlb21ldHJ5IjpbIlBPTFlHT04oKC01LjA4NTMgNjkuMzAwMzgsLTE3Ljg2Nzg5IDM3LjgzOTY2LC01LjM3MDk0IDI5LjM1OTU5LDI0LjM4MDA0IDI5LjAzNDEyLDQyLjEyODQ1IDMwLjQ4ODQzLDUwLjI1MTQ2IDU1LjUyMjE2LDQwLjgyNTIgNzEuMjE3NSwxOC43NTc3OCA4My44Njk2MywtNS4wODUzIDY5LjMwMDM4KSkiXX0sIm11c3Rfbm90Ijp7ImdhZG1HaWQiOlt7InR5cGUiOiJpc05vdE51bGwifV19fQ%3D%3D&view=MAP

so something like

{
  "type": "and",
  "predicates": [
    {
      "type": "within",
      "geometry": "POLYGON((-5.0853 69.30038,-17.86789 37.83966,-5.37094 29.35959,24.38004 29.03412,42.12845 30.48843,50.25146 55.52216,40.8252 71.2175,18.75778 83.86963,-5.0853 69.30038))"
    },
    {
      "type": "not",
      "predicate": {
        "type": "isNotNull",
        "parameter": "GADM_GID"
      }
    }
  ]
}
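
For anyone who prefers to assemble such a predicate in code rather than by hand, here is a minimal Python sketch that produces the same structure. The helper names (`within`, `is_not_null`, `negate`, `all_of`) are hypothetical and not part of any GBIF library; only the resulting JSON shape is taken from the example above.

```python
import json


def within(wkt):
    """Predicate matching occurrences inside the given WKT geometry."""
    return {"type": "within", "geometry": wkt}


def is_not_null(parameter):
    """Predicate matching occurrences that have a value for the parameter."""
    return {"type": "isNotNull", "parameter": parameter}


def negate(predicate):
    """Wrap a predicate in a 'not'."""
    return {"type": "not", "predicate": predicate}


def all_of(*predicates):
    """Combine predicates with 'and'."""
    return {"type": "and", "predicates": list(predicates)}


# Bounding polygon copied from the example above.
BOUNDING_POLYGON = (
    "POLYGON((-5.0853 69.30038,-17.86789 37.83966,-5.37094 29.35959,"
    "24.38004 29.03412,42.12845 30.48843,50.25146 55.52216,"
    "40.8252 71.2175,18.75778 83.86963,-5.0853 69.30038))"
)

# Inside the polygon AND without a GADM identifier (i.e. not on mapped land).
predicate = all_of(within(BOUNDING_POLYGON), negate(is_not_null("GADM_GID")))
print(json.dumps(predicate, indent=2))
```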

MattBlissett commented 3 weeks ago

A negated continent filter is another option: the continent filter includes only land.

Kydae commented 3 weeks ago

How can I do a negated continent filter with the API?

ManonGros commented 3 weeks ago

@Kydae you have to use the download API: https://techdocs.gbif.org/en/data-use/api-downloads. I think Matt is suggesting using isNull with the continent filter. This recorded webinar could help if you are starting with the download API.

Kydae commented 3 weeks ago

Thank you all for your help and quick answers! I have created this JSON file for my request to the download API, and I get this error message. Could you help me? I'm using the pygbif library to call the API.

error message:

image

json request file:

{
  "predicate": {
    "type": "and",
    "predicates": [
      {
        "type": "within",
        "geometry": polygonWKT
      },
      {
        "type": "isNull",
        "parameter": "CONTINENT"
      },
      {
        "type": "equals",
        "key": "FORMAT",
        "value": "DWCA"
      },
      {
        "type": "equals",
        "key": "FACET",
        "value": "DATASET_KEY"
      },
      {
        "type": "equals",
        "key": "LIMIT",
        "value": "0"
      }
    ]
  }
}
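
The thread does not record what the error was or what fixed it. One likely culprit, offered here as an assumption: in the download API's JSON body, `format` is a top-level field next to `predicate`, while FACET and LIMIT are parameters of the occurrence search API and not predicate keys. The sketch below builds a body along those lines using only the Python standard library. The creator name, e-mail address and helper name `build_download_body` are placeholders, and the request is constructed but deliberately not sent.

```python
import json
import urllib.request

DOWNLOAD_URL = "https://api.gbif.org/v1/occurrence/download/request"


def build_download_body(polygon_wkt, creator, email, download_format="DWCA"):
    """Build a download request body: inside the polygon, with no continent."""
    return {
        "creator": creator,
        "notificationAddresses": [email],
        "sendNotification": True,
        "format": download_format,  # top-level field, not a predicate
        "predicate": {
            "type": "and",
            "predicates": [
                {"type": "within", "geometry": polygon_wkt},
                {"type": "isNull", "parameter": "CONTINENT"},
            ],
        },
    }


# Bounding polygon suggested earlier in the thread.
polygon_wkt = (
    "POLYGON((-5.0853 69.30038,-17.86789 37.83966,-5.37094 29.35959,"
    "24.38004 29.03412,42.12845 30.48843,50.25146 55.52216,"
    "40.8252 71.2175,18.75778 83.86963,-5.0853 69.30038))"
)

# Placeholder account details: replace with a real GBIF username and e-mail.
body = build_download_body(polygon_wkt, "your_gbif_username", "you@example.org")

request = urllib.request.Request(
    DOWNLOAD_URL,
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Sending needs GBIF credentials (HTTP Basic auth), so it is left out here:
# urllib.request.urlopen(request)
print(json.dumps(body, indent=2))
```

The list of datasets contributing to a download is part of the download itself, so a facet on the dataset key should not be needed to find out which datasets are involved.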

Kydae commented 3 weeks ago

Hello! I solved my issue and successfully submitted my JSON request, but when I looked at the results I noticed something odd: they contain occurrences that shouldn't be there, and I don't understand why. Here is the DOI of my request: https://doi.org/10.15468/dl.357kcz. It's a request to get only marine datasets in the Caribbean Sea. If you look at the rows with this datasetKey, for example: '9781ce05-2379-4a10-ab3b-c91346510df8', some points are not within the POLYGON I gave in my request and have no 'coordinateUncertaintyInMeters'. How is this possible? And is there a way to fix it?

ManonGros commented 3 weeks ago

Hi @Kydae not all the data providers share a value for the coordinate uncertainty field. You might have to do quite a bit of manual processing before you can use the data. Some good resources include:

I hope it helps!

Kydae commented 3 weeks ago

Oh OK, I understand: there is a value for 'coordinateUncertaintyInMeters' but I can't see it because the data provider doesn't want to share it, am I right? Is it possible to exclude the 'coordinateUncertaintyInMeters' field from my search and only take occurrences whose coordinates ('decimalLongitude' and 'decimalLatitude') are within the POLYGON I gave in my JSON request?

MattBlissett commented 3 weeks ago

(I'm looking, I think it's a problem with a clockwise polygon but I will write in a few minutes after checking.)

MattBlissett commented 3 weeks ago

It's a special case (a rectangle) of this bug, which, contrary to what I thought, isn't fixed: https://github.com/gbif/occurrence/issues/340. See that issue for the explanation, although the invalid rectangle gives even stranger results: the correct latitude bounds but the opposite longitude bounds.

If you use the polygon POLYGON((-92.09816 8.23324,-58.51474 8.23324,-58.51474 23.24135,-92.09816 23.24135,-92.09816 8.23324)) you should get the data you want.
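
Since the problem was suspected to come from a clockwise polygon, it can help to check the winding order before submitting a request. The sketch below uses the shoelace formula to test whether a single-ring WKT polygon is counter-clockwise (the order of the polygon above) and reverses it if not. The parser is deliberately minimal and only handles `POLYGON((x y,...))` without holes; the function names are hypothetical.

```python
def parse_ring(wkt):
    """Parse a single-ring 'POLYGON((x y,x y,...))' into a list of (x, y) floats."""
    text = wkt.strip()
    prefix, suffix = "POLYGON((", "))"
    if not (text.upper().startswith(prefix) and text.endswith(suffix)):
        raise ValueError("only single-ring POLYGON WKT is supported")
    inner = text[len(prefix):-len(suffix)]
    return [tuple(float(v) for v in pair.split()) for pair in inner.split(",")]


def signed_area(ring):
    """Shoelace formula: positive for counter-clockwise rings (x east, y north)."""
    return 0.5 * sum(
        x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )


def to_counter_clockwise(wkt):
    """Return the same polygon with its ring in counter-clockwise order."""
    ring = parse_ring(wkt)
    if signed_area(ring) < 0:
        ring = ring[::-1]  # clockwise: reverse the point order
    coords = ",".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"


# The same rectangle as above, but listed clockwise (for illustration).
CLOCKWISE = (
    "POLYGON((-92.09816 8.23324,-92.09816 23.24135,"
    "-58.51474 23.24135,-58.51474 8.23324,-92.09816 8.23324))"
)

print(to_counter_clockwise(CLOCKWISE))
```

The planar shoelace formula is only a reasonable approximation for polygons that do not cross the antimeridian or enclose a pole.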

Kydae commented 3 weeks ago

Thank you all for your quick answers! I think with all of that my issues should be fixed. Have a nice weekend!