GBIF-Europe / nordic_oikos_2018_r

Scientific reuse of openly published biodiversity information: Programmatic access to and analysis of primary biodiversity information using R. Nordic Oikos 2018, pre-conference R workshop. Venue: 18. feb. 2018 10:00 - 19. feb. 2018 16:00, Trondheim
http://www.gbif.no/events/2018/Nordic-Oikos-2018-R-workshop.html
GNU General Public License v3.0

fix geometry predicate - use within - in occ_download call (?) #2

Closed mskyttner closed 6 years ago

mskyttner commented 6 years ago

This PR is an attempt to fix the occ_download() call from issue #1, but I am not sure it is correct. The docs at https://www.gbif.org/developer/occurrence describe the "within" predicate as suitable for coordinates that are inside a POLYGON. Would equals ( = ) then apply only to coordinates on the polygon border?
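To make the interior-versus-border question concrete, here is a minimal base R sketch. The helper `in_bbox()` and the three points are made up for illustration, and the sketch says nothing about how GBIF itself evaluates either predicate; it only shows how an inclusive and a strict test can disagree for a point lying exactly on an edge.

```r
# Hypothetical helper (not part of rgbif): test whether points fall in an
# axis-aligned box, optionally counting points on the border as inside.
in_bbox <- function(lon, lat, xmin, ymin, xmax, ymax, include_border = TRUE) {
  if (include_border) {
    lon >= xmin & lon <= xmax & lat >= ymin & lat <= ymax
  } else {
    lon > xmin & lon < xmax & lat > ymin & lat < ymax
  }
}

# Made-up points: one inside, one exactly on the western edge, one outside.
lon <- c(10.50, 9.33, 13.00)
lat <- c(63.40, 63.00, 63.00)

inclusive <- in_bbox(lon, lat, 9.33, 62.80, 12.13, 64.20, include_border = TRUE)
strict    <- in_bbox(lon, lat, 9.33, 62.80, 12.13, 64.20, include_border = FALSE)

print(inclusive)  # TRUE TRUE FALSE
print(strict)     # TRUE FALSE FALSE
```

Only the point on the edge changes between the two tests, so a count difference caused by border handling alone would be limited to records sitting exactly on the polygon outline.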

In any case, the call made with "geometry within ${wkt}" seems to return the download key and later a data.frame that contains 87 observations (instead of the expected 66). Printing download_key gives this console log:

<<gbif download metadata>>
  Status: PREPARING
  Format: DWCA
  Download key: 0002554-180131172636756
  Created: 2018-02-07T15:52:34.436+0000
  Modified: 2018-02-07T15:52:34.436+0000
  Download link: http://api.gbif.org/v1/occurrence/download/request/0002554-180131172636756.zip
  Total records: 0
  Request: 
    type:  and
    predicates: 
      > type:  or
        predicates: 
          - type: equals, key: TAXON_KEY, value: 2346633
          - type: equals, key: TAXON_KEY, value: 2366645
      > type: equals, key: HAS_COORDINATE, value: TRUE
      > type: equals, key: HAS_GEOSPATIAL_ISSUE, value: FALSE
      > type: equals, key: COUNTRY, value: NO
      > type: within, key: geometry, value: POLYGON((9.33 62.80, 9.33 64.20, 12.13 64.20, 12.13 62.80, 9.33 62.80))
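
The rectangle in the request above can be assembled from its corner coordinates rather than typed by hand. The following is a minimal sketch in base R; `bbox_to_wkt()` is a hypothetical helper, not an rgbif function, and it assumes coordinates are given as longitude then latitude, as in the polygon above.

```r
# Hypothetical helper: build a closed WKT rectangle from a bounding box.
# The first corner is repeated at the end so that the ring is closed.
bbox_to_wkt <- function(xmin, ymin, xmax, ymax, digits = 2) {
  f <- function(x) formatC(x, format = "f", digits = digits)
  corners <- c(
    paste(f(xmin), f(ymin)),
    paste(f(xmin), f(ymax)),
    paste(f(xmax), f(ymax)),
    paste(f(xmax), f(ymin)),
    paste(f(xmin), f(ymin))
  )
  paste0("POLYGON((", paste(corners, collapse = ", "), "))")
}

wkt <- bbox_to_wkt(9.33, 62.80, 12.13, 64.20)

# Same string form as the "geometry within ${wkt}" argument described above.
geom_param <- paste("geometry", "within", wkt)
print(geom_param)
```

Building the string from numbers makes it harder to leave the ring unclosed or to swap a longitude and a latitude when editing the box.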

andersfi commented 6 years ago

I'm not sure what is going on here!

The following search in the portal "https://www.gbif.org/occurrence/search?has_coordinate=true&taxon_key=2346633&geometry=POLYGON%20((10.32989501953125%2063.26787016946243,%2010.32989501953125%2063.455051146616825,%2010.8819580078125%2063.455051146616825,%2010.8819580078125%2063.26787016946243,%2010.32989501953125%2063.26787016946243))" returns 38 occurrences.

However, pasting the same polygon into the R script with the "within" predicate returns 41 occurrences. In this example, I can't see how the choice between the "within" predicate and = could make a difference...
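
One way to narrow down a 38-versus-41 mismatch like this is to compare the record identifiers of the two result sets and look at the extra records directly. The sketch below is base R only; the ID vectors are made-up placeholders, and it assumes each result exposes a unique identifier column such as gbifID.

```r
# Hypothetical ID vectors standing in for the identifier column of each result.
ids_portal   <- c(101, 102, 103, 104)
ids_download <- c(101, 102, 103, 104, 105, 106)

# Records that appear in one result but not in the other.
only_in_download <- setdiff(ids_download, ids_portal)
only_in_portal   <- setdiff(ids_portal, ids_download)

print(only_in_download)
print(length(only_in_portal))
```

Inspecting the coordinates and issue flags of the records in `only_in_download` would then show whether they differ by geometry or by some other filter.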

mskyttner commented 6 years ago

Trying this again, I think I now get 38 records from all three: the API (using a query.json file), the website link, and this R code:

library(rgbif)
library(magrittr)  # provides the %>% pipe used below

my_wkt <- "POLYGON((10.32989501953125 63.26787016946243, 10.32989501953125 63.455051146616825, 10.8819580078125 63.455051146616825, 10.8819580078125 63.26787016946243, 10.32989501953125 63.26787016946243))"

#wicket::validate_wkt(my_wkt)
geom_param <- paste("geometry", "within", my_wkt)

download_key <- 
  occ_download(
    type = "and",
    'taxonKey = 2346633',
    'hasCoordinate = TRUE',
    geom_param
  ) %>% 
  occ_download_meta

# pull the download key out of the metadata as a plain string
key <- download_key$key

download.file(
  url = paste0("http://api.gbif.org/v1/occurrence/download/request/", key),
  destfile = paste0(key, ".zip"),
  quiet = FALSE
)

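# list the archive contents, then extract it so occurrence.txt can be read
unzip(paste0(key, ".zip"), list = TRUE)
unzip(paste0(key, ".zip"))
# reports 38 rows
nrow(readr::read_tsv("occurrence.txt"))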

The curl call was made with this json:

{
  "creator": "mskyttner",
  "notification_address": [
    "markus.skyttner@nrm.se"
  ],
  "format": "SIMPLE_CSV",
  "predicate": {
    "type": "and",
    "predicates": [
      {
        "type": "equals",
        "key": "HAS_COORDINATE",
        "value": "true"
      },
      {
        "type": "equals",
        "key": "TAXON_KEY",
        "value": "2346633"
      },
      {
        "type": "within",
        "geometry": "POLYGON((10.32989501953125 63.26787016946243, 10.32989501953125 63.455051146616825, 10.8819580078125 63.455051146616825, 10.8819580078125 63.26787016946243, 10.32989501953125 63.26787016946243))"
      }
    ]
  }
}

The request was then submitted with this curl command:

curl --include --user userName:userPass \
  --header "Content-Type: application/json" \
  --data @query.json \
  http://api.gbif.org/v1/occurrence/download/request

# gives 38 records when inspecting results with
curl -Ss http://api.gbif.org/v1/occurrence/download/${identifier} | json_pp

andersfi commented 6 years ago

Yes, it works now. Was the "has coordinate issues" filter probably what caused the difference?