gbif / occurrence

Occurrence store, download, search
Apache License 2.0
22 stars 15 forks source link

Invalid queries in occurrence downloads are silently ignored #273

Closed timrobertson100 closed 9 months ago

timrobertson100 commented 3 years ago

The following download query results in the user being presented with the monthly download, and no download created. It is incorrect in that it lacks a valid predicate:{} around the AND.

It would be better if the user is informed of the mistake. This is a striped down example from a real helpdesk request.

{
  "creator": "SET THIS HERE",
  "notificationAddresses": [
    "SET THIS HERE"
  ],
  "sendNotification": true,
  "format": "SIMPLE_CSV",
  "type": "and",
  "predicates": [
    {
      "type": "equals",
      "key": "COUNTRY",
      "value": "US",
      "matchCase": false
    },
    {
      "type": "equals",
      "key": "OCCURRENCE_STATUS",
      "value": "present",
      "matchCase": false
    }
  ]  
}
timrobertson100 commented 3 years ago

I suspect our JSON parsers are configured to ignore unknown fields. If so, perhaps for POST/PUT requests we should disable that?

MattBlissett commented 1 year ago

I think we need to roll this change back and reimplement it in a much less strict way. At it is, it broke rgbif, pygbif, Ruby (not sure if gbifrb or just a user with Ruby), Nagios and some plain Python and R requests. We've given 2600 HTTP 400 responses in the last few days for 25 users.

This request is also invalid, but it used to be valid:

    "predicates": [
        {
            "type":"not",
            "predicate":
            {
              "type":"isNotNull",
              "parameter":"RECORDED_BY"
            }
        },
        {
            "type": "equals",
            "key": "BASIS_OF_RECORD",
            "value": "PRESERVED_SPECIMEN"
        },
        {
            "type": "equals",
            "key": "COUNTRY",
            "value": "AU"
        },
        {
            "type": "equals",
            "key": "TAXON_KEY",
            "value": 212
        }
    }

We should aim to reject unknown properties, without changing how values are parsed, and with camelCaseInsensitive or snake_case_insensitive properties being accepted. Anything else is a breaking API change. The properties which the user doesn't set (creator, created, status etc) should continue to be accepted but ignored.

MattBlissett commented 10 months ago

This needs more work.

A download request like this:

{
  "data": {
    "creator": "MattBlissett",
    "format": "SPECIES_LIST",
    "predicate": {
      "type": "geoDistance",
      "latitude": "90",
      "longitude": "100",
      "distance": "5km"
    }
  }
}

gave the monthly download.