WordPress / openverse

Openverse is a search engine for openly-licensed media. This monorepo includes all application code.
https://openverse.org
MIT License

Invalid cursor values for Europeana #3745

Open stacimc opened 10 months ago

stacimc commented 10 months ago

Description

Identified in a production Europeana run. It seems that some Europeana cursor values are not being encoded properly, which causes the API to respond with a 400.

Reproduction

Run the Europeana DAG with the following initial_query_params, replacing the value of wskey with a valid API key:

{"wskey": "***", "profile": "rich", "reusability": ["open", "restricted"], "sort": ["europeana_id+desc", "timestamp_created+desc"], "rows": "100", "media": "true", "start": 1, "qf": ["TYPE:IMAGE", "provider_aggregation_edm_isShownBy:*"], "query": "timestamp_update:[2024-01-25T00:00:00Z TO 2024-01-26T00:00:00Z]", "cursor": "AoIvLzEwMjgvRTAwMjc3MjQyc/yK+Z+NAw=\u001d"}

The DAG will fail immediately.

The cursor is AoIvLzEwMjgvRTAwMjc3MjQyc/yK+Z+NAw=\u001d. It is URL-encoded before the request is sent, so the DAG requests the following URL:

https://api.europeana.eu/record/v2/search.json?wskey=***&profile=rich&reusability=open&reusability=restricted&sort=europeana_id%2Bdesc&sort=timestamp_created%2Bdesc&rows=100&media=true&start=1&qf=TYPE%3AIMAGE&qf=provider_aggregation_edm_isShownBy%3A%2A&query=timestamp_update%3A%5B2024-01-25T00%3A00%3A00Z+TO+2024-01-26T00%3A00%3A00Z%5D&cursor=AoIvLzEwMjgvRTAwMjc3MjQyc%2FyK%2BZ%2BNAw%3D%1D
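The encoding of the cursor parameter can be reproduced in isolation. The following is a minimal sketch using only the Python standard library; it assumes the DAG's HTTP client percent-encodes query parameters the same way `urllib.parse.urlencode` does, which matches the URL reported here but is not confirmed from the DAG code itself.

```python
from urllib.parse import urlencode

# Cursor value copied from the failing run; "\u001d" is the ASCII
# group separator control character (0x1D).
cursor = "AoIvLzEwMjgvRTAwMjc3MjQyc/yK+Z+NAw=\u001d"

# urlencode percent-encodes "/", "+" and "=" as well as the control character.
encoded = urlencode({"cursor": cursor})
print(encoded)
# cursor=AoIvLzEwMjgvRTAwMjc3MjQyc%2FyK%2BZ%2BNAw%3D%1D
```

The trailing `%1D` shows that the control character is encoded faithfully, which suggests the encoding step works as intended and the unexpected character is already present in the cursor value before it is encoded.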

The full API response is:

{
  "apikey":"***",
  "success":false,
  "error":"Invalid cursor value. Please make sure you encode the cursor value before sending it to the API.",
  "message":"Please make sure you encode the cursor value before sending it to the API.",
  "code":"400-SC"
}

Hobbesball commented 9 months ago

Hi there! Thanks for flagging this. I've passed it on to our API staff, who looked into whether this could be an issue on Europeana's end. They wanted me to pass on that the cursor values our API returns are generated by Solr, which uses Base64 encoding to produce them. This means that only the following characters can appear in a cursor value: "the upper- and lower-case Roman alphabet characters (A–Z, a–z), the numerals (0–9), and the "+" and "/" symbols, with the "=" symbol as a special suffix code."

We therefore believe that the \u001d character cannot have been produced by our API's cursor generator. I hope this helps narrow down your issue!

Best

Jolan
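One way to act on the Base64 observation above is to check a cursor for characters outside the Base64 alphabet before sending it. The sketch below is only an illustration: the helper name `find_non_base64_chars` is hypothetical and is not part of the Openverse catalog or the Europeana API.

```python
import string

# Characters that standard Base64 output can contain, per the description above.
BASE64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")


def find_non_base64_chars(cursor: str) -> list[tuple[int, str]]:
    """Return (index, character) pairs for characters outside the Base64 alphabet."""
    return [(i, ch) for i, ch in enumerate(cursor) if ch not in BASE64_ALPHABET]


# Cursor value from the failing run reported in this issue.
cursor = "AoIvLzEwMjgvRTAwMjc3MjQyc/yK+Z+NAw=\u001d"
offending = find_non_base64_chars(cursor)
print(offending)  # the only offending character is the trailing "\x1d"
```

A check like this could be used to log or reject a suspicious cursor early, rather than waiting for the API to return a 400.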