Open AetherUnbound opened 5 months ago
Whoever does this issue should reach out to Science Museum and let them know about the bug, too: feedback@sciencemuseum.ac.uk from https://collection.sciencemuseumgroup.org.uk/about
@AetherUnbound do you know if we've already done that, by chance?
I have not, I intend to though when I'm next in front of my computer! I had just enough time to fill this issue out and get down the context before I had to step away.
I ran the same range locally and the error no longer occurs, closing!
Reopening as I've encountered this issue while testing #4105.
@stacimc do you mind sharing the query params for the case that's failing currently?
Sure -- just a few minutes into ingestion, unfortunately:
"initial_query_params": {
  "date[from]": 1500,
  "date[to]": 1750,
  "has_image": 1,
  "image_license": "CC",
  "page[number]": 43,
  "page[size]": 100
}
Thanks! I've emailed the folks at the Science Museum Group with this information.
Thanks @AetherUnbound. For the time being I've updated the SKIPPED_INGESTION_ERRORS configuration to skip batches with this particular error, and restarted the DAG.
Airflow log link
https://airflow.openverse.engineering/log?execution_date=2024-03-01T00%3A00%3A00%2B00%3A00&task_id=ingest_data.pull_image_data&dag_id=science_museum_workflow&map_index=-1
Description
It appears as though the Science Museum DAG is failing with a 503 response for this particular URL (specifically, these parameters):
https://collection.sciencemuseumgroup.org.uk/search/?has_image=1&image_license=CC&page[size]=100&page[number]=39&date[from]=0&date[to]=200
Reproduction
Changing the page[number] param from 39 to 40 returns a non-503 response: https://collection.sciencemuseumgroup.org.uk/search/?has_image=1&image_license=CC&page[size]=100&page[number]=40&date[from]=0&date[to]=200
Since this is entirely an upstream bug, I think the best case here might be to skip a particular page if we receive a 503 response specifically.
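To make the suggestion concrete, here is a rough sketch of what "skip a page on a 503" could look like. It is not the existing provider script: `fetch_page`, `iter_pages`, and the injected `get` callable are hypothetical names for illustration, and the real ingester's structure may differ. Only a 503 is skipped; any other error status still raises.

```python
from typing import Any, Callable, Iterator, Optional, Tuple

# Assumption: only this status is treated as a skippable upstream failure.
SKIPPABLE_STATUS = 503


def fetch_page(get: Callable[..., Any], url: str, params: dict) -> Optional[dict]:
    """Return the decoded page, or None if the upstream answered with a 503."""
    response = get(url, params=params)
    if response.status_code == SKIPPABLE_STATUS:
        # Upstream bug for this specific page: signal "skip" to the caller.
        return None
    # Any other error status is still surfaced as an exception.
    response.raise_for_status()
    return response.json()


def iter_pages(
    get: Callable[..., Any],
    url: str,
    base_params: dict,
    first_page: int,
    last_page: int,
) -> Iterator[Tuple[int, dict]]:
    """Yield (page_number, data) for each page, skipping pages that return 503."""
    for page in range(first_page, last_page + 1):
        params = {**base_params, "page[number]": page}
        data = fetch_page(get, url, params)
        if data is None:
            continue  # Skip only this page and carry on with the next one.
        yield page, data
```

Any callable with a `get(url, params=...)` shape (for example `requests.get`) could be passed in; injecting it keeps the skip logic testable without hitting the Science Museum site.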
DAG status
No change, this is a monthly DAG and we should hopefully address it soon.