cc-archive / cccatalog

[PROJECT TRANSFERRED] Mapping the commons towards an open ledger and cc search.
https://github.com/WordPress/openverse-catalog
MIT License

[Bug] Wikimedia Commons Provider API Script breaks for out-of-range date #409

Closed mathemancer closed 4 years ago

mathemancer commented 4 years ago

Bug Description

We sometimes run the script at

src/cc_catalog_airflow/dags/provider_api_scripts/wikimedia_commons.py

for a date with no data from Wikimedia Commons (for example, a date before Wikimedia Commons existed).

Usually this is not a problem, but on occasion the Wikimedia Commons API returns malformed JSON in its response, and our script breaks when this happens.

To Reproduce

  1. Run the wikimedia_commons script for dates in the 1990s. (It may take a few runs for the problem to appear.)
  2. Observe that the script occasionally fails.

Expected behavior

The script should be able to handle the type of malformed JSON that is possible when we request metadata for an out-of-range date.
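
One possible way to get this behavior is sketched below. It is not the actual fix, and it assumes (this is not confirmed in the report) that a good response carries its records under `query` -> `pages` as a JSON object, and that the malformed cases are either undecodable text or a response where those keys are missing or have an unexpected type. The helper names `parse_response_text` and `extract_pages` are hypothetical and do not come from wikimedia_commons.py.

```python
import json
from typing import Any, Dict, Optional


def parse_response_text(text: Optional[str]) -> Optional[Dict[str, Any]]:
    """Decode a response body; return None if it is not a JSON object."""
    try:
        data = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        # Undecodable body, or no body at all (text is None).
        return None
    return data if isinstance(data, dict) else None


def extract_pages(response_json: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the pages mapping, or an empty dict for any unexpected shape.

    Assumption: records live under response_json["query"]["pages"].
    """
    if not isinstance(response_json, dict):
        return {}
    query = response_json.get("query")
    if not isinstance(query, dict):
        return {}
    pages = query.get("pages")
    return pages if isinstance(pages, dict) else {}


# Usage: an out-of-range date simply yields no pages instead of an exception.
pages = extract_pages(parse_response_text('{"batchcomplete": ""}'))
for page_id, page in pages.items():
    print(page_id, page.get("title"))
```

With this approach, a run for a date with no data processes zero records and exits normally. Whether an empty result should also be logged as a warning is left open here.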