WordPress / openverse

Openverse is a search engine for openly-licensed media. This monorepo includes all application code.
https://openverse.org
MIT License

Backfill image dimensions data #1485

Open stacimc opened 2 years ago

stacimc commented 2 years ago

Problem

Depends on WordPress/openverse#1486

Once we've added image dimensions detection for the providers that don't currently support them, we'll need to backfill the data for previously ingested records. The providers to backfill are:

* Since Metropolitan and Europeana are dated DAGs, we could potentially rely on their reingestion workflow to backfill the data over time (related: WordPress/openverse#1501).
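
For illustration only, here is a rough sketch of what a per-provider backfill loop could look like. It is not the catalog's actual implementation: the `image` table layout, the `backfill_dimensions` name, and the injected `get_dimensions` callable are all assumptions made for this example, and SQLite stands in for the real database so the snippet is self-contained. The loop only touches rows that are missing a width or height and pages by identifier, so records whose dimensions cannot be detected are skipped instead of being retried forever.

```python
import sqlite3
from typing import Callable, Optional, Tuple

# (width, height), or None when the dimensions could not be detected.
Dimensions = Optional[Tuple[int, int]]


def backfill_dimensions(
    conn: sqlite3.Connection,
    provider: str,
    get_dimensions: Callable[[str], Dimensions],
    batch_size: int = 100,
) -> int:
    """Fill in width/height for one provider's rows that lack them.

    `get_dimensions` is a hypothetical hook that maps an image URL to
    (width, height); in practice it would wrap whatever detection the
    provider script uses. Returns the number of rows updated.
    """
    updated = 0
    last_id = ""  # keyset pagination cursor over the text identifier
    while True:
        rows = conn.execute(
            "SELECT identifier, url FROM image "
            "WHERE provider = ? AND (width IS NULL OR height IS NULL) "
            "AND identifier > ? ORDER BY identifier LIMIT ?",
            (provider, last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        for identifier, url in rows:
            dims = get_dimensions(url)
            if dims is None:
                continue  # leave the row untouched; a later run can retry it
            conn.execute(
                "UPDATE image SET width = ?, height = ? WHERE identifier = ?",
                (dims[0], dims[1], identifier),
            )
            updated += 1
        conn.commit()  # commit once per batch
        last_id = rows[-1][0]
    return updated
```

Passing the detection step in as a callable keeps the loop independent of how each provider exposes dimensions (API metadata, image headers, and so on), which this issue does not specify.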

Implementation

obulat commented 2 years ago

File size and file type can be backfilled together with the image dimensions data. Here's the current status of file type and file size support for each provider:

| Provider | file type in the script | file size in the script | backfill for file type | backfill for file size |
| --- | --- | --- | --- | --- |
| Smithsonian | needs to be added | needs to be added | - | - |
| Raw Pixel | needs to be added | needs to be added | - | - |
| Finnish Museums | needs to be added | needs to be added | - | - |
| NYPL | added in WordPress/openverse-catalog#630 | needs to be added | not run yet | - |
| Phylopic | added in WordPress/openverse-catalog#547 | needs to be added | not run yet | - |
| Metropolitan Museum of Art | added in WordPress/openverse-catalog#568 | needs to be added | not run yet | - |
| Cleveland Museum of Art | added in WordPress/openverse-catalog#537 | added in WordPress/openverse-catalog#537 | not run yet | not run yet |
| Museums Victoria | added in WordPress/openverse-catalog#600 | needs to be added | not run yet | - |
| SMK | added in WordPress/openverse-catalog#542 | added in WordPress/openverse-catalog#542 | not run yet | not run yet |
| Science Museum | added in WordPress/openverse-catalog#576 | needs to be added | not run yet | - |
| Walters Art Museum | cannot fix due to WordPress/openverse#1637 | - | - | - |
| Brooklyn Museum | cannot fix due to WordPress/openverse#1638 | - | - | - |
| Europeana | needs to be added in fixing WordPress/openverse#1727 | - | - | - |
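
As a rough illustration of what "file type" and "file size" could mean for a single record, here is a minimal sketch. It assumes the file type can be read from the URL's extension and the file size from a `Content-Length` response header; neither is confirmed as the approach used in the linked PRs. The helper names and the alias map are hypothetical.

```python
import posixpath
from typing import Mapping, Optional
from urllib.parse import urlparse

# Hypothetical normalisation map; the real scripts may use different names.
_ALIASES = {"jpeg": "jpg", "tif": "tiff"}


def filetype_from_url(url: str) -> Optional[str]:
    """Guess a file type from the URL path's extension, ignoring the query."""
    path = urlparse(url).path
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    if not ext:
        return None  # e.g. IIIF-style URLs with no extension
    return _ALIASES.get(ext, ext)


def filesize_from_headers(headers: Mapping[str, str]) -> Optional[int]:
    """Read the size in bytes from a Content-Length header, if it is usable."""
    for key, value in headers.items():
        if key.lower() == "content-length":  # header names are case-insensitive
            try:
                size = int(value)
            except (TypeError, ValueError):
                return None
            return size if size >= 0 else None
    return None
```

This approach is naive on purpose: it treats any trailing extension as the file type, and it returns `None` when the information is missing so a backfill can leave the column empty instead of guessing.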