obulat closed this issue 1 year ago
Auditing the provider scripts is done. Some scripts have been updated to collect available information.
For the providers that cannot be updated right away, I added comments to the follow-up issues for related image dimension issues: https://github.com/WordPress/openverse/issues/1486, https://github.com/WordPress/openverse-catalog/issues/647#issuecomment-1224300654, WordPress/openverse#1484
Current Situation
Currently, a lot of image provider scripts are not collecting the filetype and filesize information. This information can improve the frontend performance and make Openverse friendlier to providers by not requiring a HEAD request for each image that lacks this information.
Suggested Improvement
Image DAGs and what data they collect (the list is updated when the PRs are created or merged):
smithsonian: filetype ➖, filesize ➖. The API does not return filetype/filesize.
raw_pixel: filetype ➖, filesize ➖. The API does not return filetype/filesize.
museum_victoria: filetype ✅, filesize ✅. WordPress/openverse-catalog#600
nypl: filetype ✅, filesize ➖. WordPress/openverse-catalog#630. The API does not return filesize.
phylopic: filetype ✅, filesize ➖. WordPress/openverse-catalog#547. The API does not return filesize.
science_museum: filetype ✅, filesize ✅. WordPress/openverse-catalog#576
smk: filetype ✅, filesize ✅. WordPress/openverse-catalog#542
cleveland_museum_of_art: filetype ✅, filesize ✅. WordPress/openverse-catalog#537
metropolitan_museum_of_art: filetype ✅, filesize ➖. WordPress/openverse-catalog#568. The API does not return filesize.
finnish_museums: filetype ➖, filesize ➖. The API does not return filetype/filesize.
Temporarily disabled DAGs that will need to be fixed later:
walters_art_museum: filetype ➖, filesize ➖. Reason: no API key, WordPress/openverse#1637
brooklyn_museum: filetype ➖, filesize ➖. Reason: no API token, WordPress/openverse#1638
europeana: filetype ➖, filesize ➖. Reason: needs to be refactored for v2, WordPress/openverse#1727
Scripts that were already collecting filetype and filesize data:
flickr: filetype ✅, filesize ✅. Uses image GET request.
stocksnap: filetype ✅, filesize ✅. Uses image GET request for filesize, and only has JPG.
wikimedia_commons: filetype ✅, filesize ✅. Has an ad-hoc function to compute filetype, needs to be updated to use the common util one.
wordpress: filetype ✅, filesize ✅. Has an ad-hoc function to compute filetype, needs to be updated to use the common util one.
Then, separately, we'd need to write a script to backfill all existing records. Finally, we would need a solution to collect the filetype and filesize for images whose provider scripts do not provide the data.
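As a rough illustration of that last step, here is a minimal sketch of how filetype and filesize could be derived when the provider API does not return them. It is not the catalog's actual common util: the function names `filetype_from_url` and `filesize_from_headers` and the `FILETYPE_ALIASES` mapping are hypothetical, and the sketch assumes the filetype can be guessed from the URL extension and the filesize read from a `Content-Length` response header.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical alias map for illustration; not taken from the catalog code.
FILETYPE_ALIASES = {"jpeg": "jpg", "tif": "tiff"}


def filetype_from_url(url):
    """Guess the filetype from the extension in the URL path, or return None."""
    # urlparse strips the query string, so "?size=large" is not part of the path.
    suffix = PurePosixPath(urlparse(url).path).suffix
    if not suffix:
        return None
    ext = suffix[1:].lower()
    return FILETYPE_ALIASES.get(ext, ext)


def filesize_from_headers(headers):
    """Read the size in bytes from a Content-Length header, or return None."""
    # Header names are case-insensitive, so compare them in lowercase.
    for name, value in headers.items():
        if name.lower() == "content-length":
            try:
                size = int(value)
            except (TypeError, ValueError):
                return None
            return size if size >= 0 else None
    return None
```

The headers passed to `filesize_from_headers` could come from a HEAD response for the image URL, which avoids downloading the image body. Servers are not required to send `Content-Length`, so the result may still be `None` for some images.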