WordPress / openverse-api

The Openverse API allows programmatic access to search for CC-licensed and public domain digital media.
https://api.openverse.engineering/v1
MIT License

Photon not working with WP Photo Directory images #1081

Closed krysal closed 1 year ago

krysal commented 1 year ago

Description

We had to stop and roll back the last two deployment attempts because requests to Photon for WP Photo Directory images were failing with 400 errors. We thought #1080 would solve it, but that wasn't the case.

Here’s a log line from the API:

[2023-01-12 16:26:23,256 - urllib3.connectionpool - 456][DEBUG] [464c34ea2f234fe4b8be145c6f28271f] https://i0.wp.com:443 "GET /pd.w.org/2022/08/31162f3e8dd253878.81270200-1536x2048.jpg?w=600&quality=80&ssl=true HTTP/1.1" 400 None

Reproduction

  1. [TBD]
  2. See error.

Environment

This was observed in our staging environment with API versions 2.7.1 and 2.7.2.

zackkrida commented 1 year ago

https://i0.wp.com/pd.w.org/2022/08/31162f3e8dd253878.81270200-1536x2048.jpg?w=600&quality=80&ssl=true

Quite unusual, as visiting these URLs directly works fine.

AetherUnbound commented 1 year ago

I ssh'd into our currently running dev EC2 instance and used curl to make sure there wasn't an issue with making the requests from within our VPC. Both with and without our authentication, the requests completed successfully:

[ec2-user@ip-172-31-82-66 ~]$ curl -I -L 'https://i0.wp.com/pd.w.org/2022/08/31162f3e8dd253878.81270200-1536x2048.jpg?w=600&quality=80&ssl=true'
HTTP/2 200 
server: nginx
date: Thu, 12 Jan 2023 19:45:59 GMT
content-type: image/jpeg
content-length: 95507
last-modified: Thu, 12 Jan 2023 19:40:41 GMT
expires: Sun, 12 Jan 2025 07:40:41 GMT
cache-control: public, max-age=63115200
link: <https://pd.w.org/2022/08/31162f3e8dd253878.81270200-1536x2048.jpg>; rel="canonical"
x-content-type-options: nosniff
etag: "2ca7f98f0cbc7e10"
x-bytes-saved: 4985
vary: Accept
x-nc: HIT dca 3
access-control-allow-origin: *
access-control-allow-methods: GET, HEAD
timing-allow-origin: *

[ec2-user@ip-172-31-82-66 ~]$ curl -H 'X-Photon-Authentication: [redacted]' -I -L 'https://i0.wp.com/pd.w.org/2022/08/31162f3e8dd253878.81270200-1536x2048.jpg?w=600&quality=80&ssl=true'
HTTP/2 200 
server: nginx
date: Thu, 12 Jan 2023 19:46:10 GMT
content-type: image/jpeg
content-length: 95507
last-modified: Thu, 12 Jan 2023 19:40:41 GMT
expires: Sun, 12 Jan 2025 07:40:41 GMT
cache-control: public, max-age=63115200
link: <https://pd.w.org/2022/08/31162f3e8dd253878.81270200-1536x2048.jpg>; rel="canonical"
x-content-type-options: nosniff
etag: "2ca7f98f0cbc7e10"
x-bytes-saved: 4985
vary: Accept
x-nc: HIT dca 3
access-control-allow-origin: *
access-control-allow-methods: GET, HEAD
timing-allow-origin: *

So it seems like we can make these requests just fine. What confuses me is that the logs show we're requesting the ssl=true version and yet it's still failing 😓

AetherUnbound commented 1 year ago

Also just confirmed that using the --http1.1 flag with curl was successful too (I noticed that the HTTP version was a difference between the log line and my earlier tests, so I wanted to double-check).
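For anyone who wants to repeat the curl checks from Python, here is a minimal sketch using only the standard library. It is not the API's own code path: the function names `build_photon_url` and `photon_status` are hypothetical, the URL shape is copied from the log line in the description, and adding `ssl=true` only for https sources is an assumption. Python's `urllib.request` speaks HTTP/1.1, which matches the protocol version shown in the API log.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import Request, urlopen

PHOTON_HOST = "https://i0.wp.com"  # host seen in the log lines in this thread


def build_photon_url(image_url, width=600, quality=80):
    """Build a Photon URL in the shape seen in the logs (hypothetical helper)."""
    is_https = image_url.startswith("https://")
    # The logged requests carry the upstream host and path without the scheme.
    bare = image_url.split("://", 1)[1]
    params = {"w": width, "quality": quality}
    if is_https:
        # Assumption: ssl=true is only sent for https sources.
        params["ssl"] = "true"
    return f"{PHOTON_HOST}/{bare}?{urlencode(params)}"


def photon_status(url, auth_token=None):
    """Send a HEAD request (like curl -I) and return the HTTP status code."""
    headers = {}
    if auth_token:
        # Header name taken from the curl command earlier in the thread.
        headers["X-Photon-Authentication"] = auth_token
    request = Request(url, headers=headers, method="HEAD")
    try:
        with urlopen(request, timeout=10) as response:
            return response.status
    except HTTPError as err:
        # urlopen raises on 4xx/5xx; the code is what we want to inspect.
        return err.code
```

Calling `photon_status(build_photon_url("https://pd.w.org/2022/08/31162f3e8dd253878.81270200-1536x2048.jpg"))` from the affected environment would show whether a plain HTTP/1.1 client sees the same 400. Note that the default User-Agent differs from the one the API sends, so a 200 here does not rule out a header-dependent failure.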

AetherUnbound commented 1 year ago

I'm not even able to reproduce this locally 😭 I added the following line to the sample images:

diff --git a/sample_data/sample_images.csv b/sample_data/sample_images.csv
index 79a691a9..bfe7d9e5 100644
--- a/sample_data/sample_images.csv
+++ b/sample_data/sample_images.csv
@@ -4999,3 +4999,4 @@ d4307b61-bd39-44bb-a3cd-aa760029d051,2021-12-15 20:48:57.914727+00,2021-12-15 20
 954a4dce-12b3-4323-86f3-8af7177635d6,2021-12-15 20:48:57.914727+00,2021-12-15 20:48:57.914727+00,provider_api,stocksnap,stocksnap,UCWLBKL1OL,https://stocksnap.io/photo/UCWLBKL1OL,https://cdn.stocksnap.io/img-thumbs/960w/UCWLBKL1OL.jpg,https://cdn.stocksnap.io/img-thumbs/280h/UCWLBKL1OL.jpg,2574,3218,1088013,cc0,1.0,Matt Bango,https://mattbango.photo,Music Guitar Photo,"{""license_url"": ""https://creativecommons.org/publicdomain/zero/1.0/"", ""downloads_raw"": ""5"", ""favorites_raw"": ""0"", ""page_views_raw"": ""225"", ""raw_license_url"": ""https://creativecommons.org/publicdomain/zero/1.0/""}","[{""name"": ""music"", ""provider"": ""stocksnap""}, {""name"": ""guitar"", ""provider"": ""stocksnap""}, {""name"": ""play"", ""provider"": ""stocksnap""}, {""name"": ""string"", ""provider"": ""stocksnap""}, {""name"": ""man"", ""provider"": ""stocksnap""}, {""name"": ""hand"", ""provider"": ""stocksnap""}, {""name"": ""musician"", ""provider"": ""stocksnap""}, {""name"": ""event"", ""provider"": ""stocksnap""}, {""name"": ""playing"", ""provider"": ""stocksnap""}, {""name"": ""instrument"", ""provider"": ""stocksnap""}, {""name"": ""band"", ""provider"": ""stocksnap""}, {""name"": ""electric"", ""provider"": ""stocksnap""}, {""name"": ""live"", ""provider"": ""stocksnap""}, {""name"": ""stage"", ""provider"": ""stocksnap""}, {""name"": ""sitting"", ""provider"": ""stocksnap""}, {""name"": ""person"", ""provider"": ""stocksnap""}, {""name"": ""player"", ""provider"": ""stocksnap""}, {""name"": ""group"", ""provider"": ""stocksnap""}, {""name"": ""performance"", ""provider"": ""stocksnap""}]",f,2021-12-15 20:48:57.914727+00,f,jpg,photograph,0.19767441860465113
 20c931c6-c0d6-4395-a2ee-506ad6d9dd69,2021-12-15 20:48:57.914727+00,2021-12-15 20:48:57.914727+00,provider_api,stocksnap,stocksnap,ZFJEKSUY76,https://stocksnap.io/photo/ZFJEKSUY76,https://cdn.stocksnap.io/img-thumbs/960w/ZFJEKSUY76.jpg,https://cdn.stocksnap.io/img-thumbs/280h/ZFJEKSUY76.jpg,3888,2592,472754,cc0,1.0,Foodie Girl,https://stocksnap.io/author/121423,Fresh Garlic Photo,"{""license_url"": ""https://creativecommons.org/publicdomain/zero/1.0/"", ""downloads_raw"": ""5"", ""favorites_raw"": ""0"", ""page_views_raw"": ""160"", ""raw_license_url"": ""https://creativecommons.org/publicdomain/zero/1.0/""}","[{""name"": ""fresh"", ""provider"": ""stocksnap""}, {""name"": ""garlic"", ""provider"": ""stocksnap""}, {""name"": ""ingredient"", ""provider"": ""stocksnap""}, {""name"": ""vegetable"", ""provider"": ""stocksnap""}, {""name"": ""raw"", ""provider"": ""stocksnap""}, {""name"": ""food"", ""provider"": ""stocksnap""}, {""name"": ""bulb"", ""provider"": ""stocksnap""}, {""name"": ""white"", ""provider"": ""stocksnap""}, {""name"": ""healthy"", ""provider"": ""stocksnap""}, {""name"": ""organic"", ""provider"": ""stocksnap""}, {""name"": ""plant"", ""provider"": ""stocksnap""}, {""name"": ""vegetarian"", ""provider"": ""stocksnap""}, {""name"": ""harvest"", ""provider"": ""stocksnap""}, {""name"": ""head"", ""provider"": ""stocksnap""}, {""name"": ""bunch"", ""provider"": ""stocksnap""}, {""name"": ""clove"", ""provider"": ""stocksnap""}, {""name"": ""green"", ""provider"": ""stocksnap""}, {""name"": ""nature"", ""provider"": ""stocksnap""}, {""name"": ""closeup"", ""provider"": ""stocksnap""}, {""name"": ""market"", ""provider"": ""stocksnap""}]",f,2021-12-15 20:48:57.914727+00,f,jpg,photograph,0.19767441860465113
 230c6951-7f83-4263-a0da-a151f9f67b24,2021-12-15 20:48:57.914727+00,2021-12-15 20:48:57.914727+00,provider_api,stocksnap,stocksnap,NLJESZLWIU,https://stocksnap.io/photo/NLJESZLWIU,https://cdn.stocksnap.io/img-thumbs/960w/NLJESZLWIU.jpg,https://cdn.stocksnap.io/img-thumbs/280h/NLJESZLWIU.jpg,4437,3328,950814,cc0,1.0,The Building Envelope,https://stocksnap.io/author/129953,Concrete Wall Photo,"{""license_url"": ""https://creativecommons.org/publicdomain/zero/1.0/"", ""downloads_raw"": ""5"", ""favorites_raw"": ""0"", ""page_views_raw"": ""210"", ""raw_license_url"": ""https://creativecommons.org/publicdomain/zero/1.0/""}","[{""name"": ""concrete"", ""provider"": ""stocksnap""}, {""name"": ""wall"", ""provider"": ""stocksnap""}, {""name"": ""building"", ""provider"": ""stocksnap""}, {""name"": ""architecture"", ""provider"": ""stocksnap""}, {""name"": ""abstract"", ""provider"": ""stocksnap""}, {""name"": ""background"", ""provider"": ""stocksnap""}, {""name"": ""city"", ""provider"": ""stocksnap""}, {""name"": ""simplicity"", ""provider"": ""stocksnap""}, {""name"": ""dirty"", ""provider"": ""stocksnap""}, {""name"": ""material"", ""provider"": ""stocksnap""}, {""name"": ""monochrome"", ""provider"": ""stocksnap""}, {""name"": ""pillars"", ""provider"": ""stocksnap""}, {""name"": ""pattern"", ""provider"": ""stocksnap""}, {""name"": ""texture"", ""provider"": ""stocksnap""}, {""name"": ""exterior"", ""provider"": ""stocksnap""}, {""name"": ""weathered"", ""provider"": ""stocksnap""}, {""name"": ""rough"", ""provider"": ""stocksnap""}, {""name"": ""surface"", ""provider"": ""stocksnap""}, {""name"": ""weathered"", ""provider"": ""stocksnap""}]",f,2021-12-15 20:48:57.914727+00,f,jpg,photograph,0.19767441860465113
+a8c4afff-5a39-4d4a-9162-d8bb7f696878,2022-10-01 00:40:34.123279+00,2022-11-01 21:28:28.458039+00,provider_api,wordpress,wordpress,69363177f3,https://wordpress.org/photos/photo/69363177f3,https://pd.w.org/2022/09/69363177f3f416735.74988697-1312x2048.jpg,,1312,2048,259234,cc0,1.0,Theo Gkitsos,https://theodorosgkitsos.com,Camp fire with sparks at night,,,f,2022-11-01 21:28:28.458039+00,f,jpg,photography,0.9

Then I ran just recreate and visited http://localhost:50280/v1/images/a8c4afff-5a39-4d4a-9162-d8bb7f696878/thumb/. The thumbnail loaded fine!

openverse-api-web-1  | [2023-01-12 20:12:38,214 - oauthlib.oauth2.rfc6749.endpoints.resource -  70][DEBUG] [59b2960f2e8d4567a0871f99f558b0fd] Dispatching token_type Bearer request to <oauthlib.oauth2.rfc6749.tokens.BearerToken object at 0x7f7a671cff40>.
openverse-api-web-1  | [2023-01-12 20:12:38,225 - urllib3.connectionpool - 1003][DEBUG] [59b2960f2e8d4567a0871f99f558b0fd] Starting new HTTPS connection (1): i0.wp.com:443
openverse-api-web-1  | [2023-01-12 20:12:38,438 - urllib3.connectionpool - 456][DEBUG] [59b2960f2e8d4567a0871f99f558b0fd] https://i0.wp.com:443 "GET /pd.w.org/2022/09/69363177f3f416735.74988697-1312x2048.jpg?w=600&quality=80&ssl=true HTTP/1.1" 200 37160
openverse-api-web-1  | [2023-01-12 20:12:38,511 - catalog.api.utils.photon.get -  80][DEBUG] [59b2960f2e8d4567a0871f99f558b0fd] Image proxy response status: 200, content-type: image/webp
openverse-api-web-1  | [2023-01-12 20:12:38,511 - log_request_id.middleware -  47][INFO] [59b2960f2e8d4567a0871f99f558b0fd] method=GET path=/v1/images/a8c4afff-5a39-4d4a-9162-d8bb7f696878/thumb/ status=200
openverse-api-web-1  | [12/Jan/2023 20:12:38] "GET /v1/images/a8c4afff-5a39-4d4a-9162-d8bb7f696878/thumb/ HTTP/1.1" 200 37160

I think I'm going to try another standalone dev deployment just to play around with this a bit more in the afternoon.
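To compare the failing staging line with the successful local one, a small parser for the urllib3 connectionpool debug lines might help. This is only a sketch: the regular expression is inferred from the two log lines quoted in this thread rather than from any documented format, and `parse_connectionpool_line` is a hypothetical name.

```python
import re

# Matches the tail of a urllib3.connectionpool DEBUG line, for example:
#   https://i0.wp.com:443 "GET /path?query HTTP/1.1" 400 None
_PATTERN = re.compile(
    r'(?P<origin>https?://\S+) '
    r'"(?P<method>[A-Z]+) (?P<target>\S+) HTTP/(?P<version>[\d.]+)" '
    r'(?P<status>\d{3}) (?P<length>\S+)'
)


def parse_connectionpool_line(line):
    """Return the request details from one log line, or None if it does not match."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # The length field is the literal string "None" when no length was reported.
    length = record["length"]
    record["length"] = int(length) if length.isdigit() else None
    return record
```

Running this over both environments' logs and filtering on `record["status"] != 200` would list the failing Photon targets side by side with the ones that succeed.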

AetherUnbound commented 1 year ago

It looks like the upstream issue with the WordPress photo directory has now been resolved! https://meta.trac.wordpress.org/ticket/6673

I was able to successfully test this in staging, so I'm going to go ahead and close this issue since I was unable to reproduce the root cause.