WordPress / openverse-api

The Openverse API allows programmatic access to search for CC-licensed and public domain digital media.
https://api.openverse.engineering/v1
MIT License

Openverse API is no longer reachable due to Cloudflare DDoS protection #1044

Closed. ffraenz closed this issue 1 year ago

ffraenz commented 1 year ago

Description

I am one of the developers currently working on picsome.org, a project by Wikimedia Germany. We deeply integrated the Openverse API into the product. Every request to the API is authenticated with API tokens we obtained during registration.

For a few days now, the Openverse API has been responding with status code 403 and a payload that looks like a Cloudflare DDoS protection page. This is what it looks like when rendered in a browser:

[Screenshot of the Cloudflare DDoS protection page, taken 2022-12-13 at 16:49:52]

This page may be caused by a manual configuration change for the api.openverse.engineering Cloudflare proxy or by Cloudflare classifying our traffic as 'suspicious'. However, I think this is not intended because, by design, there is no way for a server (a bot) to respond to this page, especially as we are already sending authenticated requests.

Thank you for looking into this!

Reproduction

Running the following command on both our production and staging servers yields the Cloudflare DDoS protection page described above.

curl 'https://api.openverse.engineering/v1/images?q=Luxembourg&extension=jpg%2Cpng%2Cgif%2Csvg&page_size=20&page=1'
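For clients hitting the same symptom, here is a minimal Python sketch of how a caller might tell a Cloudflare block page apart from a regular API error. It is a heuristic, not an official check: the names `looks_like_cloudflare_block` and `fetch` are hypothetical, the `Bearer` token scheme is an assumption about how the API token is sent, and the test relies on a 403 status, an HTML content type, and the `Server: cloudflare` or `CF-RAY` response headers that Cloudflare normally adds.

```python
import urllib.error
import urllib.request
from typing import Mapping, Optional


def looks_like_cloudflare_block(status: int, headers: Mapping[str, str]) -> bool:
    """Heuristic: a 403 with an HTML body that carries Cloudflare markers."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {key.lower(): value.lower() for key, value in headers.items()}
    is_html = lowered.get("content-type", "").startswith("text/html")
    from_cloudflare = lowered.get("server", "") == "cloudflare" or "cf-ray" in lowered
    return status == 403 and is_html and from_cloudflare


def fetch(url: str, token: Optional[str] = None) -> str:
    """Fetch a URL and raise a clearer error when the proxy blocks the request.

    Hypothetical helper; the Bearer scheme is an assumption, not taken from the issue.
    """
    request = urllib.request.Request(url)
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        if looks_like_cloudflare_block(error.code, dict(error.headers.items())):
            raise RuntimeError("Blocked by a Cloudflare page, not by the API") from error
        raise
```

Separating the two cases this way lets a service log "blocked upstream" distinctly from an ordinary 403 caused by a missing or invalid token.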

Environment

This problem is not environment-specific, apart from our server IP addresses.

zackkrida commented 1 year ago

Hi @ffraenz, we'll look into this for you ASAP. For reference: the last time we made Cloudflare configuration changes was 2022-11-28. Do you think this could have started then? That may help narrow down the issue.

zackkrida commented 1 year ago

It looks like requests from the https://picsome.org/ IPs were blocked by a firewall rule change on 2022-11-28. For now I've manually allowlisted your IPs. We were seeing traffic from your ASN, likely not from picsome, that looked like aggressive scraping of our frontend, and our servers were struggling to keep up with it.

It looks like search on picsome is working once again 😄

I would be surprised if the high traffic we were seeing was from picsome. However, if we see dramatic changes to our server usage after allowlisting you today, I will be in touch here to discuss possible solutions. Is picsome open source software? If so, we could look at your implementation of the Openverse API and check for any potential concerns.

AetherUnbound commented 1 year ago

@zackkrida Do you think the change you made is something we could capture in the infrastructure repo? 😮

zackkrida commented 1 year ago

@AetherUnbound In general, it would be ideal to manage our firewall rules through infra-as-code in that repo instead of the Cloudflare UI. I don't think we have an issue for that yet, only one for doing the same with our page caching rules.

ffraenz commented 1 year ago

Hi @zackkrida! Thank you so much for helping us out so swiftly! I can confirm that search is working again.

AFAIK we're only making authenticated calls to the Openverse API to retrieve image metadata and/or image search results for queries users enter in our search bar or in our license checker tool. We don't do frontend scraping.

A previous release of picsome can be found at https://github.com/wmde/picsome. If there's potential to optimize the traffic between picsome and Openverse, we're happy to make appropriate changes.