WordPress / openverse

Openverse is a search engine for openly-licensed media. This monorepo includes all application code.
https://openverse.org
MIT License
254 stars 204 forks source link

Remove the Watermark API endpoint #4835

Open zackkrida opened 3 months ago

zackkrida commented 3 months ago

Problem

The API currently contains a deprecated endpoint:

https://api.openverse.org/v1/#tag/images/operation/images_watermark

This endpoint only works for some images, and, for example, does not support SVGs.

Furthermore, the image is not accessible and low-quality. The attribution statement is hard-coded into image which has major legibility problems. The frame and the fonts are non-configurable and frankly look outdated.

Additionally, this endpoint sees little to no usage.

Description

Remove this endpoint entirely. This is complicated by our lack of clear API versioning and is blocked until we resolve that problem. #662 will address the problem.

Alternatives

Additional context

sarayourfriend commented 3 months ago

To add another reason to remove it: the watermark endpoint downloads images and processes them in PIL. That exposes us to attacks like the webp vulnerability and other types of pixload type attacks. It's also probably pretty easy to create a denial-of-service attack using that endpoint to cause OOM errors. We do not actively maintain or test that endpoint, and it's easy to forget it exists. That's not a great combination to have.

sarayourfriend commented 3 months ago

@zackkrida can you clarify why you believe our lack of clear versioning policy blocks this issue from being implemented? In theory, the longer we wait to remove it, the more likely it is that someone could actually start to rely on it. As discussed privately, the volume of this endpoint is almost non-existent.

In some sense, a lack of clear versioning policy somewhat frees us to do whatever we want here. Whatever versioning policy and strategy we decide, we should make sure it doesn't tie our hands into operating endpoints that have the quality of this one in perpetuity: underused and potentially high-risk. We would need to have some way for us to give a notice of breaking changes, even for well-used endpoints, including removal. Which would only be available to registered API users, by nature of needing a communication channel. We could assume unregistered API users are not planning long-term, stable integrations.

That's all hypothetical, and meant to demonstrate that even with a clear versioning policy, we would still need and exercise processes for making breaking changes to our API. That would be especially true when considering high-risk scenarios.

Additionally, all public documentation about the endpoint has been clear about its deprecated status for a very long time.

With those three things in mind (non-use, high-risk, and long-time notice of deprecation), I believe we should consider not blocking this on an explicit versioning policy.

zackkrida commented 2 months ago

@sarayourfriend apologies for the delayed reply here, I wrote a reply locally last week and forgot I never shared it.

The TL;DR of my thinking is:

Policies are often revoked and modified by tech companies seemingly at whim. The deprecation of this endpoint is a good opportunity to establish a "precedent of policy conformant-behavior" that will instill trust in our API consumers that the Openverse project does as it says it will.

To expand on that, I was thinking about past policy revocations I've seen like:

Policies always depend on a level of trust. Deprecating an endpoint in a manner differing from our stated policy before we implement the policy doesn't increase trust in us as maintainers.

I do agree that there is high-risk for misuse with the endpoint, but I also believe there are swift and quick mitigations to any attack, like blocking the endpoint via firewall, in a manner that would be justified in the case of an attack.