WordPress / openverse

Openverse is a search engine for openly-licensed media. This monorepo includes all application code.
https://openverse.org
MIT License

Tests make outbound network requests to providers for link validation and thumbnail generation #684

Open sarayourfriend opened 2 years ago

sarayourfriend commented 2 years ago

Description

If the upstream provider (in the case of our current test configuration, Flickr) returns an error, then our tests will fail regardless of whether anything in our code actually changed. Right now Flickr is returning a 503, and this causes the tests to fail.

Reproduction

  1. Mock the `url` value of an image whose thumbnail is requested so that it points to `https://httpbin.org/status/503`, and see the test fail. You could also use an application firewall like OpenSnitch or Little Snitch to block the request at the system level, watch the requests happen during the test, and see them fail.
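The failure mode described above can be sketched without hitting the real API or httpbin: a local stand-in server plays the flaky upstream and returns a 503, just as Flickr does during an outage. `FlakyUpstream` and `fetch_thumbnail` are illustrative names, not Openverse code.

```python
import http.server
import threading
import urllib.request
import urllib.error

class FlakyUpstream(http.server.BaseHTTPRequestHandler):
    """Stand-in for an upstream provider that is currently down."""

    def do_GET(self):
        # Simulate Flickr returning 503 during an outage.
        self.send_response(503)
        self.end_headers()

    def log_message(self, *args):
        # Silence per-request logging noise.
        pass

def fetch_thumbnail(url: str) -> int:
    """Fetch the upstream image and return the status code seen."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

# Bind to an ephemeral port on loopback; no outbound network needed.
server = http.server.HTTPServer(("127.0.0.1", 0), FlakyUpstream)
threading.Thread(target=server.serve_forever, daemon=True).start()

status = fetch_thumbnail(f"http://127.0.0.1:{server.server_port}/image.jpg")
server.shutdown()
```

Any test asserting a successful thumbnail response will fail here through no fault of the code under test, which is exactly the problem with depending on live providers.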

Resolution

sarayourfriend commented 2 years ago

Re-prioritizing this to work on it now because it is causing CI to fail due to inconsistencies in the upstream provider behaviours (which we should not depend on anyway).

sarayourfriend commented 2 years ago

@WordPress/openverse-maintainers I'd like to propose the following solution for this:

  1. Add MinIO to the local docker-compose stack and serve a couple of sample images and audio files from it
  2. Change all the sample data URLs to point to the local MinIO copies

The problem with this is that it will make running the app locally outside of Docker even more difficult. I think this is fine, though, as it's not an approach we recommend or support anyway.
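For illustration, the docker-compose addition might look something like this; the service name, ports, credentials, and volume are assumptions, not actual Openverse configuration:

```yaml
# Hypothetical sketch of a MinIO service for serving sample media locally.
services:
  minio:
    image: minio/minio:latest
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: test_user
      MINIO_ROOT_PASSWORD: test_secret
    ports:
      - "9000:9000"  # S3-compatible API
      - "9001:9001"  # web console
    volumes:
      - minio-data:/data

volumes:
  minio-data:
```

Sample data URLs would then point at something like `http://minio:9000/<bucket>/<file>` from inside the compose network.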

dhruvkb commented 2 years ago

@sarayourfriend I support this solution. Minio is great and while we do somewhat support running the API outside Docker, we still need the scaffolding services to be provisioned by Docker Compose so adding additional services should be OK.

A solution with just mocked responses via the hosted (not self-hosted) HTTPbin, `http://httpbin.org/status/{{status}}`, could be simpler and easier, so maybe we should try that first?

sarayourfriend commented 2 years ago

The issue is that I think we need actual responses for the `url`, because it's also used for the thumbnails :thinking:

AetherUnbound commented 2 years ago

I love the idea of employing minio here! And agreed, I think requiring that the API run inside of Docker is fine in my mind (we're already doing that for the Catalog anyway).

sarayourfriend commented 2 years ago

I am actually sceptical of the current approach and would like to investigate it more. I can see the value in the integration-style tests, but given the infrastructural difficulties and the absolute need to define hard boundaries in some areas (like preventing all outbound network requests, perhaps with something like HTTPretty), I wonder whether it would be sufficient to move our sample data into a Python script that uses the same factories the tests use. This has two benefits in my book:

  1. It forces the factories to be up to date and valid and reflect the actual models correctly
  2. It makes the test data very clear and easy to understand and maintain
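The "no outbound network requests" boundary mentioned above could be enforced in stdlib-only form by guarding `socket.connect`, as sketched below; HTTPretty or a pytest plugin such as `pytest-socket` would be more complete, and none of this is Openverse's actual test setup.

```python
import socket
from contextlib import contextmanager
from unittest import mock

class OutboundRequestError(RuntimeError):
    """Raised when test code tries to reach a non-loopback host."""

@contextmanager
def no_outbound_requests():
    """Fail fast on any connection attempt to a non-loopback address."""
    real_connect = socket.socket.connect

    def guarded_connect(self, address):
        host = address[0] if isinstance(address, tuple) else address
        if host not in ("127.0.0.1", "::1", "localhost"):
            raise OutboundRequestError(f"blocked outbound request to {host!r}")
        return real_connect(self, address)

    with mock.patch.object(socket.socket, "connect", guarded_connect):
        yield

# Usage: inside the guard, any attempt to reach an external host is blocked
# before a packet ever leaves the machine.
with no_outbound_requests():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.connect(("203.0.113.1", 80))  # TEST-NET-3 address, illustrative
        blocked = False
    except OutboundRequestError:
        blocked = True
    finally:
        sock.close()
```

Wrapping this in an autouse pytest fixture would surface any test that still depends on a live provider.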

The sample data is currently a chore to create and maintain. In fact we have no documentation on how to do it. I wrote a Python script that creates sample data but it's pretty hacky in some ways and I wasn't able to get the indexing to work at all for some reason. I'll push my branch for that tomorrow, I suspect there's something silly I'm missing that someone else will notice quickly.

There's a handful of general "test improvement" issues. I wonder if we could group them together and ask someone to write an RFC that takes a step back and carefully considers how unit and integration testing should work in this repository. Things are getting more complex than the previously sufficient integration tests are able to account for and it feels like if we don't have a guiding principle here we're going to end up making things even more complicated (by adding even more local infrastructure like minio, for example).

sarayourfriend commented 2 years ago

Unassigning myself from this issue because I won't have time to work on it before I go on a break for three weeks.

The PR linked to this issue at least shows one way of doing it for link validation: pre-caching validation status in Redis. Perhaps this is something that can be done in conftest with a fixture for manipulating the validation status of specific links... but that won't solve it for fixtures created during test runs. Maybe the factories need an option on their create function to specify a cached validation status, while pushing a default cached status otherwise.
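The factory option described above might look roughly like this; the cache stub, key scheme, and `cached_status` parameter are all hypothetical, not the real Redis layout or factory API used by the tests.

```python
from dataclasses import dataclass

# Stand-in for the Redis link-validation cache.
validation_cache: dict[str, str] = {}

@dataclass
class ImageFactory:
    """Hypothetical factory whose create() pre-seeds a validation status."""

    url: str
    cached_status: str = "200"  # default: pretend the link validated fine

    def create(self) -> dict:
        # Seed the cache so tests never hit the live provider for validation.
        validation_cache[f"valid:{self.url}"] = self.cached_status
        return {"url": self.url}

# Default: a healthy link; override: a dead link for negative-path tests.
image = ImageFactory("https://example.com/a.jpg").create()
dead = ImageFactory("https://example.com/b.jpg", cached_status="404").create()
```

A conftest fixture could expose the same cache for tests that need to flip a link's status mid-run.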

Anyway, I won't have time to work on this and it's something that is probably better served by being investigated as part of the RFC request I put in yesterday, also linked above.