WordPress / openverse

Openverse is a search engine for openly-licensed media. This monorepo includes all application code.
https://openverse.org
MIT License
254 stars 202 forks source link

`load_sample_data` passes locally even if any part of it fails #2359

Open sarayourfriend opened 1 year ago

sarayourfriend commented 1 year ago

Description

2322 removed the load_test_data method from TableIndexer. This causes the calls to the LOAD_TEST_DATA action in the load_sample_data.sh script to fail, however, the script itself continues to run without stopping despite the failure.

Additionally, #2362 shows that other types of failures in this script can cause confusing and difficult to debug local environment issues.

This script is critical to our testing and local environments and there should not be any doubt that the script does exactly what it is meant to do and alerts us obviously and quickly when something doesn't work as intended, whether in a general (as in #2322) or specific case (as in #2362).

Reproduction

  1. Checkout #2322 and run `just down -v`, `just api/up` then `just api/init`. See that `LOAD_TEST_DATA` actions both fail, but the rest of init proceeds.

Fix

Actually check that the commands do what they're intended to do and fail the script if anything goes wrong.

We could probably do to convert load_sample_data.sh to a Python script or a Django management command that can more easily do error checking and have a real unit test.

sarayourfriend commented 1 year ago

This will be fixed in a PR I will put up soon

Never mind, I'm not sure of this issue now but have no way to verify locally due to #2362.

sarayourfriend commented 1 year ago

Downgraded to high after updating the issue to reflect that nothing is actively broken by this, but it is still a grave concern and a big hole in our testing system's reliability if it silently fails at any step we expect to pass.

zackkrida commented 1 year ago

Further downgrading to medium after the improvements in #2479 were added.

Could we potentially use the Airflow CLI in the script to get more accurate information about tasks in a relatively simple way?

sarayourfriend commented 1 year ago

I considered that as well but it would make api/init dependent on Airflow, which it currently is not. It would probably be the most "end-to-end" way to load the sample data indexes though.

zackkrida commented 1 year ago

@dhruvkb this is currently in-progress. Should we move it to the backlog, close it, or are there other active changes I am not aware of?

dhruvkb commented 1 year ago

Moved to the backlog. I opted against closing it right away to give more consideration to the idea of using Airflow. If that is not required, we can close it.