Open krysal opened 1 year ago
Just noting for this that we'll want to make these values configurable:
Linking this to https://github.com/WordPress/openverse/issues/3336 as they are relevant to each other.
This is blocked on #3336. If that work goes forward, we will remove the filtered index entirely and this work will not be necessary.
I'm also going to remove it from the search relevancy milestone as it should not be a requirement for that project to be resolved.
Problem
Currently, the DAGs that create the filtered indexes (for image and audio) depend on the Ingestion Server. There is no reason we cannot leave all of that work to Airflow, and having fewer moving parts would also make it easier to debug when things go wrong.
Description
Move the create_and_populate_filtered_index function out of the Ingestion Server and into the create filtered index DAG in the Catalog.
https://github.com/WordPress/openverse/blob/41a12720eddcb455fd7ce839eb9ee4c722cf8857/ingestion_server/ingestion_server/indexer.py#L465-L471
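To make the proposal a bit more concrete, here is a rough sketch of the kind of logic a Catalog task might own after the move. It assumes the filtered index is populated by reindexing from the main index while excluding documents that match a list of sensitive terms. The function name build_filtered_reindex_body, the default field names, and the example index names are all hypothetical and are not taken from the linked indexer.py. Only the overall shape of the request body (source, dest, and a bool/must_not query) follows the Elasticsearch reindex API.

```python
from typing import Any


def build_filtered_reindex_body(
    source_index: str,
    destination_index: str,
    sensitive_terms: list[str],
    # Hypothetical field names; the real mapping may differ.
    fields: tuple[str, ...] = ("title", "description", "tags.name"),
) -> dict[str, Any]:
    """Build a reindex request body that skips documents matching sensitive terms.

    This is an illustrative helper, not the Ingestion Server's implementation.
    """
    # One phrase-match clause per term; a document matching any clause is excluded.
    must_not = [
        {"multi_match": {"query": term, "type": "phrase", "fields": list(fields)}}
        for term in sensitive_terms
    ]
    return {
        "source": {
            "index": source_index,
            "query": {"bool": {"must_not": must_not}},
        },
        "dest": {"index": destination_index},
    }


if __name__ == "__main__":
    # Example index names are made up for illustration.
    body = build_filtered_reindex_body(
        source_index="image-abc123",
        destination_index="image-abc123-filtered",
        sensitive_terms=["term one", "term two"],
    )
    print(body["dest"]["index"])
```

A task in the DAG could then send this body to the cluster's reindex endpoint. Keeping the body construction in a plain function like this would also make it straightforward to unit test without a running cluster.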
Additional context
This will be required down the line for other DAGs in the Search relevancy sandbox project.