WordPress / openverse

Openverse is a search engine for openly-licensed media. This monorepo includes all application code.
https://openverse.org
MIT License

Move filtered index creation totally to Airflow #3240

Status: Open (krysal opened this issue 1 year ago)

krysal commented 1 year ago

Problem

Currently, the DAGs that create the filtered indexes (for image and audio) depend on the Ingestion Server. There is no reason we cannot leave all of that work to Airflow. Having fewer moving parts is preferable, and it also makes things easier to debug when they go wrong.

Description

Move the create_and_populate_filtered_index function out of the Ingestion Server and into the "create filtered index" DAG in the Catalog.

https://github.com/WordPress/openverse/blob/41a12720eddcb455fd7ce839eb9ee4c722cf8857/ingestion_server/ingestion_server/indexer.py#L465-L471
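As a rough illustration of the kind of work the DAG would take over, here is a minimal sketch. It assumes that the filtered index is produced by an Elasticsearch reindex from the source index, using a query that excludes documents whose fields exactly match a list of sensitive terms. The function name build_filtered_reindex_body, the field names, and the ".raw" subfield are hypothetical and are not copied from indexer.py.

```python
from typing import Any, Dict, List, Tuple


def build_filtered_reindex_body(
    source_index: str,
    destination_index: str,
    sensitive_terms: List[str],
    # Hypothetical field names; the real mapping may differ.
    fields: Tuple[str, ...] = ("title", "description", "tags.name"),
) -> Dict[str, Any]:
    """Build a request body for the Elasticsearch Reindex API.

    Documents whose exact-match fields contain any of the sensitive terms
    are left out of the destination (filtered) index.
    """
    must_not = [
        # `terms` matches exact values, so this assumes each field has an
        # unanalyzed ".raw" subfield (an assumption, not confirmed here).
        {"terms": {f"{field}.raw": sensitive_terms}}
        for field in fields
    ]
    return {
        "source": {
            "index": source_index,
            "query": {"bool": {"must_not": must_not}},
        },
        "dest": {"index": destination_index},
    }
```

An Airflow task could build this body and send it to the Reindex API directly, so the Ingestion Server would no longer be needed for this step. Keeping the body construction in a plain function like this also makes it easy to unit test without a running cluster.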

Additional context

This will be required down the line for other DAGs in the Search relevancy sandbox project.

AetherUnbound commented 10 months ago

Just noting that, for this issue, we'll want to make these values configurable:

https://github.com/WordPress/openverse/blob/ee77ef5fe752fa5f4f89ca6d9c8f5af7fd4816ba/ingestion_server/ingestion_server/indexer.py#L528-L530
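One way to make such values configurable in a DAG is to keep the defaults in a small config object and let a DAG run's conf override them. The sketch below assumes that approach; the class name FilteredIndexConfig and its fields (slices, wait_for_completion, request_timeout_seconds) and their defaults are hypothetical placeholders, since the values at the linked lines are not reproduced in this issue.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Mapping, Optional, Union


@dataclass(frozen=True)
class FilteredIndexConfig:
    # Hypothetical placeholder settings and defaults; the real values
    # live at the lines linked above.
    slices: Union[int, str] = "auto"
    wait_for_completion: bool = True
    request_timeout_seconds: int = 3600

    @classmethod
    def from_conf(cls, conf: Optional[Mapping[str, Any]]) -> "FilteredIndexConfig":
        """Override the defaults with matching keys from a DAG run conf."""
        conf = conf or {}
        known = {f.name for f in fields(cls)}
        unknown = set(conf) - known
        if unknown:
            # Fail loudly on typos instead of silently ignoring them.
            raise ValueError(f"Unknown config keys: {sorted(unknown)}")
        return replace(cls(), **conf)
```

Rejecting unknown keys is a deliberate choice here: a misspelled override would otherwise fall back to the default without any warning.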

sarayourfriend commented 9 months ago

Linking this to https://github.com/WordPress/openverse/issues/3336 as they are relevant to each other.

stacimc commented 9 months ago

This is blocked on #3336. If that work goes forward, we will remove the filtered index entirely and this work will not be necessary.

I'm also going to remove it from the search relevancy milestone, as it should not be required for that project to be completed.