Open krysal opened 2 years ago
I appreciate that we have an existing mechanism for this! We should probably also have a database table within the catalog that these could be added to, so we don't need to make a code change in order to add new tags that we find which we consider worth denylisting. (Also we should consider changing this from "blacklist" to "denylist" or a similar alternative).
We should probably also have a database table within the catalog that these could be added to, so we don't need to make a code change in order to add new tags that we find which we consider worth denylisting.
Great minds think alike! 😄 That was my alternative suggestion, do you know if this can be done somehow through Airflow UI? I thought of Django but it's not connected to the catalog DB (I think so , though maybe it can be ❔)
(Also we should consider changing this from "blacklist" to "denylist" or a similar alternative).
Totally 💯 "blacklist" was a bit of a weird name the first time I saw it. Changed the title to reflect that.
Great minds think alike! smile That was my alternative suggestion, do you know if this can be done somehow through Airflow UI? I thought of Django but it's not connected to the catalog DB (I think so , though maybe it can be grey_question)
Oops, got so excited to respond I didn't read through the entire issue, sorry! 😂 🤦🏼♀️
This would be something we could manage through the Airflow UI, but if that were the case it'd probably be a Variable unless we wanted to add some custom frontend pieces. The trouble with using a Variable is that it's all stored in 1 database field so if it was only there that record could become quite unwieldy.
What we could do though is us a Variable to allow for updating the table 😮 we could have it be a list, and then trigger an unscheduled DAG which would take the list within that Variable, add new records to a table which the ingestion process can use, then clear out the Variable for the next run! What do you think @stacimc?
The renaming to TAG_DENYLIST
was completed in #4014, so the name of the issue is a bit confusing now.
@krysal, do you think it's best to update this issue's title to reflect that the issue now refers to creating a mechanism for updating the denylist as described in the quote below? Or is it better to close this issue as partly-completed, and open a new one for what's left?
This would be something we could manage through the Airflow UI, but if that were the case it'd probably be a Variable unless we wanted to add some custom frontend pieces. The trouble with using a Variable is that it's all stored in 1 database field so if it was only there that record could become quite unwieldy.
What we could do though is us a Variable to allow for updating the table 😮 we could have it be a list, and then trigger an unscheduled DAG which would take the list within that Variable, add new records to a table which the ingestion process can use, then clear out the Variable for the next run! What do you think @stacimc?
Problem
Many tags from this featured image do not represent useful information.
Description
Add these tags to the
TAG_BLACKLIST
and the reingestion process should clean them from the existing images the next time it runs.https://github.com/WordPress/openverse-catalog/blob/c1b970b11dbf63a45a8c2d6a33d6058a662b7ac9/openverse_catalog/dags/common/storage/media.py#L19-L34
Alternatives
Consider adding a UI and a database table to add tags in the future without modifying the code.
Implementation