Open AetherUnbound opened 3 months ago
The upsert strategy is in the Python Column column of the table, because that is where we save this information:
Do you think it would be clearer to move the upsert strategy somewhere else? I do agree with the idea of adding explanations for the strategies in a postamble.
Right! I mean also surfacing it in the individual descriptions, just like media types and DB column type:
Description
When updating certain JSON fields in the catalog, we use a "merge" upsert strategy:
https://github.com/WordPress/openverse/blob/ca82f843d5e8097fb2310cdaaa1c8fdf3550fdab/catalog/dags/common/storage/columns.py#L70-L78
This came up in a recent discussion about tags: https://github.com/WordPress/openverse/pull/4475#issuecomment-2177205074
The catalog media properties expansion (#4366) added
DB Column Type
to the description of each field for easier referencing. We also addupsert_strategy
to the table that gets generated at the top of the doc:https://github.com/WordPress/openverse/blob/5041280d482bb73bd15d6641e138ec95b7487fd0/catalog/utilities/media_props_gen/helpers/column_parser.py#L52-L58
We should add an
Upsert Strategy
auto-generated field for each column description (similar toDB Column Type
) which carries this information down to each field. It would also be helpful to have a section in the preamble/postamble which describes each update strategy and links to the code for each strategy in thecolumns.py
file. The auto-generated field documentation can also then link to this description.Alternatives
This was prompted by a recent mention about only new tags being added (and tags never being removed during ingestion). We could add a note explicitly to the
tags
column documentation instead, but it seems more prudent to do this programmatically not only because we can but because it would be more resilient going forward.