Open krysal opened 4 weeks ago
I tried to run the DAG again today, and it turns out RDS does not support the PL/Pythonu extension :disappointed:
RDS does support plperl, plrust, plpgsql, and plv8. Of those, plv8 (V8 being the same JS engine Chrome uses) might be the closest fit for this use case, but I'll see. Reingesting these records might turn out to be the easiest, most reliable way to do it.
Problem
In #4143, @obulat proposed adding new cleaning steps to fix the tags in the Catalog, but the option of including them in the Ingestion Server was declined in favor of using the `batched_update` DAG.
Description
We want to take the functions that were planned for inclusion in that PR and translate them into parameters for this DAG. Given the complexity of the decoding transformation, it might require some advanced PostgreSQL features, such as a combination of pattern matching and PL/Python functions.
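To make the "pattern matching plus a Python function" idea concrete, here is a minimal sketch of what the decoding step could look like in plain Python. It assumes, purely for illustration, that the affected tag names contain literal `\uXXXX` escape sequences; the actual transformation from #4143 may cover more cases. `decode_tag_name` and `ESCAPE_RE` are hypothetical names, not existing Catalog code.

```python
import re

# Assumption: a broken tag name contains a literal backslash, "u", and four
# hex digits (e.g. the 9 characters "caf\u00e9" instead of "café").
ESCAPE_RE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_tag_name(name: str) -> str:
    """Replace literal \\uXXXX sequences with the character they encode."""

    def _replace(match: re.Match) -> str:
        code_point = int(match.group(1), 16)
        # Leave lone surrogate halves untouched; they are not valid on their own.
        if 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)

    return ESCAPE_RE.sub(_replace, name)
```

If something like this ran inside the database, the function body would be the part wrapped in a procedural-language function, and the regular expression could also serve as the filter that selects the rows to update.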
Duplicated tags were previously removed in #1566. We will apply the same solution here, since the decoding may produce new duplicates.
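As a rough illustration of why a deduplication pass is needed after decoding, the sketch below removes repeated tags from a single record. It assumes each tag is a dict with `name` and `provider` keys and that two tags are duplicates when both match; that shape and the `dedupe_tags` helper are assumptions for this example, not the solution used in #1566.

```python
def dedupe_tags(tags: list[dict]) -> list[dict]:
    """Drop repeated tags, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for tag in tags:
        # Assumption: (name, provider) identifies a tag; other keys are ignored.
        key = (tag.get("name"), tag.get("provider"))
        if key in seen:
            continue
        seen.add(key)
        unique.append(tag)
    return unique
```

For example, if a record already holds `café` and another tag decodes to `café` from the same provider, only the first one survives.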
Additional context
Related to #4125 and #4199 (similar issue).