RNAcentral / rnacentral-import-pipeline

RNAcentral data import pipeline
Apache License 2.0

Cleanup rnc accession columns #179

Closed blakesweeney closed 9 months ago

blakesweeney commented 10 months ago

This is the start of work to clean up the columns in rnc_accessions. Generally, this pull request stops writing and loading specific columns which do not appear to be useful. This should be safe to just run, but as far as I can tell the tests are broken right now. Prior to running this pipeline we must manually edit the rnc_update.update_rnc_accession function in the database to remove references to the columns cleaned up here. Otherwise it will break when loading things into the database. For reference the columns are:

The taxonomic columns are not needed but are left in the entry object for now. Removing them will take more careful work, as we still use them for some logic within the pipeline.
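
As a rough sanity check for the manual database edit mentioned above, something like the following sketch could flag removed columns that are still referenced in the body of rnc_update.update_rnc_accession. It assumes the function definition has already been fetched as text (on PostgreSQL, for example, via `pg_get_functiondef`). The column names `old_column_a` and `old_column_b` and the table `staging_table` are placeholders for illustration only, not the real names.

```python
import re


def columns_still_referenced(function_sql, dropped_columns):
    """Return the dropped columns that still appear as whole words in the SQL text."""
    found = []
    for column in dropped_columns:
        # \b treats "_" as a word character, so a column name is not matched
        # when it is only a fragment of a longer identifier.
        pattern = rf"\b{re.escape(column)}\b"
        if re.search(pattern, function_sql, flags=re.IGNORECASE):
            found.append(column)
    return found


# Hypothetical function body; the column and staging table names are made up.
example_sql = """
INSERT INTO rnc_accessions (accession, old_column_a)
SELECT accession, old_column_a FROM staging_table;
"""

# Lists any placeholder columns that the example SQL still mentions.
print(columns_still_referenced(example_sql, ["old_column_a", "old_column_b"]))
```

An empty result would suggest the function no longer mentions the removed columns. This is only a text match, so it cannot tell a real column reference from a comment or string literal.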

blakesweeney commented 9 months ago

I've been working through the database tests and have fixed some of them. Fixing the rest is going to take me a while. I can start tracking it, but I suspect this will be very slow going.