RNAcentral / rnacentral-import-pipeline

RNAcentral data import pipeline
Apache License 2.0

Changes in pipeline to account for multiple assemblies #132

Open afg1 opened 2 years ago

afg1 commented 2 years ago

I looked at these files (the output of rg -l ensembl_assembly):

files/repeats/find-assemblies.sql
files/genome-mapping/post.sql
files/genome-mapping/find_species.sql
files/ftp-export/genome_coordinates/known-coordinates.sql
files/genome-mapping/load.ctl
files/genes/species.sql
files/genes/schema.sql
files/import-data/post-release/001regions.sql
files/import-data/post-release/001coordinate-systems.sql
files/import-data/post-release/001ensembl-pseudogenes.sql
files/import-data/post-release/001locations.sql
files/import-data/post-release/002Cleanup_assembly_table.sql
files/import-data/ensembl/known-assemblies.sql
files/import-data/pre-release/000assemblies.sql
workflows/databases/mirgenedb.nf
rnacentral_pipeline/databases/ensembl/metadata/assemblies.py

I then chased down the uses of those files. I think I got everything, and it will do what we want.
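
For anyone who wants to repeat the search without ripgrep, here is a minimal Python sketch of roughly what rg -l ensembl_assembly reports. The function name files_containing is made up for illustration and is not part of the pipeline. Unlike rg, this sketch does not honour .gitignore or skip hidden and binary files; it only skips the .git directory.

```python
import os


def files_containing(root, needle):
    """Return sorted paths (relative to root) of files whose text contains needle."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control metadata so it is not searched.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Ignore undecodable bytes so binary files do not raise.
                with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                    if needle in handle.read():
                        matches.append(os.path.relpath(path, root))
            except OSError:
                # Unreadable files are skipped rather than aborting the search.
                continue
    return sorted(matches)


if __name__ == "__main__":
    # Run from the repository root to list candidate files.
    for found in files_containing(".", "ensembl_assembly"):
        print(found)
```

The result is only a starting list; each file's callers still have to be checked by hand, as described above.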

blakesweeney commented 2 years ago

This looks reasonable to me, but what are we doing with the Cleanup_assembly_table script? I think we can just leave the assemblies alone since we will only use the selected_genomes in the webcode.