RNAcentral / rnacentral-webcode

RNAcentral website source code
https://rnacentral.org
Apache License 2.0

Display phylogenies using rnc_taxonomy table #437

Closed blakesweeney closed 1 year ago

blakesweeney commented 5 years ago

I've created a table called rnc_taxonomy that stores the taxonomic information for an NCBI taxid. It is very simple, and I'm still fixing an issue with importing data (it fails to import some organisms because of quotes in their aliases). But that only affects a few of the taxids we use (~140). The table has 4 columns:

The table will contain all taxids, not just species or any other single level. In addition, it will track whether a taxid has been merged into another one with the replaced_by column. When a taxid has been merged, the name and lineage information will reflect the taxid it has been merged into. We can use this table instead of the classification and species columns in rnc_accessions.
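As a rough illustration of how the webcode could read from this table instead of rnc_accessions, here is a minimal sketch using an in-memory SQLite database. Only the table names and the replaced_by column come from the description above; the other column names (`id`, `name`, `lineage`, `ncbi_tax_id`), the lineage format, the accession IDs, and the merged taxid 900000001 are assumptions made up for the example.

```python
import sqlite3

# In-memory stand-in for the real database; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE rnc_taxonomy (
        id          INTEGER PRIMARY KEY,  -- NCBI taxid (column name assumed)
        name        TEXT NOT NULL,
        lineage     TEXT NOT NULL,
        replaced_by INTEGER               -- NULL unless the taxid was merged
    );
    CREATE TABLE rnc_accessions (
        accession   TEXT PRIMARY KEY,
        ncbi_tax_id INTEGER NOT NULL      -- column name assumed
    );
    """
)

LINEAGE = "Bacteria; Escherichia; Escherichia coli"  # illustrative format only

conn.executemany(
    "INSERT INTO rnc_taxonomy (id, name, lineage, replaced_by) VALUES (?, ?, ?, ?)",
    [
        (562, "Escherichia coli", LINEAGE, None),
        # Hypothetical merged taxid: its name and lineage already mirror 562.
        (900000001, "Escherichia coli", LINEAGE, 562),
    ],
)
conn.executemany(
    "INSERT INTO rnc_accessions (accession, ncbi_tax_id) VALUES (?, ?)",
    [("ACC_A", 562), ("ACC_B", 900000001)],
)


def taxonomy_for(conn, accession):
    """Return (name, lineage, current_taxid) for an accession, or None."""
    return conn.execute(
        """
        SELECT t.name, t.lineage, COALESCE(t.replaced_by, t.id)
        FROM rnc_accessions a
        JOIN rnc_taxonomy t ON t.id = a.ncbi_tax_id
        WHERE a.accession = ?
        """,
        (accession,),
    ).fetchone()
```

Because a merged row already carries the name and lineage of its replacement, a single join is enough; `COALESCE(replaced_by, id)` only reports which taxid is current.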

For now I will fix the issues with importing and continue to populate the table each time we run the pipeline. I will also use it for search export, so we can hopefully allow searches for E. coli to actually find the expected species. Once @BurkovBA says we have switched over to using it in the webcode, I will delete the relevant columns from rnc_accessions.

blakesweeney commented 5 years ago

Note that there are some sequences (~400) which don't yet work because the taxid was deleted after we imported the sequence but before we imported the taxonomic information. I can fix this, however.
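One way to list the affected sequences might look like the sketch below, which finds accessions whose taxid has no row in rnc_taxonomy. It again uses an in-memory SQLite database, and the column names (`id`, `ncbi_tax_id`), the accession IDs, and the deleted taxid 900000002 are assumptions for illustration, not the real schema or data.

```python
import sqlite3

# In-memory stand-in for the real database; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE rnc_taxonomy (
        id   INTEGER PRIMARY KEY,  -- NCBI taxid (column name assumed)
        name TEXT NOT NULL
    );
    CREATE TABLE rnc_accessions (
        accession   TEXT PRIMARY KEY,
        ncbi_tax_id INTEGER NOT NULL  -- column name assumed
    );
    """
)
conn.execute("INSERT INTO rnc_taxonomy (id, name) VALUES (562, 'Escherichia coli')")
conn.executemany(
    "INSERT INTO rnc_accessions (accession, ncbi_tax_id) VALUES (?, ?)",
    [
        ("ACC_A", 562),
        # Hypothetical taxid that was deleted before taxonomy was imported.
        ("ACC_C", 900000002),
    ],
)


def accessions_missing_taxonomy(conn):
    """Return (accession, taxid) pairs with no matching rnc_taxonomy row."""
    return conn.execute(
        """
        SELECT a.accession, a.ncbi_tax_id
        FROM rnc_accessions a
        LEFT JOIN rnc_taxonomy t ON t.id = a.ncbi_tax_id
        WHERE t.id IS NULL
        ORDER BY a.accession
        """
    ).fetchall()
```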

blakesweeney commented 1 year ago

This issue is stale and only partially completed. I will close it in favor of newer, smaller ones.