Display phylogenies using rnc_taxonomy table

blakesweeney commented 5 years ago

I've created a table called rnc_taxonomy that stores the taxonomic information for an NCBI taxid. It is very simple, and I'm still fixing an issue with importing data (some organisms fail to import because of quotes in their aliases). That only affects a few of the taxids we use (~140). The table has five columns:

id: The primary key which is an NCBI taxid as an integer.
name: The scientific name for that taxid.
lineage: The lineage (cellular organisms; Archaea; Euryarchaeota...) for the taxid.
aliases: Array of other recognised names for that taxid.
replaced_by: If this taxid has been merged into another one then this will show the new taxid, otherwise null.
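A minimal sketch of the five columns above, using Python's built-in sqlite3 module as a stand-in database. SQLite has no array type, so `aliases` is stored as JSON text here, while the real table uses an array column. Binding values as parameters (`?`) instead of building SQL strings by hand is one way to stop quotes inside aliases from breaking an insert. Whether that is the cause of the current import bug is an assumption. The helpers `add_taxon` and `get_aliases` and the alias values are hypothetical.

```python
import json
import sqlite3
from typing import List, Optional

# Hypothetical stand-in for rnc_taxonomy. SQLite lacks arrays, so aliases
# are JSON-encoded text here rather than a real array column.
SCHEMA = """
CREATE TABLE rnc_taxonomy (
    id          INTEGER PRIMARY KEY,  -- NCBI taxid
    name        TEXT NOT NULL,        -- scientific name
    lineage     TEXT NOT NULL,        -- 'cellular organisms; Bacteria; ...'
    aliases     TEXT NOT NULL,        -- JSON list of other recognised names
    replaced_by INTEGER               -- new taxid if merged, otherwise NULL
)
"""


def add_taxon(conn: sqlite3.Connection, taxid: int, name: str, lineage: str,
              aliases: List[str], replaced_by: Optional[int] = None) -> None:
    # Bound parameters: quotes inside aliases need no manual escaping.
    conn.execute(
        "INSERT INTO rnc_taxonomy VALUES (?, ?, ?, ?, ?)",
        (taxid, name, lineage, json.dumps(aliases), replaced_by),
    )


def get_aliases(conn: sqlite3.Connection, taxid: int) -> List[str]:
    # Returns an empty list when the taxid is not in the table.
    row = conn.execute(
        "SELECT aliases FROM rnc_taxonomy WHERE id = ?", (taxid,)
    ).fetchone()
    return json.loads(row[0]) if row else []


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Illustrative row; the second alias is made up to contain double quotes.
add_taxon(conn, 562, "Escherichia coli", "cellular organisms; Bacteria; ...",
          ["E. coli", 'example "quoted" alias'])
print(get_aliases(conn, 562))
```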

The table will contain all taxids, not just those at the species level or any other single rank. It also tracks, via the replaced_by column, whether a taxid has been merged into another one. When a taxid has been merged, its name and lineage will reflect the taxid it was merged into. We can use this table instead of the classification and species columns in rnc_accessions.
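Since a merged taxid keeps its own row and points at the new one through `replaced_by`, a consumer can map any stored taxid to its current one by following that link. This sketch assumes a merge target might itself be merged later, so it follows the chain with a hop limit; whether the pipeline ever produces such chains is an assumption. `Taxon`, `resolve` and the taxids 1000/1001 are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Taxon:
    id: int
    name: str
    lineage: str
    replaced_by: Optional[int] = None  # None means "not merged"


def resolve(taxid: int, table: Dict[int, Taxon], max_hops: int = 10) -> int:
    """Follow replaced_by links until reaching a taxid that was not merged."""
    current = taxid
    for _ in range(max_hops):
        row = table.get(current)
        if row is None:
            raise KeyError(f"taxid {current} not in rnc_taxonomy")
        if row.replaced_by is None:
            return current
        current = row.replaced_by
    raise ValueError(f"replaced_by chain from {taxid} is too long or cyclic")


# Hypothetical rows: 1001 was merged into 1000, so it mirrors 1000's
# name and lineage, as described for rnc_taxonomy.
rows = [
    Taxon(1000, "Example species", "cellular organisms; ..."),
    Taxon(1001, "Example species", "cellular organisms; ...", replaced_by=1000),
]
table = {t.id: t for t in rows}
```

Usage: `resolve(1001, table)` gives the current taxid, which can then be used for grouping or display.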

For now I will fix the import issues and continue to populate the table each time we run the pipeline. I will also use it for the search export, so that searches for E. coli will hopefully find the expected species. Once @BurkovBA confirms that the webcode has switched over to using it, I will delete the relevant columns from rnc_accessions.
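The search-export idea relies on the aliases column: if every scientific name and alias is indexed against its taxid, a query such as "E. coli" can resolve to the species. This is a hypothetical in-memory lookup, not the actual export format; the case-folding and whitespace normalisation are assumptions about what matching should tolerate.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def normalise(text: str) -> str:
    # Case-fold and collapse runs of whitespace so "E.  Coli" matches "E. coli".
    return " ".join(text.casefold().split())


def build_index(rows: Iterable[Tuple[int, str, List[str]]]) -> Dict[str, Set[int]]:
    """Map every scientific name and alias to the taxids that carry it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for taxid, name, aliases in rows:
        for term in [name, *aliases]:
            index[normalise(term)].add(taxid)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    # .get() avoids inserting empty entries into the defaultdict.
    return set(index.get(normalise(query), set()))


# Illustrative row: (taxid, scientific name, aliases).
rows = [(562, "Escherichia coli", ["E. coli"])]
index = build_index(rows)
```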

blakesweeney commented 5 years ago

Note that some sequences (~400) don't work yet because their taxid was deleted after we imported the sequence but before we imported the taxonomic information. I can fix this, however.
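One way to list the affected sequences is an anti-join: keep every sequence whose taxid has no row in rnc_taxonomy. This sketch uses an in-memory SQLite database; the `seq_taxid` table, its columns, and taxid 999999 standing in for a deleted taxid are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rnc_taxonomy (id INTEGER PRIMARY KEY, name TEXT);
-- Hypothetical table linking each sequence to the taxid it was imported with.
CREATE TABLE seq_taxid (seq_id TEXT, taxid INTEGER);
""")
conn.executemany("INSERT INTO rnc_taxonomy VALUES (?, ?)",
                 [(562, "Escherichia coli")])
# SEQ2 uses an illustrative taxid with no rnc_taxonomy row (a "deleted" taxid).
conn.executemany("INSERT INTO seq_taxid VALUES (?, ?)",
                 [("SEQ1", 562), ("SEQ2", 999999)])

# Anti-join: sequences whose taxid has no matching rnc_taxonomy row.
ORPHANS = """
SELECT s.seq_id, s.taxid
FROM seq_taxid AS s
LEFT JOIN rnc_taxonomy AS t ON t.id = s.taxid
WHERE t.id IS NULL
ORDER BY s.seq_id
"""
orphans = conn.execute(ORPHANS).fetchall()
```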

blakesweeney commented 1 year ago

This issue is stale and only partially completed. I will close it in favor of newer, smaller ones.

RNAcentral / rnacentral-webcode

Display phylogenies using rnc_taxonomy table #437