RNAcentral / rnacentral-webcode

RNAcentral website source code
https://rnacentral.org
Apache License 2.0

Properly handle inactive genome locations #613

Closed blakesweeney closed 3 months ago

blakesweeney commented 1 year ago

We show genome locations in our genome browser. These can come from two sources: either they are the result of the genome mapping pipeline, or they were provided to us by an expert database. Those entries can become outdated, and we should not show outdated locations. Dealing with this properly is pretty complex, and this issue is to track it.

As an example of how this issue could occur, imagine we have a sequence observed in Ensembl and ENA. Ensembl provides a genomic location for it, but ENA does not. At some point Ensembl changes its annotation logic and this transcript is suppressed. At this point the entries in xref for Ensembl will be marked as deleted, so they will not show up on the webpage. However, the sequence page for that sequence will still have xrefs from ENA. The genome browser will also still show a location and claim it was provided by Ensembl, despite the inactive xref. I don't think this is really correct.

It is easy enough to not show locations from inactive accessions, but what do we do about the genome mapping pipeline? Should we recompute it? Should we use the location but not claim it came from anywhere? Mark it as from a deleted xref? This is a bit tricky to figure out.
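
To make the "easy" part concrete, here is a rough sketch of filtering out locations whose providing accession is no longer active. The `Xref` and `Location` classes, their fields, and `visible_locations` are all hypothetical names for illustration and are not the actual schema or code. Mapped locations are simply passed through, since what to do with them is the open question.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Xref:
    # Hypothetical stand-in for a row in the xref table.
    accession: str
    database: str
    deleted: bool  # True once the source database no longer has the entry


@dataclass(frozen=True)
class Location:
    # Hypothetical stand-in for a genome location shown in the browser.
    chromosome: str
    start: int
    stop: int
    # Accession that provided the location, or None if the location
    # came from the genome mapping pipeline.
    providing_accession: Optional[str]


def visible_locations(locations: List[Location], xrefs: List[Xref]) -> List[Location]:
    """Drop locations that were provided by an accession with a deleted xref.

    Locations from the mapping pipeline (no providing accession) are kept
    unchanged; this sketch does not decide how they should be handled.
    """
    active = {x.accession for x in xrefs if not x.deleted}
    return [
        loc
        for loc in locations
        if loc.providing_accession is None or loc.providing_accession in active
    ]
```

In the Ensembl/ENA example above, the Ensembl-provided location would be dropped once its xref is marked as deleted, while the sequence itself stays visible through its ENA xref.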

This is low priority as I think this issue isn't likely to occur.

carlosribas commented 3 months ago

Was the rnc_sequence_regions_active table created to resolve this issue?

blakesweeney commented 3 months ago

Yes, and this issue should be fixed by it.