RNAcentral / rnacentral-webcode

RNAcentral website source code
https://rnacentral.org
Apache License 2.0

Ensure all RNA sequences are inactive #611

Closed blakesweeney closed 1 year ago

blakesweeney commented 1 year ago

At one point, I introduced a bug in the import where I was importing RNA sequences (containing U's, not T's) instead of DNA ones (T's, not U's). This put invalid data into the rna table. That bug should be fixed, for Rfam at least, but we should verify this. To do so, we need to check that no sequence in the rna table that contains a U has an active xref. If any do, we need to look into which databases are causing this. I think there may be some cases where a U is present but the sequence is DNA, but those should be few and we will have to manually verify each one.

carlosribas commented 1 year ago

It looks like the bug you mentioned has indeed been fixed: I haven't found any sequences that contain a U and have an active xref. To check, I ran:

SELECT r.upi FROM rna r JOIN xref x ON r.upi = x.upi WHERE r.seq_short LIKE '%U%' AND x.deleted = 'N' LIMIT 10;

I also ran the same query against the seq_long field.
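
Had the check returned rows, the follow-up described in the original report would be to find which databases they come from. Below is a minimal sketch of that step, run against an in-memory SQLite database as a stand-in for the real one. The rna and xref tables use only the columns named in this thread; the `db` column on xref, the UPI values, and the sample rows are hypothetical and exist only to make the example runnable. It also combines the seq_short and seq_long checks into one query.

```python
import sqlite3

# Toy stand-in for the real database. Only upi, seq_short, seq_long and
# deleted come from the thread; the `db` column is a hypothetical label
# for the source database of each xref.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rna (upi TEXT PRIMARY KEY, seq_short TEXT, seq_long TEXT);
    CREATE TABLE xref (upi TEXT, db TEXT, deleted TEXT);
    INSERT INTO rna VALUES
        ('UPI_A', 'ACGTACGT', NULL),   -- DNA alphabet, as expected
        ('UPI_B', 'ACGUACGU', NULL),   -- contains U, but its xref is deleted
        ('UPI_C', NULL, 'GGUUCC');     -- contains U in seq_long, xref active
    INSERT INTO xref VALUES
        ('UPI_A', 'Rfam', 'N'),
        ('UPI_B', 'Rfam', 'Y'),
        ('UPI_C', 'OtherDB', 'N');
""")

# Count sequences that contain a U and still have an active xref,
# grouped by the (hypothetical) source database.
QUERY = """
    SELECT x.db, COUNT(DISTINCT r.upi) AS n
    FROM rna r
    JOIN xref x ON r.upi = x.upi
    WHERE x.deleted = 'N'
      AND (r.seq_short LIKE '%U%' OR r.seq_long LIKE '%U%')
    GROUP BY x.db
    ORDER BY x.db
"""
offenders = conn.execute(QUERY).fetchall()
print(offenders)  # [('OtherDB', 1)]
```

An empty result would correspond to the outcome reported above. One caveat when porting the query: SQLite's LIKE is case-insensitive for ASCII by default, whereas PostgreSQL's LIKE is case-sensitive, so on PostgreSQL a lowercase u would need ILIKE or an explicit upper() to be caught.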

blakesweeney commented 1 year ago

Looks like it is fixed to me then. Thanks!