RNAcentral / rnacentral-webcode

RNAcentral website source code
https://rnacentral.org
Apache License 2.0

Update qa information #551

Closed: blakesweeney closed this 2 years ago

blakesweeney commented 3 years ago

This adds a new section to the sequence pages that shows the QA information. It does this by adding a new API endpoint, qa-status, and a new Angular component to display the information.
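To make the endpoint's role concrete, here is a minimal, framework-free sketch of what a qa-status handler might do: look up one sequence's QA record and return it as JSON, or a 404 if there is none. The in-memory `QA_STATUS` dict, the `qa_status_view` function, the `URS..._taxid` key shape, and the response fields are all hypothetical stand-ins, not the actual RNAcentral view or payload.

```python
import json
from http import HTTPStatus

# Hypothetical in-memory stand-in for the qa_status table, keyed by
# (URS identifier, taxid). The real endpoint would read from the database.
QA_STATUS = {
    ("URS0000000001", 9606): {
        "has_issue": True,
        "messages": ["placeholder warning message"],
    },
}


def qa_status_view(upi: str, taxid: int) -> tuple:
    """Return (HTTP status, JSON body) for a hypothetical qa-status endpoint."""
    record = QA_STATUS.get((upi, taxid))
    if record is None:
        body = {"detail": f"No QA status for {upi}_{taxid}"}
        return HTTPStatus.NOT_FOUND, json.dumps(body)
    body = {
        "rna_id": f"{upi}_{taxid}",
        "has_issue": record["has_issue"],
        "messages": record["messages"],
    }
    return HTTPStatus.OK, json.dumps(body)
```

The front-end component then only has to fetch this JSON and render the `messages` list; it needs no knowledge of how each warning was computed.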

I chose to add a new endpoint because the previous source of this data was the rfam problems field, and this field is flawed. It doesn't fetch the QA information or the messages from the database; instead, an rfam problems module computes them, and I'm unsure whether it does the same thing as the pipeline. This change adds a model for the qa_status table, which is now where we get the messages. Part of the QA pipeline computes these messages and stores them in the database, so I thought we were always using them, but we weren't. I could also push these messages out to the search index and read them from there, but that seemed like more work.
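The key change here is reading the messages the pipeline already stored instead of recomputing them in the web layer. A minimal sketch of that idea, using an in-memory SQLite table in place of the real database: the `QaStatus` dataclass plays the role of the new model, and its column names (`rna_id`, `has_issue`, `messages` stored as JSON text) are assumptions for illustration, not the actual qa_status schema.

```python
import json
import sqlite3
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QaStatus:
    """Hypothetical mirror of one qa_status row; column names are assumed."""
    rna_id: str
    has_issue: bool
    messages: list = field(default_factory=list)


def load_qa_status(conn: sqlite3.Connection, rna_id: str) -> Optional[QaStatus]:
    # Read the messages the QA pipeline already wrote to the database,
    # rather than recomputing them the way the rfam problems module does.
    row = conn.execute(
        "SELECT rna_id, has_issue, messages FROM qa_status WHERE rna_id = ?",
        (rna_id,),
    ).fetchone()
    if row is None:
        return None
    return QaStatus(row[0], bool(row[1]), json.loads(row[2]))


# Usage: an in-memory table standing in for the pipeline's output.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE qa_status (rna_id TEXT PRIMARY KEY, has_issue INTEGER, messages TEXT)"
)
conn.execute(
    "INSERT INTO qa_status VALUES (?, ?, ?)",
    ("URS0000000001_9606", 1, json.dumps(["placeholder warning"])),
)
status = load_qa_status(conn, "URS0000000001_9606")
```

Because the web layer only reads stored rows, whatever the pipeline decides is the single source of truth, which avoids the two implementations drifting apart.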

This also allows displaying the possible ORF messages the pipeline computes. And as I add more warnings, say one for pseudogenes or for too many mapped locations, their messages will show up automatically. That is probably the most useful improvement in this change.
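New warnings can appear automatically because the display is data-driven: it renders whatever messages are stored and never enumerates specific warning types. A minimal plain-text sketch of that display logic follows. It is a Python stand-in for what the Angular component does, not the component itself, and `render_qa_panel` is a hypothetical name.

```python
from typing import Iterable


def render_qa_panel(messages: Iterable[str]) -> str:
    """Render a plain-text QA panel from whatever messages are stored.

    No warning type is hard-coded here, so a new pipeline warning
    (for example, a future pseudogene flag) shows up without code changes.
    """
    # Drop empty or whitespace-only messages before rendering.
    items = [m.strip() for m in messages if m.strip()]
    if not items:
        return "No QA warnings."
    return "\n".join(f"Warning: {m}" for m in items)
```

The trade-off of this design is that the wording of each warning lives in the pipeline, so changing a message means rerunning the QA step rather than editing the website.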

Examples:

You need to run this against the prod database and the dev search index. Note that I think the search index is incorrect about which sequences have the ORF warning; I'm going to recompute that shortly.

This could still use some polishing, but feedback now would be great. The things I know I need to improve are:

One open question is whether we should delete the old rfam problems field. It may or may not be correct, but it differs from the pipeline, and removing it would change our API.