ababaian / serratus

Ultra-deep search for novel viruses
http://serratus.io
GNU General Public License v3.0

Come up with source organism description for each genome #191

Open taltman opened 4 years ago

taltman commented 4 years ago

Here's what NCBI has to say about assigning a meaningful entry for the "source" of the sequence material in the face of uncertainty:

https://www.ncbi.nlm.nih.gov/books/NBK53701/#gbankquickstart.can_i_use_the_word__unkn

So we need to come up with an organism descriptor for our submissions that doesn't need to be placed precisely in the NCBI Taxonomy DB, but the nomenclature should probably not stray too far from the existing naming used for CoVs.

rcedgar commented 4 years ago

Serratax provides the identity of the source organism. NCBI allows BLAST top hits as an approximate guide down to genus, which is a bad method for our situation. Serratax is much better because it reliably resolves sub-genus and species, whereas with BLAST even the genus can be wrong (e.g. Bobbie). This may need a discussion with GenBank, and they may not allow it, but Serratax gives much better predictions than a BLAST top hit to genus, which is exactly why I implemented Serratax!

rcedgar commented 4 years ago

Can we close this? Or unassign me? From my perspective Serratax is the solution.

rcedgar commented 4 years ago

@taltman Can we close this? Or unassign me? From my perspective Serratax is the solution. If there is an open issue for me, please clarify, thanks.

ababaian commented 4 years ago

I think "source" in this case is the host organism, not the virus.