ababaian / serratus

Ultra-deep search for novel viruses
http://serratus.io
GNU General Public License v3.0

Top hit and identity analysis for assemblies #167

Closed rcedgar closed 3 years ago

rcedgar commented 4 years ago

For each assembly, we should determine the most similar GenBank sequences, both complete genomes and "fragments" (complete CDSs, etc.). We should include this information in a master summary file for all assemblies, or in a short text file attached to each assembly:

  1. Top hit to a complete genome (T).
  2. %id of the assembly to T.
  3. Top hit to a fragment (F).
  4. %id of the assembly to F.

Can the annotation pipeline do this, or should it be a separate step? Maybe I could add it to Serratax.
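
As a rough illustration of the four fields listed above, here is a minimal Python sketch of how a separate step might build the summary. It assumes, hypothetically, that each assembly has already been searched against two databases (complete genomes and fragments) and that the hits are available as 12-column tab-separated rows in the standard BLAST tabular layout (query, subject, %id, ..., bitscore). The issue does not specify the search tool or output format, and the names `top_hits`, `summary_rows` and `TopHit` are invented for this example.

```python
import csv
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple


class TopHit(NamedTuple):
    subject: str     # identifier of the best-scoring database sequence
    pct_id: float    # percent identity reported for that alignment
    bitscore: float  # score used to rank competing hits


def top_hits(lines: Iterable[str]) -> Dict[str, TopHit]:
    """Return the highest-bitscore hit per query from 12-column tabular rows."""
    best: Dict[str, TopHit] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        pct_id, bitscore = float(row[2]), float(row[11])
        if query not in best or bitscore > best[query].bitscore:
            best[query] = TopHit(subject, pct_id, bitscore)
    return best


def summary_rows(
    genome_hits: Dict[str, TopHit], fragment_hits: Dict[str, TopHit]
) -> Iterator[Tuple[str, str, str, str, str]]:
    """Yield (assembly, T, %id to T, F, %id to F), using "NA" for missing hits."""
    for asm in sorted(set(genome_hits) | set(fragment_hits)):
        t = genome_hits.get(asm)
        f = fragment_hits.get(asm)
        yield (
            asm,
            t.subject if t else "NA",
            f"{t.pct_id:.1f}" if t else "NA",
            f.subject if f else "NA",
            f"{f.pct_id:.1f}" if f else "NA",
        )
```

Two caveats about this sketch: ranking by bitscore means the top hit is not necessarily the hit with the highest %id, and the %id it reports covers only the aligned region rather than the whole assembly. Either choice could reasonably be made differently.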