For each assembly, we should determine the most similar GenBank sequences, both complete genomes and "fragments" (complete CDSs, etc.). This information should go in a master summary file for all assemblies, or in a short text file attached to each assembly:
Top hit to complete genome (T).
%id of assembly to T.
Top hit to fragment (F).
%id of assembly to F.
Can the annotation pipeline do this, or should it be a separate step? Maybe I could add it to Serratax.
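If it ends up as a separate step, the four fields above could be pulled from a tabular BLAST report roughly as follows. This is only a sketch under assumptions not stated in the note: that the search is run with BLAST tabular output (-outfmt 6, default 12 columns), that "top hit" means highest bitscore, and that a set of subject IDs known to be complete genomes is available from somewhere. The function name summarize_top_hits and the complete_genome_ids argument are hypothetical. Note also that the pident column is the identity of a single local alignment, which may not be the whole-assembly %id wanted here.

```python
import csv
import io


def summarize_top_hits(blast_tsv, complete_genome_ids):
    """Pick the best complete-genome hit (T) and best fragment hit (F).

    blast_tsv: text of a BLAST tabular report (-outfmt 6, default columns).
    complete_genome_ids: set of subject IDs treated as complete genomes
        (hypothetical input; every other subject is counted as a fragment).
    """
    best = {"T": None, "F": None}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        # Default outfmt 6 columns: sseqid is 2nd, pident 3rd, bitscore 12th.
        sseqid, pident, bitscore = row[1], float(row[2]), float(row[11])
        kind = "T" if sseqid in complete_genome_ids else "F"
        # Assumption: the "top hit" is the one with the highest bitscore.
        if best[kind] is None or bitscore > best[kind][2]:
            best[kind] = (sseqid, pident, bitscore)
    return {
        "top_genome": best["T"][0] if best["T"] else None,
        "pct_id_genome": best["T"][1] if best["T"] else None,
        "top_fragment": best["F"][0] if best["F"] else None,
        "pct_id_fragment": best["F"][1] if best["F"] else None,
    }
```

The returned dictionary maps directly onto one row of a master summary file, or could be written out as the short per-assembly text file.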