jamesPet closed this issue 2 years ago.
Hello,
The output is in the CD-HIT output format. Lines beginning with a tab describe the genomes in each cluster.

The fields differ slightly depending on which input option (-l or -i) is used when running clust-mst or clust-greedy. Option -l takes a list of FASTA files as input, one file per genome. Option -i takes a single FASTA file as input, one sequence per genome.

For both options, the first three tab-delimited values, from left to right, are the local index within the cluster, the global index, and the genome length.
For -l, the remaining values are the genome file name (including the genome assembly accession number), the name of the first sequence in that genome file, and then that sequence's comment.
For -i, the remaining values are the sequence name and that sequence's comment.
Best, Xiaoming Xu
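The member-line layout described in the reply can be read with a small parser. This is a minimal sketch, not part of the tool itself. It assumes that every value listed above is separated by a tab, that any non-indented line (such as the "the cluster ..." lines mentioned in the question) starts a new cluster, and that the genome length is an integer. The names `Member`, `parse_member_line` and `parse_clusters` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Member:
    local_index: int            # index of the genome within its cluster
    global_index: int           # index of the genome across the whole input
    genome_length: int
    genome_file: Optional[str]  # only present with -l (FASTA file list)
    seq_name: str               # -l: first sequence in the file; -i: the sequence itself
    comment: str                # remaining fields joined with spaces ("" if none)


def parse_member_line(line: str, list_mode: bool) -> Member:
    """Parse one tab-indented member line; list_mode=True for -l, False for -i.

    Assumption: all fields, including the comment, are tab-separated.
    """
    fields = line.lstrip("\t").rstrip("\n").split("\t")
    local_idx, global_idx, length = int(fields[0]), int(fields[1]), int(fields[2])
    if list_mode:
        genome_file, seq_name, rest = fields[3], fields[4], fields[5:]
    else:
        genome_file, seq_name, rest = None, fields[3], fields[4:]
    return Member(local_idx, global_idx, length, genome_file, seq_name, " ".join(rest))


def parse_clusters(lines: Iterable[str], list_mode: bool) -> List[List[Member]]:
    """Group member lines under the most recent non-indented (cluster header) line."""
    clusters: List[List[Member]] = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("\t"):
            clusters[-1].append(parse_member_line(line, list_mode))
        else:
            clusters.append([])  # a non-indented line starts a new cluster


    return clusters


# Hypothetical -l output (file names, sequence names and values are made up):
example = [
    "the cluster 0\n",
    "\t0\t0\t5000\tgenome_A.fna\tseq_A\texample comment\n",
    "\t1\t7\t4800\tgenome_B.fna\tseq_B\tanother comment\n",
]
for member in parse_clusters(example, list_mode=True)[0]:
    print(member.global_index, member.genome_file, member.seq_name)
```

If the sequence name and its comment turn out to be separated by a space rather than a tab in real output, the `seq_name`/`comment` split would need adjusting.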
Hello,
What's the format of the output file? In particular, what do the tab-delimited values mean in the lines that do not begin with "the cluster"?
Thank you! jamie