It is useful for me to keep records of megamash matches. I think this should be a file format.
The subheader has `fasta_identifier, sequence` as headers. Each row of the generated section then has the query name (the FASTQ read name), the number of matches, the matches joined by the separator, and the coverage. Coverage is expressed as `int/int` because the number of k-mer matches is itself relevant information: a high number on both sides means a high-confidence match, while a low total k-mer count means a lower-confidence match.

There could also be a complementing JSON implementation, for easy reading after generation. The reason I like a TSV-style format is that the most common use of the matches will be streaming them to other systems.
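To make the proposed row layout concrete, here is a minimal Python sketch of how one generated row might be read and then re-emitted as JSON. It is only an illustration under several assumptions that the proposal does not fix: columns are tab-separated, the match separator is `;`, and the `int/int` coverage is read as matched k-mers over total k-mers. The names `MatchRecord`, `parse_record`, `stream_records`, and `to_json` are hypothetical and not part of any existing megamash code.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Iterator, List

# Assumption: the proposal does not name the match separator; ";" is a placeholder.
MATCH_SEPARATOR = ";"


@dataclass
class MatchRecord:
    """One row of the generated section (hypothetical layout)."""
    query_name: str      # FASTQ read name
    match_count: int     # number of matches
    matches: List[str]   # matched FASTA identifiers
    kmers_matched: int   # assumed left side of the int/int coverage
    kmers_total: int     # assumed right side of the int/int coverage

    @property
    def coverage(self) -> float:
        """Coverage as a fraction; 0.0 when no k-mers were counted."""
        if self.kmers_total == 0:
            return 0.0
        return self.kmers_matched / self.kmers_total


def parse_record(line: str, match_separator: str = MATCH_SEPARATOR) -> MatchRecord:
    """Parse one tab-separated row into a MatchRecord."""
    query_name, count, matches, coverage = line.rstrip("\n").split("\t")
    matched, total = coverage.split("/")
    names = matches.split(match_separator) if matches else []
    return MatchRecord(query_name, int(count), names, int(matched), int(total))


def stream_records(lines: Iterable[str]) -> Iterator[MatchRecord]:
    """Lazily parse rows, skipping blank lines, so input can be piped in."""
    for line in lines:
        if line.strip():
            yield parse_record(line)


def to_json(record: MatchRecord) -> str:
    """Complementing JSON form of a single record."""
    return json.dumps(asdict(record))
```

With these assumptions, `parse_record("read_001\t2\tseqA;seqB\t45/50\n")` yields a record whose raw counts stay available alongside the ratio, which is the point of keeping coverage as `int/int` rather than a single fraction. Passing `sys.stdin` to `stream_records` would cover the streaming use case.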