Koeng101 / dnadesign

A Go package for designing DNA.

Megamash file #53

Closed by Koeng101 6 months ago

Koeng101 commented 8 months ago

It is useful for me to keep records of megamash matches. I think this should be a file format.

@VN 0.0.1
@KmerSize 16
@MinimialKmerMatches 10
@Threshold 0.2
@Separator |
### START SUBHEADER ###
identifier    sequence
identifier2   sequence
### END SUBHEADER ###
289a197e-4c05-4143-80e6-488e23044378    2    identifier|identifier2    78/150|51/53

The subheader lists each reference with fasta_identifier and sequence as its columns. Each line of the generated section then holds the query_name (the FASTQ read name), the number of matches, the matched identifiers joined by the separator, and the coverage of each match. Coverage is expressed as int/int rather than as a single fraction, because the number of k-mer matches is itself relevant information: high numbers on both sides mean a high-confidence match, while a low total k-mer count means a lower-confidence match.
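To make the record layout concrete, here is a minimal Go sketch of parsing one data line. It is not the megamash implementation. The names `MatchRecord`, `Coverage`, and `ParseMatchLine` are hypothetical, and the sketch assumes that columns are tab-delimited, that the coverage entries are in the same order as the identifiers, and that int/int reads as matched/total. It does not treat zero-match lines specially, since the proposal does not say how they would look.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Coverage holds one int/int pair such as 78/150.
// Reading it as matched/total is an assumption of this sketch.
type Coverage struct {
	Matched int
	Total   int
}

// MatchRecord is one data line of the proposed format (hypothetical type).
type MatchRecord struct {
	Query     string     // FASTQ read name
	Matches   []string   // matched identifiers
	Coverages []Coverage // one entry per identifier, same order
}

// ParseMatchLine parses a single tab-delimited data line.
// sep is the value of the @Separator header, "|" in the example.
func ParseMatchLine(line, sep string) (MatchRecord, error) {
	fields := strings.Split(strings.TrimRight(line, "\r\n"), "\t")
	if len(fields) != 4 {
		return MatchRecord{}, fmt.Errorf("expected 4 fields, got %d", len(fields))
	}
	n, err := strconv.Atoi(fields[1])
	if err != nil {
		return MatchRecord{}, fmt.Errorf("bad match count %q: %w", fields[1], err)
	}
	ids := strings.Split(fields[2], sep)
	covs := strings.Split(fields[3], sep)
	// The declared count must agree with both lists.
	if len(ids) != n || len(covs) != n {
		return MatchRecord{}, fmt.Errorf("declared %d matches, found %d identifiers and %d coverages", n, len(ids), len(covs))
	}
	rec := MatchRecord{Query: fields[0], Matches: ids}
	for _, c := range covs {
		num, den, ok := strings.Cut(c, "/")
		if !ok {
			return MatchRecord{}, fmt.Errorf("coverage %q is not int/int", c)
		}
		matched, err := strconv.Atoi(num)
		if err != nil {
			return MatchRecord{}, fmt.Errorf("bad coverage %q: %w", c, err)
		}
		total, err := strconv.Atoi(den)
		if err != nil {
			return MatchRecord{}, fmt.Errorf("bad coverage %q: %w", c, err)
		}
		rec.Coverages = append(rec.Coverages, Coverage{Matched: matched, Total: total})
	}
	return rec, nil
}
```

Calling `ParseMatchLine` on the example line with `"|"` as the separator would yield two identifiers with coverages 78/150 and 51/53. Keeping both integers in the struct preserves the confidence signal described above instead of collapsing it into a ratio.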

There could also be a complementary JSON implementation, for easier reading after generation. I prefer a TSV-like format as the primary one because the most common use of the matches will be streaming them to other systems.
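As a rough illustration of what the JSON side could look like, the sketch below encodes one record as a single JSON object per line, which also streams reasonably well. The type names, the function `EncodeRecord`, and the JSON keys (`query`, `matches`, `identifier`, `kmers_matched`, `kmers_total`) are all assumptions made for this example; nothing in the proposal fixes them.

```go
package main

import "encoding/json"

// JSONMatch is one matched reference with its int/int coverage split
// into two fields (hypothetical key names).
type JSONMatch struct {
	Identifier   string `json:"identifier"`
	KmersMatched int    `json:"kmers_matched"`
	KmersTotal   int    `json:"kmers_total"`
}

// JSONRecord mirrors one data line of the TSV-like format.
type JSONRecord struct {
	Query   string      `json:"query"`
	Matches []JSONMatch `json:"matches"`
}

// EncodeRecord returns the compact JSON encoding of one record,
// suitable for writing as one line of a JSON Lines stream.
func EncodeRecord(rec JSONRecord) (string, error) {
	b, err := json.Marshal(rec)
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```

Grouping each identifier with its own coverage removes the need for a separator character and for a separate match count, at the cost of a more verbose line than the TSV form.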

Koeng101 commented 6 months ago

Closing, because of a megamash rewrite.