OndrejSladky / kmercamel

KmerCamel🐫 provides implementations of several algorithms for efficiently representing a set of k-mers as a masked superstring.
MIT License

Output FASTA headers are currently confusing #80

Closed: karel-brinda closed this issue 1 week ago

karel-brinda commented 1 week ago

If I run kmercamel on the human genome, the output FASTA file starts with >chr1 CP068277.2 Homo sapiens isolate CHM13 chromosome 1.

This is misleading. The output is a masked superstring, not the genomic sequence of chromosome 1.

Instead, it should use some reasonable name such as

>{original_filename} k=31 properties of the superstring

The properties should ideally include statistics (length, number of 1s in the mask, number of k-mers, etc.) and the mask optimization strategy.
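As a rough illustration of the kind of header proposed above, here is a minimal Python sketch. It is not KmerCamel's code. It assumes the masked superstring is given as a single string in which an uppercase letter means mask bit 1 and a lowercase letter means mask bit 0. The function name superstring_header, the field names len, ones, kmers and mask, and the strategy label "minone" are all hypothetical. The k-mer count treats a k-mer and its reverse complement as distinct.

```python
import os


def superstring_header(path, k, masked, mask_strategy):
    """Build a descriptive FASTA header for a masked superstring.

    Assumption: uppercase = mask bit 1, lowercase = mask bit 0.
    All field names in the returned header are hypothetical.
    """
    # Number of 1s in the mask.
    ones = sum(1 for c in masked if c.isupper())

    # Distinct k-mers starting at positions whose mask bit is 1.
    # Reverse complements are NOT merged in this sketch.
    kmers = set()
    for i in range(max(0, len(masked) - k + 1)):
        if masked[i].isupper():
            kmers.add(masked[i:i + k].upper())

    return (
        f">{os.path.basename(path)} k={k} len={len(masked)} "
        f"ones={ones} kmers={len(kmers)} mask={mask_strategy}"
    )


if __name__ == "__main__":
    print(superstring_header("data/genome.fa", 3, "ACGtt", "minone"))
```

With a header like this, a reader can tell at a glance that the record is a derived superstring rather than an original sequence, and which parameters produced it.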

karel-brinda commented 1 week ago

I mixed up the files. Sorry. This issue is invalid; I was looking at the wrong output :).