OndrejSladky / kmercamel

KmerCamel🐫 provides implementations of several algorithms for efficiently representing a set of k-mers as a masked superstring.
MIT License

output of mask optimization is not FASTA #65

Closed: PavelVesely closed this issue 8 months ago

PavelVesely commented 8 months ago

Unlike the masked superstring computation (global or local greedy), the output of kmercamel optimize is not a FASTA file: it is missing the header line. For consistency, I think both should produce the same format.

karel-brinda commented 8 months ago

Agreed; the first line can always be discarded.

OndrejSladky commented 8 months ago

How do you run the program? If the input file is a FASTA file (i.e., it contains a header), the result is also a FASTA file. If the input is not a FASTA file (which I assume is your use case), then neither is the output, which seems justified to me. I could change it, but then it would be difficult to keep the same FASTA header as in the input.
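
The format-preserving behavior described above (FASTA in, FASTA out; plain text in, plain text out) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not KmerCamel's actual implementation: the function names split_header and format_like_input are hypothetical, and the sketch assumes a single-record input whose sequence lines are simply concatenated.

```python
from typing import Optional, Tuple


def split_header(text: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: split an optional FASTA header from a masked superstring.

    Returns (header, sequence); header is None for plain-text input.
    Assumes a single record; sequence lines are concatenated.
    """
    lines = text.splitlines()
    header = None
    if lines and lines[0].startswith(">"):
        header, lines = lines[0], lines[1:]
    return header, "".join(line.strip() for line in lines)


def format_like_input(header: Optional[str], sequence: str) -> str:
    """Hypothetical helper: emit FASTA only if the input had a header."""
    if header is None:
        return sequence + "\n"
    return f"{header}\n{sequence}\n"


# Plain-text input stays plain text; FASTA input keeps its header.
for text in ["GGCTCGacaa\n", "> superstring\nGGCTCGacaa\n"]:
    header, seq = split_header(text)
    print(repr(format_like_input(header, seq)))
```

Remembering the header is what lets the output reuse it verbatim; discarding the first line, as suggested earlier in the thread, corresponds to using only the split_header step.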

PavelVesely commented 8 months ago

I see, then it makes sense, and it won't occur in practice. I ran kmercamel optimize on a text file that already had no header. I'm closing this issue.

PavelVesely commented 8 months ago

Reopening this issue, as optimizing runs (runs or runsapprox) behaves inconsistently: even though the input is a masked-superstring file with no header, the output does have a header. Optimizing ones or zeros does not add a header.

Here's a little experiment to verify:

$ head -c 50 <spneumoniae.S_global.k_9.d_na.M_default.maskedSuperstring.txt
GGCTCGACAAATTGATTAAGTACTCGTTGGTTACGTCGCTGTttatccCG

$ kmercamel/kmercamel optimize -k 9 -c -a ones -p spneumoniae.S_global.k_9.d_na.M_default.maskedSuperstring.txt -o spneu.k_9.ones.txt
$ head -c 50 <spneu.k_9.ones.txt
GGCTCGACAAATTGATTAAGTACTCGTTGGTTACGTCGCTGTTTaTccCG

$ kmercamel/kmercamel optimize -k 9 -c -a runs -p spneumoniae.S_global.k_9.d_na.M_default.maskedSuperstring.txt -o spneu.k_9.runs.txt
$ head -c 50 <spneu.k_9.runs.txt
> superstring
GGCTCGACAAATTGATTAAGTACTCGTTGGTTACGT
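
The input and the ones output above also show what mask optimization changes. The following is a minimal Python sketch, assuming the KmerCamel convention that an uppercase letter is a 1 in the mask and a lowercase letter is a 0. It compares the 50-character prefixes printed above: the underlying superstring is identical and only the mask differs. The helper names mask_of and count_runs_of_ones are hypothetical, and the statistics cover only these prefixes, not the whole files.

```python
def mask_of(masked: str) -> str:
    """Letter case encodes the mask (assumed): uppercase -> 1, lowercase -> 0."""
    return "".join("1" if c.isupper() else "0" for c in masked)


def count_runs_of_ones(mask: str) -> int:
    """Number of maximal blocks of consecutive 1s in the mask."""
    return sum(1 for i, b in enumerate(mask)
               if b == "1" and (i == 0 or mask[i - 1] == "0"))


# First 50 characters of the input and of the "ones" output shown above.
before = "GGCTCGACAAATTGATTAAGTACTCGTTGGTTACGTCGCTGTttatccCG"
after = "GGCTCGACAAATTGATTAAGTACTCGTTGGTTACGTCGCTGTTTaTccCG"

# Same superstring letters; only the case (the mask) differs.
assert before.upper() == after.upper()

for name, s in [("input", before), ("ones", after)]:
    m = mask_of(s)
    print(name, m.count("1"), count_runs_of_ones(m))
```

On these prefixes, the ones output has more 1s (47 vs. 44) but also more runs of 1s (3 vs. 2). This fits the reading that ones maximizes the number of 1s while runs targets the number of runs, but that interpretation of the objectives is an assumption here.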

PavelVesely commented 8 months ago

I have fixed this in PR #70.