Closed PavelVesely closed 8 months ago
Agree, the first line can always be discarded.
Unlike running the masked superstring computation (global or local greedy), the output of kmercamel optimize is not a FASTA file, i.e., it is missing a header line. For consistency, I think it should also produce a FASTA file.
How do you run the program? If the input file is a FASTA file (i.e., it contains a header), the result is also a FASTA file. If the input is not a FASTA file (which I assume is your use case), then neither is the output, which seems justified to me. I could change it, but then it would be difficult to keep the same FASTA header as on the input.
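The behavior described above (header in, header out; no header in, no header out) can be sketched roughly as follows. This is only an illustration, not kmercamel's actual implementation: the names `split_header` and `process_preserving_header` are hypothetical, and `transform` stands in for whatever mask optimization is applied.

```python
from typing import Callable, Optional, Tuple


def split_header(text: str) -> Tuple[Optional[str], str]:
    """Split an optional FASTA header line off the front of `text`.

    Returns (header, body); header is None when the text does not
    start with '>'. Hypothetical helper, for illustration only.
    """
    if text.startswith(">"):
        header, _, body = text.partition("\n")
        return header, body
    return None, text


def process_preserving_header(text: str, transform: Callable[[str], str]) -> str:
    """Apply `transform` to the sequence and re-attach the header, if any."""
    header, body = split_header(text)
    result = transform(body)
    # Only emit a header when the input had one, so the output
    # format mirrors the input format.
    return result if header is None else f"{header}\n{result}"
```

With this design the input header is carried through unchanged, which is the difficulty mentioned above: always emitting a header would require inventing one when the input has none.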
I see, then it makes sense, and it won't occur in practice. I run kmercamel optimize on a text file that already lacks the header.
I'm closing this issue.
Reopening this issue, as optimizing runs (runs or runsapprox) behaves inconsistently: even though the input is a file with a masked superstring but no header, the output does have a header. Optimizing ones or zeros doesn't add the header.
Here's a little experiment to verify:
$ head -c 50 <spneumoniae.S_global.k_9.d_na.M_default.maskedSuperstring.txt
GGCTCGACAAATTGATTAAGTACTCGTTGGTTACGTCGCTGTttatccCG
$ kmercamel/kmercamel optimize -k 9 -c -a ones -p spneumoniae.S_global.k_9.d_na.M_default.maskedSuperstring.txt -o spneu.k_9.ones.txt
$ head -c 50 <spneu.k_9.ones.txt
GGCTCGACAAATTGATTAAGTACTCGTTGGTTACGTCGCTGTTTaTccCG
$ kmercamel/kmercamel optimize -k 9 -c -a runs -p spneumoniae.S_global.k_9.d_na.M_default.maskedSuperstring.txt -o spneu.k_9.runs.txt
$ head -c 50 <spneu.k_9.runs.txt
> superstring
GGCTCGACAAATTGATTAAGTACTCGTTGGTTACGT
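The same check as in the experiment above can be scripted, for instance to compare several output files at once. This is a minimal sketch that only assumes a FASTA header is a first line starting with `>`; the function names are hypothetical and not part of kmercamel.

```python
def starts_with_fasta_header(first_bytes: bytes) -> bool:
    """Return True if the data begins with a FASTA header marker ('>')."""
    return first_bytes.startswith(b">")


def file_has_fasta_header(path: str) -> bool:
    """Read only the first byte of the file and check for '>'."""
    with open(path, "rb") as handle:
        return starts_with_fasta_header(handle.read(1))
```

Run against the files from the experiment, one would expect `file_has_fasta_header("spneu.k_9.ones.txt")` to be false and `file_has_fasta_header("spneu.k_9.runs.txt")` to be true before the fix.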
I have fixed this in PR #70.