Open mr-c opened 9 years ago
I'll +1 the idea of being able to spit out a FASTA file of k-mers like jellyfish dump
does. I assume this functionality does not currently exist. (?)
@macmanes - see sandbox/count-kmers.py and sandbox/count-kmers-single.py.
User story: PBcR-MHAP uses Jellyfish configured to request 1 TB of memory. Luiz wants to use less memory.
http://sourceforge.net/p/wgs-assembler/svn/HEAD/tree/trunk/src/AS_PBR/PBcR.pl#l1621
A single script could build a counting hash from the input sequences, calculate the cutoff value (see http://sourceforge.net/p/wgs-assembler/svn/HEAD/tree/trunk/src/AS_PBR/PBcR.pl#l1634), and then re-read the sequences to output the count for each k-mer (possibly using a presence table to avoid over-reporting).
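A rough sketch of that two-pass idea, in plain Python so it runs anywhere. This is not khmer code: a `Counter` stands in for the counting hash and a `set` for the presence table, and all function names are made up for illustration. The cutoff is passed in as a parameter because the actual rule lives in PBcR.pl; keeping k-mers at or above the cutoff is an assumption, as is the `>count` / k-mer FASTA layout (my understanding of what jellyfish dump emits by default). Canonical (strand-collapsed) k-mers are not handled.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer of seq, left to right (no canonicalization)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(sequences, k):
    """Pass 1: build the k-mer counts (stand-in for a counting hash)."""
    counts = Counter()
    for seq in sequences:
        counts.update(kmers(seq, k))
    return counts


def dump_counts(sequences, counts, k, cutoff):
    """Pass 2: re-read the sequences and report each k-mer once.

    `seen` plays the role of the presence table, so a k-mer that occurs
    many times in the input is only reported the first time it is met.
    Assumption: k-mers with count >= cutoff are the ones to report.
    """
    seen = set()
    for seq in sequences:
        for kmer in kmers(seq, k):
            if kmer in seen:
                continue
            seen.add(kmer)
            if counts[kmer] >= cutoff:
                yield kmer, counts[kmer]


def to_fasta(records):
    """Format (kmer, count) pairs as FASTA with the count as the header."""
    return "".join(f">{count}\n{kmer}\n" for kmer, count in records)
```

For example, `to_fasta(dump_counts(seqs, count_kmers(seqs, 4), 4, cutoff=2))` would give the FASTA text for every 4-mer seen at least twice. A real version would stream reads from disk twice rather than hold them in a list.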
Bonus: format the output to match http://sourceforge.net/p/wgs-assembler/svn/HEAD/tree/trunk/src/AS_PBR/PBcR.pl#l1638
Since we have a different workflow, this doesn't have to recreate the same command-line options, but it should produce compatible output.
jellyfish count [-> jellyfish merge] -> jellyfish histo -> jellyfish dump
Jellyfish histogram files are space delimited.
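For the histogram step, here is a minimal sketch of writing and reading a space-delimited histogram. It assumes each line is `<count> <number of distinct k-mers with that count>`, which is my understanding of the jellyfish histo layout; the function names are hypothetical, and `counts` is any mapping from k-mer to count.

```python
from collections import Counter


def histogram(counts):
    """Return sorted (count, number_of_distinct_kmers) pairs."""
    histo = Counter(counts.values())
    return sorted(histo.items())


def format_histo(histo):
    """Write one space-delimited 'count n_kmers' line per histogram bin."""
    return "".join(f"{count} {n_kmers}\n" for count, n_kmers in histo)


def parse_histo(text):
    """Read space-delimited histogram text back into (count, n_kmers) pairs."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        count, n_kmers = line.split()
        pairs.append((int(count), int(n_kmers)))
    return pairs
```

Whatever cutoff rule PBcR.pl applies could then be computed from the parsed pairs without touching the k-mers themselves.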