Closed danpolanco closed 9 years ago
Hi Dan,
Jellyfish does not count k-mers in a standard map. So using Jellyfish to parse a sequence is not going to have that function signature. Second, if your sequence is not very big (likely if coming from a std::string), then using Jellyfish is probably not going to get you much.
Now, if you are parsing sequence from reasonably large file, I can provide an example on how to parse it directly with Jellyfish in C++. The result will be in a Jellyfish hash array which is not a drop in replacement for a std::map.
If this is what you want, I'll add such an example in the examples directory.
As it stands, I import the genome in a std::string and then split each contig into its own string. Then, I process each contig by itself including using Jellyfish to count k-mers. I'm working with the human genome by the way.
In the future I'll also process the whole genome and the contigs sequentially. I'm new to C++ however and I don't know what problems I'll run into once I process the whole genome along side each contig. For now though each contig in a string works well.
In any case, I should be able to read out all of the k-mers in the JF hash array and their counts, correct? If so, I think an example would help greatly if you can spare the time. If I can get those two things (k-mer and associated count) then I don't need a map.
FYI: I tried creating my own counter and it was horrifically slow even with a single contig haha. So Jellyfish is definitely a life line for my work.
I was able to figure it out using one of your examples. Thanks!
Before I dive into the source, I wanted to see how hard it would be to use Jellyfish as a library to go from sequence to kmer and count.
My function interface would look something like:
And then in the definition, perhaps something like:
Right now I use
system()
but I would love to include it as a native library instead.Btw, great work! I looked at a few other k-mer counters and came back to Jellyfish shortly after :+1:
Edit: removed some unnecessary code.