Adamtaranto / frisk

Screen genomic scaffolds for regions of unusual k-mer composition.
http://adamtaranto.github.io/frisk/
GNU General Public License v3.0

Cythonise IVOM & K-mer hashing #17

Closed. kdm9 closed this 7 years ago.

kdm9 commented 8 years ago

This is a work in progress; don't merge yet.

I've rewritten the hashing to use k-mer count vectors and redone the IVOM function entirely in Cython. It processes about two to five 3 kb windows per second, including hashing. How does that compare time-wise?
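For readers unfamiliar with the approach, here is a minimal pure-Python sketch of hashing k-mers into a count vector. It assumes a 2-bit A/C/G/T encoding read as a base-4 number, so each k-mer maps to exactly one slot of a list of length 4^k. The names kmer_index and kmer_count_vector are hypothetical illustrations, not the functions in this PR, and skipping k-mers that contain N (or any other non-ACGT character) is an assumption.

```python
# Hypothetical sketch: not the PR's implementation.
# Map each nucleotide to a 2-bit code; k-mers containing other symbols are skipped.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_index(kmer):
    """Return the base-4 integer index of a k-mer, or None if it has a non-ACGT base."""
    index = 0
    for base in kmer:
        code = BASE_CODE.get(base)
        if code is None:
            return None
        # The first base ends up as the most significant base-4 digit.
        index = index * 4 + code
    return index


def kmer_count_vector(seq, k):
    """Count every overlapping k-mer of seq into a dense list of length 4**k."""
    counts = [0] * (4 ** k)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        index = kmer_index(seq[start:start + k])
        if index is not None:
            counts[index] += 1
    return counts
```

For example, kmer_count_vector("ACGTAC", 2) has 2 in the slot for AC and 1 each in the slots for CG, GT and TA. A dense list is manageable for small k (4^8 = 65,536 slots) but grows fourfold with each extra base.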

Adamtaranto commented 8 years ago

I'll add a timer for per-window IVOM and KLI processing. I'd guess about one or two 5,000 bp windows per second.
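A per-window timer along these lines would make the two throughput figures directly comparable. This is a sketch only: process_window is a hypothetical placeholder for whatever function scores one window (IVOM plus KLI), and the windows here are non-overlapping, which may not match the step size frisk actually uses.

```python
import time


def time_windows(seq, window_size, process_window):
    """Run process_window on consecutive non-overlapping windows and report throughput.

    process_window is a hypothetical stand-in for per-window IVOM + KLI scoring.
    Returns (results, windows_per_second).
    """
    results = []
    start = time.perf_counter()
    # A trailing partial window shorter than window_size is ignored.
    for offset in range(0, len(seq) - window_size + 1, window_size):
        results.append(process_window(seq[offset:offset + window_size]))
    elapsed = time.perf_counter() - start
    rate = len(results) / elapsed if elapsed > 0 else float("inf")
    return results, rate
```

Passing the same sequence and window size to both implementations gives windows-per-second numbers that can be compared like for like.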

computeKmers() is also slow when run in 'genomeMode'. I'm not sure whether there is a way around slicing up the genome one k-mer at a time for the initial k-mer census.
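One common way to avoid slicing out each k-mer is a rolling 2-bit encoding: keep the current k-mer as an integer, shift in two bits for each new base, and mask off the oldest base. The sketch below assumes a base-4 layout (A=0, C=1, G=2, T=3, first base most significant) and restarts the run at any non-ACGT character; it illustrates the technique and is not code from frisk or this PR. The function name rolling_kmer_census is hypothetical.

```python
# Hypothetical sketch of a one-pass k-mer census using a rolling 2-bit encoding.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def rolling_kmer_census(seq, k):
    """Count all overlapping k-mers in one pass without slicing out each k-mer.

    The current k-mer is held as a 2k-bit integer; each new base shifts in two
    bits and the mask drops the oldest base. A non-ACGT base resets the run, so
    k-mers spanning it are skipped.
    """
    counts = [0] * (4 ** k)
    mask = (1 << (2 * k)) - 1
    code = 0
    valid = 0  # consecutive ACGT bases seen so far, capped at k
    for base in seq.upper():
        bits = BASE_CODE.get(base)
        if bits is None:
            code = 0
            valid = 0
            continue
        code = ((code << 2) | bits) & mask
        if valid < k:
            valid += 1
        if valid == k:
            counts[code] += 1
    return counts
```

In pure Python this mainly saves allocating a new string at every position; the larger gain would likely come from compiling the same loop in Cython with typed integer variables.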

kdm9 commented 8 years ago

If you want to test it out, I'm using https://gist.github.com/19e3df51c6e4f3af642a

Also, you need to run python setup.py build_ext --inplace before running the above.

kdm9 commented 8 years ago

Also, this hasn't been optimised at all; it's just compiled to C with Cython, which isn't magic. I was mainly trying to reimplement IVOM so that I understood it. First we should check that both implementations give the same window IVOM for two sequences (which they won't yet, as there are a couple of bugs).
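A simple harness for that check might compare per-window scores from the two implementations with a floating-point tolerance rather than exact equality, since a C-compiled rewrite may differ from the original in the last few bits. This assumes both versions produce one score per window in the same order; the function name compare_window_scores and the default tolerances are hypothetical.

```python
import math


def compare_window_scores(reference, candidate, rel_tol=1e-9, abs_tol=1e-12):
    """Return indices of windows whose scores differ between two implementations.

    reference and candidate are sequences of per-window scores, e.g. window IVOM
    values from the original Python code and from the Cython rewrite.
    """
    if len(reference) != len(candidate):
        raise ValueError(
            f"window count mismatch: {len(reference)} vs {len(candidate)}"
        )
    return [
        i
        for i, (a, b) in enumerate(zip(reference, candidate))
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
    ]
```

An empty list means the two implementations agree on every window within tolerance; otherwise the returned indices point at the windows worth debugging first.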