SciLifeLab / facs

Fast and Accurate Classification of Sequences using Bloom filters
http://facs.scilifelab.se/

new k-mer system #52

Closed tzcoolman closed 11 years ago

tzcoolman commented 11 years ago

replace old copying k-mer approach

arvestad commented 11 years ago

Great! It was good to see the removed strlen and mallocs. Do you see a performance improvement?

fastq_read_check is still malloc:ing and rev_trans:ing for every k-mer it is testing. That means that every nucleotide will be copied and translated k times. ;-)

Lasse
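To make the cost argument above concrete, here is a minimal sketch in C. It is not the FACS code: `complement`, `revcomp_read`, `copied_per_kmer` and `copied_per_read` are hypothetical names, and the sketch assumes a read of `len` nucleotides contains `len - k + 1` k-mers. It reverse-complements a read once into a caller-supplied buffer and compares how many nucleotides get copied under each approach.

```c
#include <assert.h>
#include <stddef.h>

/* Complement of one nucleotide; any other character (e.g. N) passes through. */
static char complement(char c)
{
    switch (c) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    default:  return c;
    }
}

/* Reverse-complement the whole read once.
 * `out` must have room for len + 1 bytes (hypothetical helper, not FACS API). */
static void revcomp_read(const char *read, size_t len, char *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = complement(read[len - 1 - i]);
    out[len] = '\0';
}

/* Nucleotides copied if every k-mer is copied and translated separately:
 * (len - k + 1) k-mers, k nucleotides each. */
static size_t copied_per_kmer(size_t len, size_t k)
{
    assert(k > 0);
    return len < k ? 0 : (len - k + 1) * k;
}

/* Nucleotides copied if the read is translated once and each k-mer is
 * just a pointer into that buffer. */
static size_t copied_per_read(size_t len)
{
    return len;
}
```

For a 100-nucleotide read and k = 21, `copied_per_kmer(100, 21)` gives 80 × 21 = 1680 nucleotide copies, against 100 for `copied_per_read(100)`.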

On Jul 4, 2013, at 1:20 PM, Enze Liu <notifications@github.com> wrote:

replace old copying k-mer approach

You can merge this Pull Request by running

git pull https://github.com/tzcoolman/facs master

Or view, comment on, or merge it at:

https://github.com/SciLifeLab/facs/pull/52

Commit Summary

-l getopt changed to l: comments for functions memory copy (key) replaced redesign reverse_compliment process changes in bloom.c local changes in lookup8.c small changes in lookup8.c switch back test changes hash test test lower case? test lower case temp remove lower case temp changes test save changes save reverse_compliment changes new k-mer system switched complete finished fix conflicts

File Changes

M facs/big_query.c (2)
M facs/bloom.c (110)
M facs/bloom.h (2)
M facs/lookup8.c (16)
M facs/simple_check_1_ge.c (83)
M facs/simple_remove.c (2)
M facs/simple_remove_l.c (5)
M facs/tool.c (221)
M facs/tool.h (4)

Patch Links:

https://github.com/SciLifeLab/facs/pull/52.patch
https://github.com/SciLifeLab/facs/pull/52.diff

tzcoolman commented 11 years ago

@arvestad in fastq_read_check, a malloc and rev_trans are executed at most once per read, not once per k-mer. The steps go like this:

1. Fast-filter the read in its normal order.
2. If nothing is found, reverse the whole read and fast-filter it again.
3. If the filters capture any k-mer, use the reversed read to do a full filtering.
4. Release the reversed read.
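The per-read flow described in this comment could be sketched roughly as follows. This is one reading of the steps, not the actual `fastq_read_check`: every name here (`kmer_hit_fn`, `toy_hit`, `fast_pass`, `full_pass`, `check_read`) is hypothetical, the "fast" pass is assumed to probe k-mers at stride k and the "full" pass at stride 1, and a single-k-mer toy lookup stands in for the Bloom filter query.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback: nonzero if the k bytes at `kmer` are in the filter. */
typedef int (*kmer_hit_fn)(const char *kmer, size_t k, const void *ctx);

/* Toy stand-in for a Bloom filter query: the "filter" holds one k-mer. */
static int toy_hit(const char *kmer, size_t k, const void *ctx)
{
    const char *stored = ctx;
    return strlen(stored) == k && memcmp(kmer, stored, k) == 0;
}

static char complement(char c)
{
    switch (c) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    default:  return c;
    }
}

/* Fast pass (assumed): probe k-mers at stride k, stop at the first hit. */
static int fast_pass(const char *s, size_t len, size_t k,
                     kmer_hit_fn hit, const void *ctx)
{
    for (size_t i = 0; i + k <= len; i += k)
        if (hit(s + i, k, ctx))
            return 1;
    return 0;
}

/* Full pass (assumed): probe every k-mer and count the hits. */
static size_t full_pass(const char *s, size_t len, size_t k,
                        kmer_hit_fn hit, const void *ctx)
{
    size_t hits = 0;
    for (size_t i = 0; i + k <= len; i++)
        if (hit(s + i, k, ctx))
            hits++;
    return hits;
}

/* Returns the full-pass hit count, 0 if neither strand triggers the fast
 * pass, or -1 if the reverse-complement buffer cannot be allocated. */
static long check_read(const char *read, size_t len, size_t k,
                       kmer_hit_fn hit, const void *ctx)
{
    assert(k > 0);
    if (fast_pass(read, len, k, hit, ctx))       /* forward strand first */
        return (long)full_pass(read, len, k, hit, ctx);

    char *rc = malloc(len + 1);                  /* at most one malloc per read */
    if (rc == NULL)
        return -1;
    for (size_t i = 0; i < len; i++)             /* one translation per read */
        rc[i] = complement(read[len - 1 - i]);
    rc[len] = '\0';

    long hits = 0;
    if (fast_pass(rc, len, k, hit, ctx))
        hits = (long)full_pass(rc, len, k, hit, ctx);
    free(rc);                                    /* release the reversed read */
    return hits;
}
```

With the toy filter holding `"ACGA"`, `check_read("ACGATTTT", 8, 4, toy_hit, "ACGA")` finds the k-mer on the forward strand, while `check_read("AAAATCGT", 8, 4, toy_hit, "ACGA")` finds it only after the single reverse-complement step. A stride-k fast pass can miss a hit that sits at an unaligned offset, which is part of why this is only a sketch.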

And sadly, I didn't find any significant change in speed.

BTW, I didn't apply the new k-mer system to the fasta read check, since it is experimental.

brainstorm commented 11 years ago

Enze;

Good point with the file mapping, we'll have to keep this in mind and compare runtimes when we put the statistics together.

tzcoolman commented 11 years ago

@arvestad @brainstorm 150 seconds for 44,173,641 reads (query: Human, filter: Ecoli). Previous speed: 1049 seconds. Much faster now.