Closed tzcoolman closed 11 years ago
Great! It was good to see the removed strlen and mallocs. Do you see a performance improvement?
fastq_read_check is still malloc:ing and rev_trans:ing for every kmer it is testing. That means that every nucleotide will be copied and translated k times. ;-)
Lasse
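To make the copying cost above concrete, here is a minimal sketch contrasting a per-k-mer copy with a sliding pointer window. It is not the facs code: `toy_lookup`, `scan_copying` and `scan_in_place` are hypothetical names, and the toy checksum only stands in for a real Bloom-filter lookup. The sketch also leaves out the reverse translation that the comment mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Bloom-filter lookup: it only reads the
 * k bytes starting at kmer and returns a toy checksum. */
static unsigned long toy_lookup(const char *kmer, size_t k)
{
    unsigned long h = 0;
    for (size_t i = 0; i < k; i++)
        h = h * 31u + (unsigned char)kmer[i];
    return h;
}

/* Copying approach: one malloc + memcpy per k-mer.
 * Returns the total number of nucleotide bytes copied. */
size_t scan_copying(const char *read, size_t k, unsigned long *sum)
{
    size_t len = strlen(read), copied = 0;
    assert(k > 0);
    *sum = 0;
    for (size_t i = 0; i + k <= len; i++) {
        char *kmer = malloc(k + 1);
        if (kmer == NULL)
            return copied;          /* give up on allocation failure */
        memcpy(kmer, read + i, k);  /* each nucleotide is copied up to k times */
        kmer[k] = '\0';
        copied += k;
        *sum += toy_lookup(kmer, k);
        free(kmer);
    }
    return copied;
}

/* Windowed approach: pass a pointer into the read, so nothing is copied.
 * Always returns 0 (bytes copied), for symmetry with scan_copying. */
size_t scan_in_place(const char *read, size_t k, unsigned long *sum)
{
    size_t len = strlen(read);
    assert(k > 0);
    *sum = 0;
    for (size_t i = 0; i + k <= len; i++)
        *sum += toy_lookup(read + i, k);
    return 0;
}
```

For a read of length n, the copying version moves (n - k + 1) * k bytes, while the windowed version moves none and visits the same k-mers.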
On Jul 4, 2013, at 1:20 PM, Enze Liu notifications@github.com wrote:
replace old copying k-mer approach
You can merge this Pull Request by running
git pull https://github.com/tzcoolman/facs master
Or view, comment on, or merge it at:
https://github.com/SciLifeLab/facs/pull/52
Commit Summary
-l getopt changed to l: comments for functions memory copy (key) replaced redesign reverse_compliment process changes in bloom.c local changes in lookup8.c small changes in lookup8.c switch back test changes hash test test lower case? test lower case temp remove lower case temp changes test save changes save reverse_compliment changes new k-mer system switched complete finished fix conflicts
File Changes
M facs/big_query.c (2)
M facs/bloom.c (110)
M facs/bloom.h (2)
M facs/lookup8.c (16)
M facs/simple_check_1_ge.c (83)
M facs/simple_remove.c (2)
M facs/simple_remove_l.c (5)
M facs/tool.c (221)
M facs/tool.h (4)
Patch Links:
https://github.com/SciLifeLab/facs/pull/52.patch
https://github.com/SciLifeLab/facs/pull/52.diff
@arvestad in fastq_read_check, a malloc and a rev_trans may be executed for each read (at most once per read), not for every k-mer. The steps go like this:
1. fast filter the read in normal order;
2. if nothing is found: reverse the whole read and do fast filtering on it again;
3. if the filters capture any k-mer: use the reversed read to do a full filtering;
4. release the reversed read.
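The per-read flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the facs implementation: `reverse_complement`, `check_read`, `filter_fn` and `toy_filter` are hypothetical names, the actual signature of rev_trans is not shown in this thread, and the branch taken when the forward pass already finds a hit (a full filtering on the forward read) is an assumption, since the comment does not spell it out.

```c
#include <stdlib.h>
#include <string.h>

/* Complement of one nucleotide; anything other than A/C/G/T maps to 'N'. */
static char complement(char c)
{
    switch (c) {
    case 'A': return 'T';
    case 'C': return 'G';
    case 'G': return 'C';
    case 'T': return 'A';
    default:  return 'N';
    }
}

/* Returns a newly allocated reverse complement of read, or NULL on
 * allocation failure. The caller must free the result. */
char *reverse_complement(const char *read)
{
    size_t len = strlen(read);
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        out[i] = complement(read[len - 1 - i]);
    out[len] = '\0';
    return out;
}

/* Hypothetical filter type: returns nonzero if the read hits the filter. */
typedef int (*filter_fn)(const char *read);

/* Toy filter for illustration only: a "hit" is any read containing ACG. */
int toy_filter(const char *read)
{
    return strstr(read, "ACG") != NULL;
}

/* Returns 1 on a hit, 0 on no hit, -1 on allocation failure.
 * The reverse complement is built at most once per read. */
int check_read(const char *read, filter_fn fast, filter_fn full)
{
    if (fast(read))
        return full(read) ? 1 : 0;  /* assumption: forward hit -> full filter on forward read */

    char *rev = reverse_complement(read);  /* one malloc per read, not per k-mer */
    if (rev == NULL)
        return -1;

    int hit = 0;
    if (fast(rev))                  /* fast pass on the reversed read */
        hit = full(rev) ? 1 : 0;    /* full filtering on the reversed read */
    free(rev);                      /* release the reversed read */
    return hit;
}
```

With the toy filter, `check_read("CGT", toy_filter, toy_filter)` finds nothing in the forward pass and only hits after the read is reversed to `ACG`.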
And sadly, I didn't find any significant change in speed.
BTW, I didn't apply the new k-mer system to the fasta read check, since it is experimental.
Enze;
Good point with the file mapping, we'll have to keep this in mind and compare runtimes when we put the statistics together.
@arvestad @brainstorm 150 seconds for 44,173,641 reads (query Human, filter Ecoli); previous speed: 1049 seconds. Much faster now.