Expected Runtimes for HAWK

atifrahman / HAWK

Hitting associations with k-mers

GNU General Public License v3.0

Thanks for using it!

The run times depend on the number of samples and the read coverage in each sample, in addition to the size of the genome. Our analysis of ~200 YRI and TSI samples from the 1000 Genomes Project took around 12 days using 30 cores, most of which was spent on k-mer counting. The analysis of the E. coli ampicillin resistance data set took about 2 days. HAWK should run in 64 GB of memory, and the memory requirement can be adjusted by decreasing valInc in hawk.cpp.

The case_out_w_bonf.kmerDiff and control_out_w_bonf.kmerDiff files output by hawk.cpp contain the k-mers that passed Bonferroni correction (before correcting for co-factors). Unless something is going wrong, they should contain far fewer k-mers than the total number of k-mers. The files may still be large because each entry holds the k-mer string, its p-value, and the count of the k-mer in each sample.
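
To illustrate why only a small fraction of k-mers should end up in these files, here is a minimal sketch of a Bonferroni filter. It is not taken from hawk.cpp: the function names and the significance level of 0.05 used in the usage note are assumptions made for illustration, and the actual test and threshold in HAWK may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from hawk.cpp): per-test significance threshold
// under Bonferroni correction, i.e. alpha divided by the number of tests.
double bonferroni_threshold(double alpha, std::size_t total_tests) {
    assert(total_tests > 0);  // dividing by zero tests is meaningless
    return alpha / static_cast<double>(total_tests);
}

// Hypothetical helper: counts the p-values that fall below the corrected
// threshold. total_tests is the total number of k-mers tested, which is
// usually far larger than the number of p-values that pass.
std::size_t count_significant(const std::vector<double>& p_values,
                              double alpha,
                              std::size_t total_tests) {
    const double threshold = bonferroni_threshold(alpha, total_tests);
    std::size_t passed = 0;
    for (double p : p_values) {
        if (p < threshold) {
            ++passed;
        }
    }
    return passed;
}
```

With an assumed alpha of 0.05 and one million tested k-mers, the threshold is 5e-8, so a k-mer is kept only if its p-value is below that. The more k-mers are tested, the stricter the cutoff becomes, which is why the filtered files should be much smaller than the full k-mer set.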

Expected Runtimes for HAWK #8