chasewnelson / SNPGenie

Program for estimating πN/πS, dN/dS, and other diversity measures from next-generation sequencing data
GNU General Public License v3.0

memory usage #7

Closed lurebgi closed 6 years ago

lurebgi commented 7 years ago

Hi,

I am analyzing polymorphism data (VCF files) from a bird species (genome size ~1.2 Gb). I split the genome into 10 parts, but the memory usage still reached ~30 GB. Do you have any idea how to split or process the input files to reduce the memory usage?

Thanks a lot!

Best, Luohao

singing-scientist commented 7 years ago

Dear Luohao,

Thanks so much for using SNPGenie! Unfortunately, I do not have plans to speed up the actual algorithm at this time. Assuming that your input data are in the form described—a VCF file with SNP data for a single reference FASTA—I think one good approach is to split up the genome, as you have done, probably by chromosome. If this does not work, then you could try smaller subsections; this is actually quite easy to do and to automate, since you can extract a range (a-b) of sites from the FASTA and then pull out variants from the VCF for only those sites. Another approach would be to target specific genomic regions of interest.
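As a rough illustration of the subsection approach, here is a minimal Python sketch (not part of SNPGenie) that extracts a site range from a single-sequence FASTA and keeps only the VCF records falling inside that range, shifting their POS values into the subregion's coordinates. The function name and the single-sequence assumption are my own; adapt as needed for multi-chromosome files.

```python
def extract_region(fasta_lines, vcf_lines, start, end):
    """Extract sites start..end (1-based, inclusive) from a single-sequence
    FASTA and the VCF records that fall inside that range.

    Returns (fasta_out, vcf_out) as lists of lines (no trailing newlines).
    """
    # Rebuild the sequence from the FASTA body lines.
    header = fasta_lines[0].strip()
    seq = "".join(line.strip() for line in fasta_lines[1:])
    sub = seq[start - 1:end]

    # Tag the header with the range and re-wrap the subsequence at 60 cols.
    fasta_out = [f"{header}_{start}_{end}"]
    fasta_out += [sub[i:i + 60] for i in range(0, len(sub), 60)]

    vcf_out = []
    for line in vcf_lines:
        if line.startswith("#"):
            vcf_out.append(line.rstrip("\n"))  # keep all VCF header lines
            continue
        fields = line.rstrip("\n").split("\t")
        pos = int(fields[1])
        if start <= pos <= end:
            # Shift POS so it is relative to the extracted subregion.
            fields[1] = str(pos - start + 1)
            vcf_out.append("\t".join(fields))
    return fasta_out, vcf_out
```

Running SNPGenie on each (sub-FASTA, sub-VCF) pair then bounds memory by the subregion size rather than the whole chromosome. Note that if you split mid-gene, codons spanning the boundary will be affected, so splitting at gene or chromosome boundaries is safest.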

Another program to try is PoPoolation, which might be faster if you turn OFF its corrections; however, it makes various approximations and essentially assumes all variant positions in the raw reads are bona fide SNPs, i.e., it does not take advantage of SNP calling. Also, if your input data are not deep (pooled) sequencing of a single sample but rather a summary of many genomes, PoPoolation is not applicable.

Please let me know if any of this is helpful! Apologies for the memory difficulties. Chase