Closed: mbhall88 closed this issue 8 hours ago
Can you share the input files you are using? That is unexpected behavior.
This is the MTB genome https://www.ncbi.nlm.nih.gov/nuccore/NC_000962.3 and this is the M. smegmatis genome https://www.ncbi.nlm.nih.gov/nuccore/NC_008596.1
I am using the bioconda installation of primerForge
Can you please rerun using the --debug flag and then share the .log file with me? Also, how did you determine how much memory was being allocated?
Okay, running now. Will share the log when it's done.
Well, I requested 160GB of memory from Slurm, and the job failed with 'out of memory'.
Just failed with 200GB (max RSS 208658340KB)
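(For context on how the peak memory was measured: Slurm's sacct can report a finished job's peak resident set size, e.g. `sacct -j <jobid> --format=JobID,MaxRSS,State`. MaxRSS is reported in KB, meaning KiB, so the figure above converts to roughly 199 GiB. The snippet below is just my own arithmetic on the reported number, not anything produced by primerForge.)

```python
# Convert Slurm's MaxRSS (reported in KB, i.e. KiB) to GiB.
max_rss_kib = 208_658_340   # value from the failed 200GB job above

max_rss_gib = max_rss_kib / 1024**2
print(f"peak memory: {max_rss_gib:.1f} GiB")  # → peak memory: 199.0 GiB
```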
INFO:bin.main:version: 1.1.1
INFO:bin.main:ingroup: /data/scratch/projects/punim1703/WGA/data/references/h37rv.fa
INFO:bin.main:outgroup: /data/scratch/projects/punim1703/WGA/data/references/MSmeg.fa
INFO:bin.main:results filename: /data/scratch/projects/punim1703/tbvcf/tmp/primerForge/results.tsv
INFO:bin.main:file format: fasta
INFO:bin.main:min kmer len: 16
INFO:bin.main:max kmer len: 20
INFO:bin.main:min % G+C 40.0
INFO:bin.main:max % G+C 60.0
INFO:bin.main:min Tm: 55.0
INFO:bin.main:max Tm: 68.0
INFO:bin.main:max Tm difference: 5.0
INFO:bin.main:min PCR size: 120
INFO:bin.main:max PCR size: 2400
INFO:bin.main:disallowed outgroup PCR sizes: 120-2400
INFO:bin.main:num threads: 8
INFO:__getCandidates:identifying kmers suitable for use as primers in all 1 ingroup genome sequences
INFO:_getAllCandidateKmers: getting shared ingroup kmers that appear once in each genome
DEBUG:__getSharedKmers: 18539592 shared kmers after processing h37rv.fa
INFO:_getAllCandidateKmers: done 00:06:54.69
INFO:_getAllCandidateKmers:dumping shared kmers to '_pickles/sharedKmers.p'
INFO:_getAllCandidateKmers:done 00:01:09.11
INFO:_getAllCandidateKmers: evaluating kmers
INFO:_getAllCandidateKmers: done 00:02:14.37
INFO:_getAllCandidateKmers: identified 843387 candidate kmers
INFO:__getCandidates:done 00:10:20.55
INFO:__getCandidates:dumping candidate kmers to '_pickles/candidates.p'
INFO:__getCandidates:done 00:00:04.04
INFO:__getUnfilteredPairs:identifying pairs of primers found in all ingroup sequences
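(A rough back-of-envelope on the log above, entirely my own estimate and not based on primerForge's actual data structures: even if every one of the ~18.5 million shared kmers were held as a separate Python string in a dict, the table alone should only cost a few GiB, which suggests the bulk of the 160GB+ footprint was being consumed elsewhere, e.g. during pair identification.)

```python
import sys

N_KMERS = 18_539_592   # shared kmers reported in the log above
AVG_K = 18             # midpoint of the 16-20 kmer length range

# Hypothetical per-kmer cost if each kmer were a CPython str in a dict:
# str object overhead plus one byte per ASCII character, plus a rough
# allowance for the dict slot and an associated value pointer.
str_bytes = sys.getsizeof("A" * AVG_K)  # ~67 bytes in CPython (49-byte header + chars)
dict_entry_bytes = 100                  # rough allowance, an assumption

est_gib = N_KMERS * (str_bytes + dict_entry_bytes) / 1024**3
print(f"~{est_gib:.1f} GiB for the shared-kmer table alone")
```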
I believe I have resolved this issue; I have improved performance in many places. When I ran with your inputs, I observed it using <15GB of RAM. I am going to close this issue; please feel free to report back if I need to reopen it.
Note: the conda installation and Docker image have not yet been updated to the new version. Please use the manual installation or the pip installation for the time being.
Great. I can confirm this completed with 8 threads in 13 minutes and ~23GB of memory.
As part of https://github.com/openjournals/joss-reviews/issues/6850 I have been trying to run primerForge using MTB as the ingroup and M. smegmatis as the outgroup.
So far I have been unable to get this to complete, as it keeps hitting my job's memory limits. The last run I tried allocated 160GB, and the job failed due to out-of-memory problems... This seems very high. Is this expected? If so, it should probably be documented somewhere. Or am I doing something wrong here? There isn't an example usage (see #2), so I am just basing my execution on the CLI help menu.