test_pdb.py currently uses only 1 thread to do its job. I am looking to parallelize it so that I can speed things up on a mulit-core machine. This routine will be generally useful for other intensive purposes in the future.
For example, it took 31m38.676s (1898s) to calculate nDOPE for 3393 pdb structures. The S35 set has 21155 structures, meaning running over the set would take 11833s (3.29 hrs). If we parallelise it with 6 cores, this could reduce to 0.54 hrs.
As suggest by Tony, I profiled the code to check Modeller is indeed taking up the most time.
Timer unit: 1e-06 s
Total time: 12.4703 s
File: <ipython-input-27-19c264d8bf47>
Function: main at line 2
Line # Hits Time Per Hit % Time Line Contents
==============================================================
2 def main():
3 1 2 2.0 0.0 wait = 0;
4 1 1 1.0 0.0 waitname = "3p9dG03";
5 1 1 1.0 0.0 reset = 0;
6
7 1 1 1.0 0.0 if reset:
8 open("ref_DOPEs.csv","w").close()
9
10 1 4 4.0 0.0 import csv
11
12 1 46 46.0 0.0 with open("ref_DOPEs.csv", "a") as f:
13 1 12 12.0 0.0 c = csv.writer(f)
14
15 14 53 3.8 0.0 for pdbfile in onlyfiles:
16 13 116 8.9 0.0 pdbname = os.path.basename(pdbfile);
17 13 101 7.8 0.0 if pdbfile.split(".")[-1] in ["bak"] or wait:
18 # onlyfiles.pop(pdbfile);
19 1 1 1.0 0.0 if pdbname == waitname:
20 wait = 0;
21 continue
22 12 8685 723.8 0.1 print("\n\n//Testing structure from %s" % pdbfile)
23 12 21 1.8 0.0 try:
24 12 12459657 1038304.8 99.9 nDOPE = get_nDOPE( join(pdbfile), env = env)
25 1 3 3.0 0.0 except:
26 1 146 146.0 0.0 print("can't process", pdbfile)
27 # break
28
29 12 46 3.8 0.0 nDOPEs.append( nDOPE );
30 12 16 1.3 0.0 tested_files.append( pdbname );
31 12 328 27.3 0.0 c.writerow( [pdbname, nDOPE] )
32 12 1046 87.2 0.0 f.flush()
test_pdb.py currently uses only 1 thread to do its job. I am looking to parallelize it so that I can speed things up on a mulit-core machine. This routine will be generally useful for other intensive purposes in the future.
For example, it took 31m38.676s (1898s) to calculate nDOPE for 3393 pdb structures. The S35 set has 21155 structures, meaning running over the set would take 11833s (3.29 hrs). If we parallelise it with 6 cores, this could reduce to 0.54 hrs.
As suggest by Tony, I profiled the code to check Modeller is indeed taking up the most time.