Kortemme-Lab / flex_ddG_tutorial

MIT License

Running flex ddG using multiple CPU cores-time frame issue #14

Closed: bhavranek closed this issue 2 years ago

bhavranek commented 3 years ago

Hello all,

I noticed that the Python scripts have an option to use multiple CPU cores. To take advantage of this feature, does one have to compile the MPI version of Rosetta?

I'm asking because I was attempting to perform saturation mutagenesis on a residue. I used the recommended 35 structures for sampling, 35,000 backrub trials, etc., and left the program running overnight on 20 CPU cores. After 12 hours it had only gone through 3 of the structures, so sampling all 35 structures would take days. With 17 residues to sample, this time frame is not feasible.
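To put the numbers above in perspective, they can be extrapolated with simple arithmetic. This is a minimal sketch that assumes the observed rate (3 structures in 12 hours) stays constant for the remaining structures and for every residue. `estimate_hours` is a hypothetical helper, not part of the tutorial.

```python
def estimate_hours(hours_elapsed, structures_done, structures_per_residue, residues):
    """Extrapolate wall-clock hours from the observed rate.

    Assumes (hypothetically) a constant time per structure and that residues
    are processed one after another at that same rate.
    """
    hours_per_structure = hours_elapsed / structures_done
    hours_per_residue = hours_per_structure * structures_per_residue
    return hours_per_residue, hours_per_residue * residues


# Figures from the post: 3 structures in 12 hours, 35 structures, 17 residues.
per_residue, total = estimate_hours(12, 3, 35, 17)
print(per_residue, total)  # 140.0 2380.0
```

At roughly 140 hours (about 6 days) per residue and about 2,380 hours for all 17, this rate rules out a single 20-core workstation.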

Is there something I am doing wrong, such as needing to compile the MPI version of Rosetta, or is this the normal time frame? Are there any ways to speed up the calculation while maintaining accuracy?

kylebarlow commented 3 years ago

Hi Brandon - the MPI version of Rosetta is not necessary for this protocol, as each structure can be produced independently by regular, single-threaded Rosetta.
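Because each structure is an independent, single-threaded run, you can use several cores without MPI by launching one ordinary process per job and capping how many run at once. The tutorial's Python script already exposes a core-count option. The sketch below is not that script's code, only an illustration of the idea. `run_independent_jobs` is a hypothetical helper, and the command lists you pass to it (your actual Rosetta invocation for each mutation/structure pair) are assumptions left to your own setup.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_independent_jobs(commands, max_workers):
    """Run each command (a list of argument strings) as its own process, with at
    most `max_workers` running at once. Returns exit codes in input order."""

    def run_one(cmd):
        # Each thread only waits on its own child process, so the real work
        # runs in parallel in separate single-threaded processes.
        return subprocess.run(cmd, check=False).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))
```

For example, you would build one command per (mutation, structure) pair, which gives 35 per mutation, and pass `max_workers=20` on a 20-core machine. Threads are sufficient here because each one simply blocks on its subprocess. Running more workers than physical cores is unlikely to help.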

The protocol is relatively computationally intensive, so running on an HPC cluster is the usual way to get through the thousands of CPU hours that might be required more quickly. It does seem like the protocol is running particularly slowly for you; a very large input protein complex could be one reason. One possibility would be to trim residues that are far away from the interface of interest out of the input PDB. These extra, distant residues would slow down the computation but would be very unlikely to affect the ddG calculation.
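One way to do this trimming is to keep only residues that have at least one atom within a cutoff distance of the interface residues you care about. The sketch below uses only the standard library (Python 3.8+ for `math.dist`) and the fixed-column PDB format. `trim_pdb`, the 12 Å default cutoff, and any residue IDs you pass in are assumptions, not values from the flex ddG tutorial. Trimming introduces chain breaks, so inspect the result before running the protocol.

```python
import math

RECORDS = ("ATOM", "HETATM")


def _res_id(line):
    # Chain ID is column 22; residue number plus insertion code are columns 23-27.
    return (line[21], line[22:27].strip())


def _xyz(line):
    # Orthogonal coordinates occupy columns 31-38, 39-46 and 47-54.
    return (float(line[30:38]), float(line[38:46]), float(line[46:54]))


def trim_pdb(lines, interface_residues, cutoff=12.0):
    """Keep whole residues with any atom within `cutoff` angstroms of any atom
    in `interface_residues` (a set of (chain, resnum) tuples, e.g. ("A", "52")).
    All other record types (REMARK, TER, END, ...) pass through unchanged."""
    atoms = [l for l in lines if l.startswith(RECORDS)]
    anchors = [_xyz(l) for l in atoms if _res_id(l) in interface_residues]
    keep = set()
    for l in atoms:
        rid = _res_id(l)
        if rid in keep:
            continue  # residue already kept via an earlier atom
        xyz = _xyz(l)
        if any(math.dist(xyz, a) <= cutoff for a in anchors):
            keep.add(rid)
    return [l for l in lines if not l.startswith(RECORDS) or _res_id(l) in keep]
```

You would read your PDB with `splitlines()`, call `trim_pdb` with the interface residues (for example, the mutated positions and their binding partners), and write the returned lines back out. It is worth comparing ddG values from the trimmed and untrimmed inputs on one mutation before relying on the trimmed structure.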

bhavranek commented 3 years ago

Thank you @kylebarlow for the suggestion. It is a large protein, so I will try trimming it as you suggested and see if that speeds up the computation.