Mishima-syk / psikit

psi4+RDKit
BSD 3-Clause "New" or "Revised" License

Optimize Memory Usage #21

Closed MelvinYYY closed 4 years ago

MelvinYYY commented 5 years ago

Hi,

I'm trying to calculate the dipole moment and Mulliken charges for more than 10k molecules, but the calculation seems to require a lot of memory (probably more than 16 GB). What's the best way to optimize the memory usage?

Thanks,

Melvin

kzfm commented 5 years ago

Hi Melvin,

You can set the memory size and the number of threads like this:

pk = Psikit(memory=16, threads=8) # 8threads and 16G memory

Best regards,

kzfm

MelvinYYY commented 5 years ago

Cool.

Do I still need to wrap the function with multiprocessing after adding these parameters? It doesn't seem to work so well with multiprocessing.

dipole = []

def get_dipole(mol_name):
    try:
        pk = Psikit(memory=16, threads=8)
        pk.mol = mols[mol_name]
        pk.optimize(maxiter=20)
        # Calculate dipole moment
        x, y, z, total = pk.dipolemoment
        del pk
        gc.collect()
        dm = [mol_name, round(x, 10), round(y, 10), round(z, 10), round(total, 10)]
    except:
        print(mol_name)
    return dm

def append_dipole(mol_name):
    dm = get_dipole(mol_name)
    dipole.append(dm)

My attempt with multiprocessing:

with Pool(n_cpu) as p:
    p.map(append_dipole, [mol_name for mol_name in molecule_names[0:10]])

p.close()

What's the best way to improve my code?

Thanks,

Melvin
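A note on the snippet above: with multiprocessing, each worker process has its own copy of the module-level `dipole` list, so appends made inside a worker do not show up in the parent process. A minimal sketch of returning the results through `Pool.map` instead. `fake_dipole` and `collect` are hypothetical names, and `fake_dipole` is only a stand-in for the psikit calculation so that the example runs without psi4.

```python
from multiprocessing import Pool

def fake_dipole(mol_name):
    """Hypothetical stand-in for the psikit dipole calculation."""
    try:
        total = float(len(mol_name))  # placeholder value, not chemistry
        return [mol_name, total]
    except Exception as exc:
        # Report the failure and return None so the caller can filter it out
        print("failed: {} ({})".format(mol_name, exc))
        return None

def collect(names, n_workers=2):
    """Run fake_dipole in a pool and gather the return values in the parent."""
    with Pool(n_workers) as p:
        results = p.map(fake_dipole, names)  # results keep the input order
    return [r for r in results if r is not None]

if __name__ == "__main__":
    print(collect(["CO", "CCC"]))
```

Returning a value on every path also avoids the case in the original function where `dm` is undefined after an exception.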

kzfm commented 5 years ago

On my machine (16 GB memory, 8 threads/1 core), it works fine. I recommend first optimizing compounds with a simple basis set before optimizing them with a larger one.

The optimize method currently defaults to the 6-31G** basis set, but we'll change the default to STO-3G. Here is my code:

from psikit import Psikit
from multiprocessing import Pool
import warnings
warnings.simplefilter("ignore")

slist = ["CO", "CN", "OCO", "CCC", "CCl"]

def get_dipole(SMILES):
  pk = Psikit(memory=4, threads=2)
  pk.read_from_smiles(SMILES)
  pk.optimize(basis_sets="scf/sto-3g")
  #pk.optimize(basis_sets="scf/6-31g**")
  x, y, z, total = pk.dipolemoment
  print("{}:{}".format(SMILES, total))
  del(pk)

if __name__ == "__main__":

#### WITHOUT MULTIPROCESSING ################
#   for smiles in slist:
#     get_dipole(smiles)
# 
# $ time python mp.py
# 
#   Memory set to   3.725 GiB by Python driver.
#   Threads set to 2 by Python driver.
# Optimizer: Optimization complete!
# CO:1.5088573841248836
# Optimizer: Optimization complete!
# CN:1.6162232998588775
# Optimizer: Optimization complete!
# OCO:2.2604484185105194
# Optimizer: Optimization complete!
# CCC:0.024261649057818187
# Optimizer: Optimization complete!
# CCl:2.336657881447678
# 
# real    0m13.975s
# user    0m25.444s
# sys     0m0.332s
#
#### WITH MULTIPROCESSING #####################
  with Pool(4) as p:  # 2 threads (psikit) * 4 (multiprocessing) = 8 threads
    p.map(get_dipole, slist)
#
# $ time python mp.py
# 
# 
#   Memory set to   3.725 GiB by Python driver.
#   Memory set to   3.725 GiB by Python driver.
#   Threads set to 2 by Python driver.
#   Threads set to 2 by Python driver.
#
#
#   Memory set to   3.725 GiB by Python driver.
#   Memory set to   3.725 GiB by Python driver.
#   Threads set to 2 by Python driver.
#   Threads set to 2 by Python driver.
# Optimizer: Optimization complete!
# CO:1.5088573869659105
# Optimizer: Optimization complete!
# CN:1.616223315309578
# Optimizer: Optimization complete!
# CCC:0.024269706709852058
# Optimizer: Optimization complete!
# OCO:1.9997066505417693
# Optimizer: Optimization complete!
# CCl:2.336657917451205
#
# real    0m6.097s
# user    0m32.048s
# sys     0m0.492s
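
The thread arithmetic in the comment above (2 threads per Psikit instance times 4 pool workers) likely applies to memory as well, assuming each worker process holds its own Psikit instance with its own `memory` setting. A minimal sketch of that budgeting; `plan_workers` is a hypothetical helper and not part of psikit.

```python
def plan_workers(total_mem_gb, total_threads, threads_per_worker):
    """Split a machine's threads and memory evenly across pool workers.

    Returns (number of workers, memory in GB to give each worker).
    """
    n_workers = max(1, total_threads // threads_per_worker)
    mem_per_worker = total_mem_gb // n_workers
    return n_workers, mem_per_worker

# The 16 GB / 8 thread machine from this thread, with 2 threads per worker
n_workers, mem_gb = plan_workers(total_mem_gb=16, total_threads=8, threads_per_worker=2)
print(n_workers, mem_gb)  # prints "4 4"
```

Under these assumptions the result matches the settings used above: `Pool(4)` with `Psikit(memory=4, threads=2)`. By the same reasoning, `Psikit(memory=16, threads=8)` inside several workers would request several times the machine's memory.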