ReactionMechanismGenerator / RMG-Py

Python version of the amazing Reaction Mechanism Generator (RMG).
http://reactionmechanismgenerator.github.io/RMG-Py/
Other
387 stars 227 forks source link

RMG fails to generate reactions for fragments when > 1 processes are used #2685

Open donerancl opened 3 months ago

donerancl commented 3 months ago

Bug Description

update_charge from rmgpy/molecule/molecule.py does not recognize cutting labels when more than 1 processes are used for reaction generation with react from rmgpy/rmg/react.py. When procnum = 1 , there is no error.

For cases where procnum = 1, _react_species_star is mapped to spc_fam_tuples with python. For cases where procnum > 1, _react_species_star is mapped to spc_fam_tuples using Pool

def react(spc_fam_tuples, procnum=1):
    """
    Generate reactions between the species in the list of species-family tuples
    for the optionally specified reaction families.

    Each item should be a tuple of a species tuple and an optional family list:
        [((spc1,), [family1, ...]), ((spc2, spc3), [family2, ...]), ...]
            OR
        [((spc1,),), ((spc2, spc3),), ...]

    If no family list is provided, all of the loaded families are considered.

    Args:
        spc_fam_tuples (list): list of tuples for reaction generation
        procnum (int, optional): number of processors used for reaction generation

    Returns:
        list of lists of reactions generated from each species tuple (note: empty lists are possible)
    """
    # Execute multiprocessing map. It blocks until the result is ready.
    # This method chops the iterable into a number of chunks which it
    # submits to the process pool as separate tasks.
    if procnum == 1:
        logging.info('For reaction generation {0} process is used.'.format(procnum))
        reactions = list(map(_react_species_star, spc_fam_tuples))                         # <========= map with python
    else:
        logging.info('For reaction generation {0} processes are used.'.format(procnum))
        p = Pool(processes=procnum)
        reactions = p.map(_react_species_star, spc_fam_tuples)                            # <========= map with Pool
        p.close()
        p.join()

    return reactions

How To Reproduce

import sys
sys.path.append("/home/gridsan/adoner/RMG-database")
from rmgpy.molecule.fragment import Fragment
from rmgpy.species import Species
from rmgpy.data.rmg import RMGDatabase
from input.kinetics.families.recommended import default
from rmgpy.rmg.react import react

global database
database = RMGDatabase().load_kinetics("/home/gridsan/adoner/RMG-database/input/kinetics",kinetics_families=list(default),reaction_libraries = [])

def print_reactions(reactions):
    for reaction in reactions:
        reactants = [R.molecule[0].smiles for R in reaction.reactants]
        products = [P.molecule[0].smiles for P in reaction.products]
        print(f"{reactants} <=> {products}")

smiles_a = "LCCR"
frag_a = Fragment().from_smiles_like_string(smiles_a)
adjlist_a = frag_a.to_adjacency_list()
spec_a = Species().from_adjacency_list(adjlist_a)
species_tuples = [((spec_a,),)]

reactions_1, = react(species_tuples,procnum=1)
print_reactions(reactions_1)

reactions_2, =react(species_tuples,procnum=2)
print_reactions(reactions_2)

Expected Behavior

We should get the same result for each number of processors

['[CH2]R', '[CH2]L'] <=> ['LCCR']  # reactions generated with procnum = 1
['L[CH]CR', '[H]'] <=> ['LCCR']
['R[CH]CL', '[H]'] <=> ['LCCR']
['[CH2]R', '[CH2]L'] <=> ['LCCR'] # reactions generated with procnum = 2
['L[CH]CR', '[H]'] <=> ['LCCR']
['R[CH]CL', '[H]'] <=> ['LCCR']

Instead we get

['[CH2]R', '[CH2]L'] <=> ['LCCR']  # reactions generated with procnum = 1
['L[CH]CR', '[H]'] <=> ['LCCR']
['R[CH]CL', '[H]'] <=> ['LCCR']
---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/gridsan/adoner/mambaforge/envs/rmg_noj/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/gridsan/adoner/mambaforge/envs/rmg_noj/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/gridsan/adoner/RMG-Py/rmgpy/rmg/react.py", line 81, in _react_species_star
    return react_species(*args)
  File "/home/gridsan/adoner/RMG-Py/rmgpy/rmg/react.py", line 96, in react_species
    reactions = get_db('kinetics').generate_reactions_from_families(species_tuple, only_families=only_families)
  File "/home/gridsan/adoner/RMG-Py/rmgpy/data/kinetics/database.py", line 497, in generate_reactions_from_families
    ensure_independent_atom_ids(reactants, resonance=resonance)
  File "/home/gridsan/adoner/RMG-Py/rmgpy/data/kinetics/common.py", line 217, in ensure_independent_atom_ids
    species.generate_resonance_structures(keep_isomorphic=True)
  File "rmgpy/species.py", line 281, in rmgpy.species.Species.generate_resonance_structures
  File "rmgpy/species.py", line 292, in rmgpy.species.Species.generate_resonance_structures
  File "rmgpy/molecule/molecule.py", line 2280, in rmgpy.molecule.molecule.Molecule.generate_resonance_structures
  File "rmgpy/molecule/molecule.py", line 2282, in rmgpy.molecule.molecule.Molecule.generate_resonance_structures
  File "rmgpy/molecule/resonance.py", line 152, in rmgpy.molecule.resonance.generate_resonance_structures
  File "rmgpy/molecule/resonance.py", line 181, in rmgpy.molecule.resonance.generate_resonance_structures
  File "rmgpy/molecule/molecule.py", line 1241, in rmgpy.molecule.molecule.Molecule.update
  File "rmgpy/molecule/molecule.py", line 545, in rmgpy.molecule.molecule.Atom.update_charge
  File "rmgpy/molecule/molecule.py", line 556, in rmgpy.molecule.molecule.Atom.update_charge
KeyError: 'L'
"""
The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_788234/791460228.py in <module>
     10     print(f"{reactants} <=> {products}")
     11 
---> 12 react(species_tuples,procnum=2)

~/RMG-Py/rmgpy/rmg/react.py in react(spc_fam_tuples, procnum)
     68         logging.info('For reaction generation {0} processes are used.'.format(procnum))
     69         p = Pool(processes=procnum)
---> 70         reactions = p.map(_react_species_star, spc_fam_tuples)
     71         p.close()
     72         p.join()

~/mambaforge/envs/rmg_noj/lib/python3.7/multiprocessing/pool.py in map(self, func, iterable, chunksize)
    266         in a list that is returned.
    267         '''
--> 268         return self._map_async(func, iterable, mapstar, chunksize).get()
    269 
    270     def starmap(self, func, iterable, chunksize=None):

~/mambaforge/envs/rmg_noj/lib/python3.7/multiprocessing/pool.py in get(self, timeout)
    655             return self._value
    656         else:
--> 657             raise self._value
    658 
    659     def _set(self, i, obj):

KeyError: 'L'

Installation Information

Describe your installation method and system information.

Additional Context

Add any other context about the problem here.

JacksonBurns commented 3 months ago

Thanks for the tag following our offline discussion. The one mercy in this is that at long RMG simulation times the expense of generating reactions loses out to the expense of running reactor simulations.

We will have to debug this further, though!