Output labeled adjacency lists for each reaction

rwest commented 5 years ago

Eric Hermes @ehermes would like, in order to test Sella, would like labeled adjacency lists for each (important) reaction.

rwest commented 5 years ago

This was mentioned again on the April 8 call. Eric needs to know how each reaction will be described.

rwest commented 5 years ago

@mazeau please could you try and tackle this? Ask @nateharms for help as he knows the inner working of reaction center atom labels. What we want is: when a reaction has been created (but before the labels have been wiped) we should save (somehow) a representation of the adjacency lists of the reaction, with the relevant atom labels. We could perhaps immediately output these to a separate text (or yaml) file for each reaction. Or could save the atom labels and create the output later. To do the latter, we may need to hack it a bit like cell [16] in this notebook.

I'd be happy to work on this, but Sandia are getting impatient and it seems I don't have much spare time to actually do it myself.

mazeau commented 5 years ago

Do you want this to be an option in the input file? Similar to generateOutputHTML?

rwest commented 5 years ago

Ooh, Having it something we can turn on and off using the input file is a good idea. But having it just on all the time is a useful start so we can give Eric something

nateharms commented 5 years ago

Are you looking for something like this?

Reaction(reactants=[Molecule(SMILES="CC"), Molecule(SMILES="[O]O")], products=[Molecule(SMILES="C[CH2]"), Molecule(SMILES="OO")]) matches H_Abstraction reaction family. 

Molecule(SMILES="CC.[O]O")
multiplicity 2
1     O u0 p2 c0 {2,S} {11,S}
2  *3 O u1 p2 c0 {1,S}
3  *1 C u0 p0 c0 {4,S} {5,S} {6,S} {7,S}
4     C u0 p0 c0 {3,S} {8,S} {9,S} {10,S}
5  *2 H u0 p0 c0 {3,S}
6     H u0 p0 c0 {3,S}
7     H u0 p0 c0 {3,S}
8     H u0 p0 c0 {4,S}
9     H u0 p0 c0 {4,S}
10    H u0 p0 c0 {4,S}
11    H u0 p0 c0 {1,S}

Molecule(SMILES="OO.[CH2]C")
multiplicity 2
1     O u0 p2 c0 {2,S} {10,S}
2  *1 O u0 p2 c0 {1,S} {11,S}
3     C u0 p0 c0 {4,S} {5,S} {6,S} {7,S}
4  *3 C u1 p0 c0 {3,S} {8,S} {9,S}
5     H u0 p0 c0 {3,S}
6     H u0 p0 c0 {3,S}
7     H u0 p0 c0 {3,S}
8     H u0 p0 c0 {4,S}
9     H u0 p0 c0 {4,S}
10    H u0 p0 c0 {1,S}
11 *2 H u0 p0 c0 {2,S}

rwest commented 5 years ago

Yes! Something like that

nateharms commented 5 years ago

Okay, this chunk of code should do what you need. It'll take in a list of reactions and write the reactant and product complexes to adjacency lists in an adjacency_lists.py file. Let me know if you need any explanation of the code.

import rmgpy
from rmgpy.molecule import Molecule
from rmgpy.species import Species
from rmgpy.reaction import Reaction
from rmgpy.data.rmg import RMGDatabase

########## Change this section for the reaction and families of interest ##########
rxn1 = Reaction(
    reactants = [Molecule(SMILES="CC"), Molecule(SMILES="[O]O")],
    products = [Molecule(SMILES="[CH2]C"), Molecule(SMILES="OO")]
)

rxn2 = Reaction(
    reactants = [Molecule(SMILES="C"), Molecule(SMILES="[O]O")],
    products = [Molecule(SMILES="[CH2]C"), Molecule(SMILES="OO")]
)

rxn3 = Reaction(
    reactants = [
        Species(molecule=[Molecule(SMILES="CC")]), 
        Species(molecule=[Molecule(SMILES="[O]O")])
    ],
    products = [
        Species(molecule=[Molecule(SMILES="[CH2]C")]), 
        Species(molecule=[Molecule(SMILES="OO")])
    ]
)
reaction_list = [rxn1, rxn2, rxn3]

families_of_interest = [
    "H_Abstraction"
]

###################################################################################

rmg_database = RMGDatabase()
database_path = rmgpy.settings['database.directory']
rmg_database.load(
    database_path,
    kineticsFamilies=families_of_interest,
    transportLibraries=[],
    reactionLibraries=[],
    seedMechanisms=[],
    thermoLibraries=[
        'primaryThermoLibrary',
        'thermo_DFT_CCSDTF12_BAC',
        'CBS_QB3_1dHR'],
    solvation=False,
)

families = rmg_database.kinetics.families
write_string = ""
for reaction in reaction_list:
    match = False
    rmg_reactants = []
    rmg_products = []
    for react in reaction.reactants:
        if isinstance(react, Species):
            rmg_reactants.append(react.molecule)
        elif isinstance(react, Molecule):
            rmg_reactants.append([react])

    for prod in reaction.products:
        if isinstance(prod, Species):
            rmg_products.append(prod.molecule)
        elif isinstance(prod, Molecule):
            rmg_products.append([prod])

    test_reactants = []
    test_products = []
    if len(rmg_reactants) == 1:
        test_reactants = rmg_reactants
    elif len(rmg_reactants) == 2:
        l1, l2 = rmg_reactants
        for i in l1:
            for j in l2:
                test_reactants.append([i, j])
    elif len(rmg_products) == 3:
        l1, l2, l3 = rmg_reactants
        for i in l1:
            for j in l2:
                for k in l3:
                    test_reactants.append([i,j,k])

    if len(rmg_products) == 1:
        test_reactants = rmg_products
    elif len(rmg_products) == 2:
        l1, l2 = rmg_products
        for i in l1:
            for j in l2:
                test_products.append([i, j])
    elif len(rmg_products) == 3:
        l1, l2, l3 = rmg_products
        for i in l1:
            for j in l2:
                for k in l3:
                    test_products.append([i,j,k])
    for name, family in families.items():
        if match:
            continue

        labeled_reactants, labeled_products = (None, None)
        for test_reactant in test_reactants:
            for test_products in test_products:
                if (labeled_reactants and labeled_products):
                    continue
                labeled_reactants, labeled_products = family.getLabeledReactantsAndProducts(
                    reactants = test_reactant,
                    products = test_products
                )

        if (labeled_reactants) and (labeled_products):
            print "{} matches {} reaction family.".format(reaction.__repr__(), name)
            print
            reactant_complex = Molecule()
            for reactant in labeled_reactants:
                reactant_complex = reactant_complex.merge(reactant)
            reactant_complex.updateMultiplicity()
            write_string += reactant_complex.__repr__()
            write_string += "\n"
            write_string += reactant_complex.toAdjacencyList()
            write_string += "\n"

            product_complex = Molecule()
            for product in labeled_products:
                product_complex = product_complex.merge(product)
            product_complex.updateMultiplicity()
            write_string += product_complex.__repr__()
            write_string += "\n"
            write_string += product_complex.toAdjacencyList()
            write_string += "\n"
            match = True
    if not match:
        print "WARNING: Could not match {} to any provided reaction family...".format(reaction.__repr__())
        print

f = open("adjacency_lists.py", "w")
f.write(write_string)
f.close()

ehermes commented 5 years ago

Right now I don't have the ability to run RMG-cat, so I can't test the above code. I just wanted an example of what the reaction specification would look like, so I can write a parser that will turn it into 3D structures.

Remember that the purpose of this exercise is to begin automating the translation of RMG-cat reactions into 3D structures of the saddle point. With this in mind, I have a couple of comments regarding the example you provided:

I see that your atom indices are moving around -- e.g. the C atom participating in the reaction moves from index 3 to index 4, and the H atom moves from index 5 to index 11. The labels *1 *2 *3 will help me identify which atoms are participating in the reaction, but if the other atom indices are also being changed from reactants to products, it will be much more difficult for me to find a 3D structure for the saddle point. Would it be possible to reorder the atom indices so they correspond 1:1 in the reactants and products? I feel this would be easier to accomplish with molecular graphs than 3D structures, and the only work I've done with molecular graphs is to translate them into 3D structures...

Related to this first point, what about more complicated reactions where multiple bonds are being made and broken? Based on the example you gave, I understand that the bond between *1 and *2 in the reactants (*3 and *2 in the products) is being broken and a bond between *1 and *2 in the products (*3 and *2 in the reactants) is being made. It is not clear to me how this would extend to more complex reactions. I think it is important that we settle on a notation that is general enough to extend to more complicated reactions.

The reaction that I am initially going to be testing is a simple catalytic dehydrogenation reaction: CH3* + * -> CH2* + H*. Given the notation you provided, would this be an appropriate way to specify this reaction?

CH3
multiplicity ??? # <- I don't think this is meaningful for catalytic systems
1  *1 C u0 p0 c0 {2,S} {3,S} {4,S} {5,S}
2     H u0 p0 c0 {1,S}
3     H u0 p0 c0 {1,S}
4  *2 H u0 p0 c0 {1,S}
5     X u0 p0 c0 {1,S}
6  *3 X u1 p0 c0

CH2+H
multiplicity ???
1  *3 C u1 p0 c0 {2,S} {3,S} {5,S}
2     H u0 p0 c0 {1,S}
3     H u0 p0 c0 {1,S}
4  *2 H u0 p0 c0 {6,S}
5     X u0 p0 c0 {1,S}
6  *1 X u0 p0 c0 {4,S}

Thanks for the work on this issue, and apologies for the long comment.

Edit: Corrected the label numbers in my paragraph about more complex reaction types.

mazeau commented 5 years ago

Note to self: in data/kinetics/family.py at the end of applyRecipe adding these two lines

        print reactantStructure.toAdjacencyList(label='reactant')
        print productStructure.toAdjacencyList(label='product')
        return productStructures

prints what we need. (It just prints it far too often and for everything, without us knowing which is which. But it's a start)

mazeau commented 5 years ago

Now it will print reactant adj lists and prod and reactant adj lists (or just the combined reactant adj list if it is the same as the product one), as well as some will print the reaction itself. It's just far too large to keep all of these until the end of model generation.

label.txt

rwest commented 5 years ago

These changes on a branch somewhere?

mazeau commented 5 years ago

https://github.com/mazeau/RMG-Py/tree/cat-label

rwest commented 5 years ago

Progress from a different approach. I think what's currently on my catlabel branch: https://github.com/rwest/rmg-py/tree/catlabel (currently commit 86e9f63ee1a761392c87c96250adf8a33afe4690) manages to save the labeled adjacency lists for the reactants complex and products complex, without reordering, onto every TemplateReaction object.

What's needed next is to find the most appropriate place to print them out.

Could be in an i/o thing that's done for all reactions each iteration, like how we print the chemkin and html files. Or it could be done just once per reaction when a reaction gets added to the core (or even edge). eg. makeNewReaction (rmg/model.py line 486)

mazeau commented 5 years ago

I merged your changes and now have all the reactions print out to a text file

https://github.com/mazeau/RMG-Py/commit/fb0d7bf9bfc39ae5909e900aedbda09f5965f8c1

rwest commented 5 years ago

Thanks. I took your (relevant) changes to the catlabel branch. Looks like it's working.

Though I just noticed one small problem... the reactant and product adjacancy lists are always the same!

============================================================================
28
vacantX(3) + O=C[Ni](35) <=> HX(5) + OCX(14)
Surface_Dissociation
reactant
multiplicity -187
1 *2 H u0 p0 c0 {4,S}
2 *4 X u0 p0 c0
3    O u0 p2 c0 {4,D}
4 *1 C u0 p0 c0 {1,S} {3,D} {5,S}
5 *3 X u0 p0 c0 {4,S}

product
multiplicity -187
1 *2 H u0 p0 c0 {4,S}
2 *4 X u0 p0 c0
3    O u0 p2 c0 {4,D}
4 *1 C u0 p0 c0 {1,S} {3,D} {5,S}
5 *3 X u0 p0 c0 {4,S}

============================================================================
29
O=C[Ni](35) + CHX(17) <=> OCX(14) + CH2X(16)
Surface_Abstraction
reactant
multiplicity -187
1 *3 C u0 p0 c0 {3,S} {4,T}
2 *4 H u0 p0 c0 {6,S}
3    H u0 p0 c0 {1,S}
4 *5 X u0 p0 c0 {1,T}
5    O u0 p2 c0 {6,D}
6 *1 C u0 p0 c0 {2,S} {5,D} {7,S}
7 *2 X u0 p0 c0 {6,S}

product
multiplicity -187
1 *3 C u0 p0 c0 {3,S} {4,T}
2 *4 H u0 p0 c0 {6,S}
3    H u0 p0 c0 {1,S}
4 *5 X u0 p0 c0 {1,T}
5    O u0 p2 c0 {6,D}
6 *1 C u0 p0 c0 {2,S} {5,D} {7,S}
7 *2 X u0 p0 c0 {6,S}

============================================================================

rwest commented 5 years ago

Think this is now funcitoning. @ehermes does this look like it should be useful format?

Any tweaks to make it easier on downstream processing? should be easy enough to apply a template to make it yaml or something.

25
vacantX(3) + HOCOX(37) <=> HOX(8) + OCX(14)
Surface_Dissociation
reactant
multiplicity -187
1 *2 O u0 p2 c0 {2,S} {3,S}
2    H u0 p0 c0 {1,S}
3 *4 X u0 p0 c0 {1,S}
4    O u0 p2 c0 {5,D}
5 *1 C u0 p0 c0 {4,D} {6,D}
6 *3 X u0 p0 c0 {5,D}

product
multiplicity -187
1 *2 O u0 p2 c0 {2,S} {5,S}
2    H u0 p0 c0 {1,S}
3 *4 X u0 p0 c0
4    O u0 p2 c0 {5,D}
5 *1 C u0 p0 c0 {1,S} {4,D} {6,S}
6 *3 X u0 p0 c0 {5,S}

============================================================================
26
CH2X(16) + O=C[Ni](35) <=> CH3X(7) + OCX(14)
Surface_Abstraction
reactant
multiplicity -187
1 *3 C u0 p0 c0 {2,S} {3,S} {4,S} {5,S}
2 *4 H u0 p0 c0 {1,S}
3    H u0 p0 c0 {1,S}
4    H u0 p0 c0 {1,S}
5 *5 X u0 p0 c0 {1,S}
6    O u0 p2 c0 {7,D}
7 *1 C u0 p0 c0 {6,D} {8,D}
8 *2 X u0 p0 c0 {7,D}

product
multiplicity -187
1 *3 C u0 p0 c0 {3,S} {4,S} {5,D}
2 *4 H u0 p0 c0 {7,S}
3    H u0 p0 c0 {1,S}
4    H u0 p0 c0 {1,S}
5 *5 X u0 p0 c0 {1,D}
6    O u0 p2 c0 {7,D}
7 *1 C u0 p0 c0 {2,S} {6,D} {8,S}
8 *2 X u0 p0 c0 {7,S}

============================================================================

Each reaction family (eg Surface_Abstraction) has a recipe that defines the 1, 2, etc. labels in terms of bonds being maken, broken, etc, eg lines 21-26 of this example

rwest commented 5 years ago

OK, it's now yaml.

  - index: 6
    reaction: HOX(8) + CH2X(16) <=> OX(6) + CH3X(7)
    reaction_family: Surface_Abstraction
    reactant: |
        multiplicity -187
        1 *3 C u0 p0 c0 {2,S} {3,S} {4,S} {5,S}
        2 *4 H u0 p0 c0 {1,S}
        3    H u0 p0 c0 {1,S}
        4    H u0 p0 c0 {1,S}
        5 *5 X u0 p0 c0 {1,S}
        6 *1 O u0 p2 c0 {7,D}
        7 *2 X u0 p0 c0 {6,D}
    product: |
        multiplicity -187
        1 *3 C u0 p0 c0 {3,S} {4,S} {5,D}
        2 *4 H u0 p0 c0 {6,S}
        3    H u0 p0 c0 {1,S}
        4    H u0 p0 c0 {1,S}
        5 *5 X u0 p0 c0 {1,D}
        6 *1 O u0 p2 c0 {2,S} {7,S}
        7 *2 X u0 p0 c0 {6,S}

adjacency_lists.txt

cfgoldsmith / RMG-Py

Output labeled adjacency lists for each reaction #72