Obtaining polymer structures for hydroxyacid isomers with given distribution

eefm1 commented 2 months ago

Hi makers from m2p,

Hopefully you can help me with an error I don't understand. I want to polymerize random co-polyesters with a given distribution. In order to do that, I used smiles from their corresponding hydroxyacid monomers and a given distribution. Thus, I model these polymers by monomer 1=A-B and monomer 2=A-C, where A is the diol and B & C are the diacids. Strangely, this gives an error if the length of the diol A is the same for both monomers. See sample code below:

data = pd.DataFrame({'smiles':[ 'OCOC(C1=CC=C(C(O)=O)C=C1)=O.OCCOC(C1=CC=CC(C(O)=O)=C1)=O', #works 'OCCOC(C1=CC=C(C(O)=O)C=C1)=O.OCCOC(C1=CC=CC(C(O)=O)=C1)=O', #ERROR:Ester_ReactionFailed 'OCCCOC(C1=CC=C(C(O)=O)C=C1)=O.OCCOC(C1=CC=CC(C(O)=O)=C1)=O', #works 'OCCCCOC(C1=CC=C(C(O)=O)C=C1)=O.OCCOC(C1=CC=CC(C(O)=O)=C1)=O' #works ], 'distribution':[[1,1],[1,1],[1,1],[1,1]] }) data['monomers'] = data.smiles.apply(lambda s: pm.get_monomers(s)) data = pm.thermoplastic(data,DP=5,mechanism='ester',replicate_structures=1)

If I don't specify the distribution, it works fine. Do you have any idea where things go wrong?

Thank you for your response.

Kind regards, Evelien

jlaw9 commented 2 months ago

Hi Evelien,

I'm not sure I understand how the description of your goal matches your code example, but from the example, I can see that the code is having issues with this monomer OCCOC(C1=CC=C(C(O)=O)C=C1)=O. If you don't give a distribution, then m2p is using only the second monomer OCCOC(C1=CC=CC(C(O)=O)=C1)=O to build the polymer which you can see if you visualize the polymer smiles e.g., Chem.Draw.MolsToGridImage([Chem.MolFromSmiles(s) for s in data.sort_index().smiles_polymer],molsPerRow=3, subImgSize=(250, 100)).
But if you do give a non-zero distribution value for both monomers, the code is forced to use the first monomer which is resulting in a failed polymerization. I'm not sure why the polymerization is failing though. Must be something in the poly_ester function. Try stepping/following through that function if you are able. I'll take a closer look soon.

Jeff

eefm1 commented 2 months ago

with distribution without distribution

Hi Jeff,

Thanks for your swift reply. I am trying to make structures of co-polyesters. For example, in PET a little bit of isophtalic acid (I) is often added. Thus, this PET will have a repeating unit structure of (ET)x-(EI)y. In order to model this, I split the two repeating units into their corresponding hydroxyacids HO-(ET)-COOH and HO-EI)-COOH with distribution [x,y] and polymerize them.

I added the outcomes of my script as attachments (note that here I used para and meta substituted benzene rings). The top one is with the distribution specified and reports 3 polymers and 1 error, the bottom is one without distribution and reports 4 polymers (no errors).

If I don't specify the distribution, I get a mix of para and meta benzene rings. That does not match your statement that if you don't specify a distribution only the 2nd monomer is taken, right? If I do specify a distribution (in this case all [1,1]), I obtain polymers for all cases where the diol length was unequal. I.e. not ethylene glycol-based for the 2nd monomer, but a C3 or C4 diol. However, these cases are very rare in the real-world ;-)

So that's why I was really puzzled. I was expecting it to work in neither case, or in all cases, but not in some. I will try to find some time next week to debug step by step, but if you have any idea where to look, that would be very much appreciated,

jlaw9 commented 2 months ago

Ok thanks for the additional details. I think I found the problem. If you look closely at the second example, only the meta benzene ring is present. Looks like if both monomers have the same length of diol (e.g., both are C2), then that's when it fails. If either side has a mismatch diol (e.g., C2 with C1), then they are fine. Here's some code where example 1 and 3 have the same length diol, and example 2 has a mismatch (C2 with C1 diol):

data = pd.DataFrame({'smiles':[
    'OCCOC(C1=CC=C(C(O)=O)C=C1)=O.OCCOC(C1=CC=CC(C(O)=O)=C1)=O', # original
    'OCCOC(C1=CC=C(C(O)=O)C=C1)=O.OCOC(C1=CC=CC(C(O)=O)=C1)=O', # remove a C from para side
    'OCCCCOC(C1=CC=C(C(O)=O)C=C1)=O.OCCCCOC(C1=CC=CC(C(O)=O)=C1)=O', # add two C's to both sides
    # repeat to show with distribution
    'OCCOC(C1=CC=C(C(O)=O)C=C1)=O.OCCOC(C1=CC=CC(C(O)=O)=C1)=O', # original
    'OCCOC(C1=CC=C(C(O)=O)C=C1)=O.OCOC(C1=CC=CC(C(O)=O)=C1)=O', # remove a C from para side
    'OCCCCOC(C1=CC=C(C(O)=O)C=C1)=O.OCCCCOC(C1=CC=CC(C(O)=O)=C1)=O', # add two C's to both sides
    ],
    'distribution':[[],[],[],[1,1],[1,1],[1,1]]
})
data['monomers'] = data.smiles.apply(lambda s: pm.get_monomers(s))
data = pm.thermoplastic(data,DP=5,mechanism='ester',replicate_structures=1)
Chem.Draw.MolsToGridImage([Chem.MolFromSmiles(s) for s in data.sort_index().smiles_polymer],molsPerRow=3, subImgSize=(300, 200))

jlaw9 commented 2 months ago

Ok I found the issue. pm.get_monomers() was sorting by molecular weight which used a dictionary with the MW as the key. If multiple monomers had the same MW, only one was kept. Thanks for reporting that bug!

jlaw9 commented 1 month ago

A workaround would be to set sort_by_mw=False for example:

data['monomers'] = data.smiles.apply(lambda s: pm.get_monomers(s, sort_by_mw=False))

NREL / m2p

Obtaining polymer structures for hydroxyacid isomers with given distribution #12