crest-lab / crest

CREST - A program for the automated exploration of low-energy molecular chemical space.
https://crest-lab.github.io/crest-docs/
GNU Lesser General Public License v3.0
198 stars 42 forks source link

Rotamers seen as true conformers #232

Closed Bitumelourd closed 11 months ago

Bitumelourd commented 11 months ago

Hello

On my molecule, I have a PMe3 group. CREST gives me all the methyl rotations as true conformers. For example, molA and molB, found in crest_confomers.xyz after a constraint conformational search with bound 10-1 and 1-29 constraint, have an RSMD of around 1.7 without hydrogens. If I swapped the carbon ordering number on the PMe3 part of molA to match the same atom ordering of PMe3 moeity in molB, I found an RMSD of 0.019 between molAperm and molB. I've also encountered the same problem with the phenyl group in the past.

Is there a way to have them correctly seen as rotamers?

See attached molA, molB and molAperm.

mol.zip

pprcht commented 11 months ago

The sorting algorithm is numerical in nature and you have three main parameters/thresholds that can be tweaked to your needs: An energy threshold (-ethr), a rotational constant threshold (-bthr) and the RMSD threshold (-rthr)

All of these will affect the performance of CREGEN. For the standalone use see https://crest-lab.github.io/crest-docs/page/examples/example_2.html

Bitumelourd commented 11 months ago

Sorry, perhaps I didn't explain the problem clearly.

molA and molB come from the CREST calculation. If we calculate the RSMD between these two structures, I find 1.07, but looking at them, they are very similar. This is due to the wrong order of atoms on the PMe3 fragment. The methyls are permuted and this is mainly the unique difference between molA and molB. If I manually permute the atom order of molA to match that of molB on the PMe3 part only, I find an RMSD of 0.019. In fact, CREST should remove them if the threshold is set to 0.125, but as the RMSD is calculated on poorly ordered structures, it retains them.

I hope I’m more clear but I’m not quite sure.

To solve this problem, I found SpyRMSD, git and paper, which is able to give the correct RMSD of 0.019 between these two structures. I will apply it to compare the structures of crest_conformers.xyz but I think it would be more convenient to do it during the calculation.

Bitumelourd commented 11 months ago

Perhaps it will be clearer this way.

molA and molB are generated by CREST as unique conformers.

RSMD(molA, molB) = 1.07, so it seems correct that CREST should keep them.

If I now swap the atom numbering of molA in the molAperm file, so that molA and molAperm have exactly the same structure but not the same order, and calculate the RSMD, I find.

RMSD(molAperm,molB) = 0.019 so normally CREST should have kept either molA or molB at the end but not both if the threshold is set at 0.125.

SpyRMSD is able to minimize the RMSD and find the right value, 0.019, between molA and molB.

pprcht commented 11 months ago

No, sorry, I disagree with this explanation and with 0.019 being the "correct" RMSD. 0.019 is a heavy atom RMSD minimized upon atom permutation. The identification of rotamers relies on having a large RMSD while the energy and rotational constants are the same, and if I look at molA and molB this should be the case. Falling below the threshold of 0.125 would discard the molecules as identical, but here they are true permutational isomers that can be created through bond rotation (=rotamers). If they are not identified as such, i.e., if they are identified as conformational isomers, then you'll need to tweak the parameters as I mentioned earlier. These checks obviously become less accurate the larger the system gets.

If you permute the carbon atoms and call it a day, this is something else. One could design an alternative duplicate check based on this, but doing the permutations for all structures within an ensemble of a few thousand members will be painfully expensive and you will be missing the distinction between conformers and the rotamers of a given conformer. Furthermore permuting C, but not the bound hydrogen atoms it will mess up the topology and the molecule would be discarded on that basis.

Bitumelourd commented 11 months ago

Sorry, maybe I'm using an incorrect vocabulary.

CREST generates a few thousand conformers, which I then have to refine by DFT. For example, I can't keep molA and molB because they're so identical and will result to the same outcome after DFT. If the tBu carbon part is numbered 1, 2, 3 or 3, 1, 2 or 2, 3, 1 and the structure is the same, I only want to keep one of the structures. Now, if I want to increase the RMSD threshold to more than 1.07, I risk losing more structures. So maybe the best option will be a duplicate check after CREST which will be costly but not as much as DFT.

I think my misunderstanding stems from the fact that I don't understand why it's important to keep the numbering order of the atoms if the group is symmetrical tBu or phenyl, for example. Like on this picture, if the phenyl group is attached by either 1 or 4, why is it important to keep the two different number ordering? Maybe it’s a naïve question but I can’t figure it out.

13321_2020_455_Fig1_HTML

pprcht commented 11 months ago

Read my previous comment again, it's in there.

molA and molB are two rotamers of the same conformer. CREST will keep them both in crest_rotamers.xyz, but only one of them should be written to crest_conformers.xyz. That is, if energy and rotational constants are close enough. You would expect them to be for rotamers. The RMSD is not so much an issue here, it correctly should be high. You will not have to increase the RMSD threshold.

Them being rotamers means you could interconvert the two structures (find a plausible pathway) purely through bond rotations, without ever breaking a bond. You can't do that anymore if you have atoms permuted, i.e., if the bond topology changed. These would formally be true isomers and there will be some high energy barriers associated with it. The topology is tracking information between which atoms X and Y exists a bond, and you would be changing this for the corresponding C-H mappings if you only permute the carbons. Another way of motivating it: CREST's predecessor was developed initially as a companion program for NMR calculations. Different rotamers will give you different NMR signals, and you NEED the rotamers to be able to calculate the averaged observable spectrum. Hence, that's what the sorting is aimed at and why rotamers are even in the name CREST.

Permutation-invariant RMSDs have been around for a long time, but this is a different ansatz to the ensemble sorting issue. You are loosing the isomer/rotamer/conformer information. Permuting does make sense for non-covalent systems or atom clusters where you don't have rotamers in the same sense, but not so much for covalently bound systems. I can see the motivation to implement it for improving conformer (rather than rotamer) sorting, but modifying -ethr and -bthr will likely get you similar results with CREGEN.

ntampellini commented 11 months ago

@Bitumelourd for what I understand you are looking for an ensemble refinement protocol that gets rid of rotamers, only giving you "truly different" conformers for further DFT refinement. I needed to solve the same problem, and since I believed this laid outside of what CREST was conceived for, I ended up implementing a refinement protocol on my python toolbox TSCoDe.

An input like this should allow you to prune the crest_conformers.xyz ensemble of CREST into what you are looking for.

Bitumelourd commented 11 months ago

@pprcht Okay, it's much clearer to me now. Thanks for taking the time.

The misunderstanding comes from my use of the word permutation. It is possible to go from molA to molB by simple rotations. A -120-degree rotation along the axis of atoms 29-35, followed by a 120-degree rotation along the axis of atoms 35-36, should suffice. The energy and rotational constants for these transformations might therefore be too high for CREST to assign them as rotamers. I then try to play with -ethr, -bthr and -rthr and can only keep molA after modification. Unfortunately, looking at crest_ensemble, I still found structures that are too similar for my liking. If I change the parameters to higher values, I lose structures that I want to keep. In my case, as I have several different systems where I want to find the most stable TS, setting the different values each time can be complicated. That's why I'm looking to have just one parameter to set, and permutation-invariant RMSD seems the easiest to set.

@ntampellini Thanks, I'll have a look at that and compare with using SpyRMSD which is fast enough for me, less than a minute to prune a set of a thousand conformers.