crest-lab / crest

CREST - A program for the automated exploration of low-energy molecular chemical space.
https://crest-lab.github.io/crest-docs/
GNU Lesser General Public License v3.0

Strategy to reduce the number of similar conformers #219

Closed: moabe84 closed this issue 10 months ago

moabe84 commented 10 months ago

Hi Philipp, I'd like to ask you a question. I was analyzing the final conformers generated for a structure and noticed that many of them are similar according to an RMSD analysis. Is there any way to reduce the number of similar conformers, perhaps through the settings printed for the "Structure Crossing (GC)" calculations:

conformer energy window /kcal : 5.00
CN per atom difference cut-off : 0.3000
RMSD threshold : 0.2500
max. # of generated structures : 2500

I would greatly appreciate any comments on this matter. Thank you very much.

Mostafa

pprcht commented 10 months ago

If you are referring to rotamers rather than true conformers, you can always turn the GC off entirely. This step produces mostly rotamers.

If, however, you are talking about true conformers that are similar, this can of course happen. There is no guarantee that low-lying conformations are hugely different. In fact, often it is reasonable to assume many low-lying structures are similar, at least if they belong to the same energy landscape funnel.

Should you want to force a lower number of conformers you could loosen the comparison thresholds (i.e. increase the RMSD threshold in CREGEN with -rthr, and/or increase the energy threshold with -ethr, possibly also the rotational constant threshold with -bthr). Alternatively, a more drastic solution would be to stick the final conformational ensemble (crest_conformers.xyz) into a PCA/k-Means clustering, which will group the most similar structures together. The criteria are a bit arbitrary with this, so I would only recommend it if you truly need to trim down the ensembles by a lot. We have an implementation that could be used with something like

crest struc.xyz --cregen crest_conformers.xyz --cluster 5

which tries to select 5 representative structures. But there are surely some python packages that could do such clustering as well, with appropriate workarounds.
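For readers who want to try the "python packages" route, here is a minimal sketch of the general idea: group the structures with k-means and keep the lowest-energy member of each group. It is not CREST's implementation. The descriptor (all interatomic distances), the omission of the PCA step, the farthest-point initialisation, and the function names parse_ensemble, descriptor, kmeans and representatives are all assumptions made for illustration. It also assumes the comment line of each XYZ frame starts with the energy.

```python
import math

def parse_ensemble(text):
    """Parse concatenated XYZ frames into (energy, coords) tuples.

    Assumption: the comment line of each frame starts with the energy.
    """
    lines = text.splitlines()
    frames, i = [], 0
    while i < len(lines):
        if not lines[i].strip():          # skip blank lines between frames
            i += 1
            continue
        natoms = int(lines[i].split()[0])
        energy = float(lines[i + 1].split()[0])
        coords = [tuple(float(x) for x in ln.split()[1:4])
                  for ln in lines[i + 2:i + 2 + natoms]]
        frames.append((energy, coords))
        i += natoms + 2
    return frames

def descriptor(coords):
    """All interatomic distances: invariant to rotation and translation."""
    return [math.dist(a, b) for i, a in enumerate(coords) for b in coords[i + 1:]]

def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:              # converged
            break
        centers = new_centers
    return labels

def representatives(frames, k):
    """Return the index of the lowest-energy frame in each cluster."""
    labels = kmeans([descriptor(c) for _, c in frames], k)
    best = {}
    for idx, (label, (energy, _)) in enumerate(zip(labels, frames)):
        if label not in best or energy < frames[best[label]][0]:
            best[label] = idx
    return sorted(best.values())
```

Calling representatives(parse_ensemble(open("crest_conformers.xyz").read()), 5) would then return up to five frame indices. The distance descriptor depends on atom ordering, so it only makes sense when all frames list the atoms in the same order.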

moabe84 commented 10 months ago

Thank you very much. I really appreciate your time and help. These are all good and helpful suggestions. Regarding the "crest struc.xyz --cregen crest_conformers.xyz --cluster 5" option, does it first cluster all the conformers (in the crest_conformers.xyz file) into 5 groups, based on the RMSD, and then select the lowest-energy one from each group?

Mostafa

pprcht commented 10 months ago

The clustering is not based on the RMSD, but yes, the lowest energy structure for each group is returned in the end.

moabe84 commented 10 months ago

Many thanks Philipp.

moabe84 commented 10 months ago

One more question: I'm trying to apply the CREGEN ensemble clustering to a trajectory file obtained from MTD simulations, but for some reason it seems to recognize only the first conformer and ignore the rest. I should add that it works perfectly for the trajectory from the CREST conformer search calculations.

Here are part of the MTD trajectory file (only 5 conformers), the reference structure, and the output file.
mtd_confs.xyz.txt ref.xyz.txt output.txt

Thanks, Mostafa

pprcht commented 10 months ago

It's probably the energy window. Try increasing it with --ewin. The default is 6 kcal/mol.
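To see why an energy window can leave only one structure, here is a simplified sketch of such a filter. It is not CREGEN's actual code. It assumes energies are given in Hartree and compared against the lowest one in the file, and the function name within_window is hypothetical.

```python
HARTREE_TO_KCAL = 627.5095  # conversion factor, kcal/mol per Hartree

def within_window(energies_hartree, ewin_kcal=6.0):
    """Return indices of structures within ewin_kcal of the lowest energy."""
    e_min = min(energies_hartree)
    return [i for i, e in enumerate(energies_hartree)
            if (e - e_min) * HARTREE_TO_KCAL <= ewin_kcal]

# One low-lying structure followed by much higher-energy snapshots
# (made-up numbers): relative energies are roughly 0, 9.4 and 11.9 kcal/mol.
energies = [-50.020, -50.005, -50.001]
print(within_window(energies))        # default 6 kcal/mol window
print(within_window(energies, 12.0))  # wider window
```

With these made-up energies the default window keeps only the first structure, while a 12 kcal/mol window keeps all three.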

moabe84 commented 10 months ago

Thanks Philipp. You're absolutely right, it was the energy window. In fact, this helped me realize that the problem was the first conformer in the MTD trajectory file, which was exactly the same as the reference structure. I removed it, and now it works with the default 6 kcal/mol energy window. Great! You have been most helpful. Many thanks.