crest-lab / crest

CREST - A program for the automated exploration of low-energy molecular chemical space.
https://crest-lab.github.io/crest-docs/
GNU Lesser General Public License v3.0
198 stars 42 forks source link

De\protonation output file #140

Closed Shualdon closed 8 months ago

Shualdon commented 2 years ago

Hi,

The output file for the protonation and deprotonation sites screening is always as an xyz file, regardless of the input file format. Is there a way to have a different file format instead of xyz, like an sdf file which then can be read by RDKit for further analysis?

pprcht commented 2 years ago

Hi,

At the moment there is not, but I will check the implementation. In the meantime you might use OpenBabel to convert single xyz files into sdf or pdb format. The command would be something like obabel -ixyz struc.xyz -osdf -O struc.sdf

The protonation output ensemble you can split into single xyz files with crest -splitfile protonated.xyz, which will generate a directory called SPLIT

bfmilne commented 2 years ago

Hi,

Assuming you have a multi-structure XYZ file you can convert it to individual SDF files with Openbabel this way

obabel -ixyz struc.xyz -osdf -O struc.sdf -m

which will give you a set of SDF files struc1.xyz, struc2.xyz, etc. If you want to separate the file number from the name use e.g.

obabel -ixyz struc.xyz -osdf -O struc_.sdf -m

to get struc_1.xyz, struc_2.xyz, and so on.

But given that the RDKit SDMolSupplier class allows you to read in multi-SDF files this might not be needed.

Cheers, Bruce

On Fri, 23 Sept 2022 at 13:04, Philipp Pracht @.***> wrote:

Hi,

At the moment there is not, but I will check the implementation. In the meantime you might use OpenBabel https://openbabel.org to convert single xyz files into sdf or pdb format. The command would be something like obabel -ixyz struc.xyz -osdf -O struc.sdf

The protonation output ensemble you can split into single xyz files with crest -splitfile protonated.xyz, which will generate a directory called SPLIT

— Reply to this email directly, view it on GitHub https://github.com/crest-lab/crest/issues/140#issuecomment-1256125539, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFL62ZASDP25ACZXU2QLRRTV7WMFRANCNFSM6AAAAAAQSMZAYI . You are receiving this because you are subscribed to this thread.Message ID: @.***>

-- Prof. Dr. Bruce F. Milne CFisUC Department of Physics University of Coimbra Rua Larga 3004 - 516 Coimbra Portugal

https://orcid.org/0000-0002-5522-4808 https://publons.com/researcher/2905148/bruce-milne/ https://www.linkedin.com/in/brucemilne

Bidh mi ag òl agus tha fios agam air rudan

Shualdon commented 2 years ago

Hi,

Thank for the reply. I do use openbabel right now to convert the xyz files. However openbabel can't convert the charges from xyz, since xyz files don't have charge info. So my protonated molecules have charge 0, and it requires further processing to find the reaction center and change the charge. Also, the bond orders can sometimes be iffy when converting from xyz since it doesn't have any bonding information, and openbabel needs to guess it based on distance - which is not always perfect. Having a native support for other output file formats can eliminate these problems.

pprcht commented 2 years ago

@Shualdon fair enough, I will look into it.

@bfmilne thanks for the multi-structure suggestion, I wasn't aware.

pprcht commented 2 years ago

Try #142 With the keyword -osdf the ensemble should be re-written to sdf format. (Or if the input was an *.sdf itself) The bond orders are read from a GFN calculation. Please tell me if it's working properly

Shualdon commented 2 years ago

That seems to work. Thanks! This will make my work so much easier.... :) On a side note - --splitfile doesn't seem to work on the ensemble sdf output file. I'll be using openbabel for that instead, but I thought I'll let you know.

Shualdon commented 2 years ago

Ok, I spoke too soon. It seems like the output sdf file doesn't put the right charge on the right atom. I've attached an example - crest_sdf_test.zip There's the input file that I protonated, and the protonated output file. As you can see, crest protonated one of the nitrogens (atom 12), but put a 2+ charge on a carbon (atom 1) instead. Crest gave 7 protomers and in all of them carbon 1 was with a 2+ charge, instead of a 1+ charge on the atom that was actually protonated.

pprcht commented 2 years ago

Yeah, I was wondering about that. The charges at the different atoms will have non-integer values with xtb, so what I'm passing at the moment is just the molecular charge. I thought this M CHG keyword was used for that, but apparently this is also to specify charges at specific atoms. How would the correct charge specification for your system look like in the sdf format? Is +2 as a charge correct, i.e., did you start the protonation with a charge of +1? If not, there might be a .CHRG file left in the directory that you want to delete.

Shualdon commented 2 years ago

Ok, so there was a .CHRG file. I deleted it and reran crest. The original molecule was neutral, and the molecular charge should be +1 after protonation, which is it now. However, the sdf output still shows the charge on carbon 1 and not the protonated nitrogen. The M CHG line specify the formal charges on atom, not the molecular charge. The first number is how many charges there are, followed by 2 numbers for each entry - first is atom index and second is the charge. So in this case it should read M CHG 1 14 1. i.e. - there's one charged atom, on atom number 14, and its charge is +1.

pprcht commented 2 years ago

Okay, so using the xtb charges for the SDF file will not lead anywhere. These are far off any non-zero integer charge that would be required for the SDF format and it's not really possible to read from them "where the proton went". I implemented a small workaround for the protonation tool that exploits the fact of the added proton being the last atom in each structure and looks at the WBO in #143 (not merged to master yet), if you want to try that.

Shualdon commented 2 years ago

It look like this works for simple protonations. I was able to get the right charge on the correct atom in a test molecule. Thanks! However, I don't think this would work for when there's a structural change due to the protonation, I.e. ring closing, dissociation, etc. And this method won't work for deprotonations. Another method would be to calculate the sum of bonds for each atom and compare it to some list of neutral sum of bonds for each atom. Single bond is 1, aromatic 1.5, double 2, triple 3. So the sum of bonds for a neutral nitrogen would be 3, carbon would be 4, oxygen 2, etc. But that would require to create a list like that for all atoms, which can be tricky for metals and hypervalent atoms (e.g. S and P can have 2, 4, and 6 acceptable sum of bonds). The sum of bonds can be calculated from the SDF bond block. I think this can also be calculated in post processing using RDKit or openbabel, but then the charge information in the SDF should be removed so they won't be confused with the charges when loading the molecule from the file. Though it's a workaround that might work for my case (I can try working on a python script for that, if it hasn't already been done), it should be implemented in crest regardless.

pprcht commented 2 years ago

That's a bit dodgy for elements without a clear neutral reference BO sum, as you said. But I guess for most HCNO stuff it would work fine. As long as it is "just" for the SDF format I'm okay with that, I'll try it out.

github-actions[bot] commented 8 months ago

This issue had no activity for 6 months. It will be closed in 1 week unless there is some new activity.