Memory allocation error

QDelobel commented 2 years ago

Hello,

I'm a doctoral student trying to parametrize a ligand with your software, but I've hit a wall with what seems to be a memory allocation problem: MemoryError: std::bad_alloc

I'm attaching the files generated by the test, in case you have time to help me resolve this issue: Poltype-FBP.zip

Thank you for your attention.

misterbrandonwalker commented 2 years ago

I'm going to try my best to get it working with psi4 first (if I can get an example that fails). Sometimes optking pcm fails for me too. I'm trying to "loosen convergence criteria" for pcm. Also, since some of your systems have no hydrogens, I'm turning pcm off for those.

Yeah, that's pretty close, so just two things. First, "uncomplexedproteinpdbname" is only used for pdb2pqr to determine the protonation state (I keep it in a separate example box). Second, since you have 4 ligands, you want something like "keyfilenamelist=FBP1.key , FBP2.key , FBP3.key , FBP4.key" and "xyzfilenamelist=FBP1.xyz , FBP2.xyz , FBP3.xyz , FBP4.xyz". I will update that example to use a comma-separated list (the current example only has one item in the list).

QDelobel commented 2 years ago

OK, just a naive question: since my four ligands are the same molecule, one in each monomer of the tetrameric structure, can I just copy-paste the same key/xyz, or do I need to run poltype for each of the four?

QDelobel commented 2 years ago

I just ran a first test for the binding. Here is the error message (from my output when I submit), along with the generated files, in case you have an idea of what the problem is:

Test-binding.zip

Do I have to change something in my pdb file, or does the problem come from somewhere else?

misterbrandonwalker commented 2 years ago

Okay, so the new commit should be more robust with pcm: I loosened the convergence tolerance with psi4 in this case, and pcm is turned off when there are no hydrogens in the molecule.

For making the tinker xyz, just make sure the inputs look something like the example below. The error you saw occurred because the FBP molecule in your pdb is completely unprotonated, but your parameterized molecule input (FBP1.xyz …) is protonated. Poltype tries to match the ligand smiles strings against the PDB, and that fails when the topology is not the same. I have now added an error for when this occurs, saying “ValueError: ligandsmiles not matching anything in PDB! [H]~[#8]~#15(~[#8]~[H])~[#8]~1)~[#8]~[H])~[#8]~[H])(~[#8])~[#8]~[H]” etc. I also just added the keyword “makexyzonly” if you want to terminate the program after making the xyz. You can add hydrogens using “builder” in pymol: for example, on the terminal oxygens, make sure the formal charge assigned is 0, which tells pymol it can add a hydrogen (if it's -1, then it thinks it can't add a hydrogen).

If you also want to include the other molecules (I assume they are buffer or something similar), you will need to include them in “ligandxyzfilenamelist” as well. However, if you want to do binding free energy and disappear only the FBP molecules, you now also need to specify another keyword, “annihilateligandxyzfilenamelist= FBP1.xyz , FBP2.xyz , FBP3.xyz , FBP4.xyz”, so the program knows which ligands should have their electrostatics/vdw disappear. If you don't include those other ligands in the ligandxyzfilenamelist keyword, the program will ignore them when making the tinker XYZ (no smiles string detection). Waters/ions within, I believe, 8 angstroms of any ligand will be included in the tinker xyz file as well.

There were a few bug fixes, some mine (too many special cases…) and one where babel was generating a wrong pdb and converting MET to ALA!! when I removed some of the HETATMs, so I replaced it with my own parser. Also, a few protein residues are missing (at the beginning of the PDB); some missing atoms were added as well (not in the missing residues though).

complexedproteinpdbname=3srd-ph7,5.pdb
binding
keyfilenamelist=FBP1.key , FBP2.key , FBP3.key , FBP4.key
ligandxyzfilenamelist=FBP1.xyz , FBP2.xyz , FBP3.xyz , FBP4.xyz
makexyzonly
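
With several copies of the same ligand, this input follows a regular pattern and can be generated rather than typed. A minimal Python sketch, assuming one keyword per line in the input file; the helper name build_binding_input is hypothetical and only the keywords themselves come from this thread:

```python
def build_binding_input(pdb_name, ligands, annihilate=None, make_xyz_only=True):
    """Return the text of a binding input file (hypothetical helper).

    ligands: ligand base names, e.g. ["FBP1", "FBP2"]; each one is assumed
    to have a matching .key and .xyz file next to the PDB.
    annihilate: subset of ligands whose electrostatics/vdw should disappear.
    """
    annihilate = list(annihilate or [])
    unknown = [name for name in annihilate if name not in ligands]
    if unknown:
        raise ValueError(f"annihilated ligands not in ligand list: {unknown}")

    def joined(names, ext):
        # Comma-separated list in the same style as the example above.
        return " , ".join(f"{name}.{ext}" for name in names)

    lines = [
        f"complexedproteinpdbname={pdb_name}",
        "binding",
        f"keyfilenamelist={joined(ligands, 'key')}",
        f"ligandxyzfilenamelist={joined(ligands, 'xyz')}",
    ]
    if annihilate:
        lines.append(f"annihilateligandxyzfilenamelist={joined(annihilate, 'xyz')}")
    if make_xyz_only:
        lines.append("makexyzonly")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    fbp = [f"FBP{i}" for i in range(1, 5)]
    print(build_binding_input("3srd-ph7,5.pdb", fbp, annihilate=fbp), end="")
```

The check on the annihilate list reflects the point above that every ligand to be annihilated also has to appear in the ligand list; other molecules such as buffer would be added to ligands only.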

QDelobel commented 2 years ago

Alright, thank you so much for all the help. I will look into it a little bit tonight and test everything tomorrow.

QDelobel commented 2 years ago

Ah, I just discovered something: last time, when I did the protonation in PyMOL for the Poltype parametrization, I started from the sdf file of one FBP (https://www.rcsb.org/structure/3srd, in instance coordinates), and it gave me a 34-atom molecule.

But when I did it just now using my whole protein (obtained from the same link), selecting each FBP in PyMOL and adding hydrogens, I only obtained 32 atoms for each FBP. Will it be a problem for the program that there is a difference of 2 hydrogens?
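
A quick way to check this kind of mismatch is to count atoms per FBP copy directly in the PDB file. A minimal Python sketch, assuming standard fixed-column PDB records; the function name is hypothetical:

```python
from collections import Counter


def count_ligand_atoms(pdb_lines, resname="FBP"):
    """Count atoms per (chain, residue number) for one residue name."""
    counts = Counter()
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # PDB fixed columns: residue name 18-20, chain 22, residue number 23-26.
        if line[17:20].strip() != resname:
            continue
        chain = line[21]
        resnum = line[22:26].strip()
        counts[(chain, resnum)] += 1
    return dict(counts)
```

Passing the lines of the protonated PDB saved from PyMOL gives one count per ligand copy, which can then be compared with the atom count of the parametrized xyz file.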

misterbrandonwalker commented 2 years ago

So the issue is how each program reads or guesses the formal charge of the atom. I'm guessing it's one of the terminal oxygens that is missing an H (one on both sides of phosphate). This happened to me as well: if the formal charge is -1 in the original PDB file, then when you select the whole molecule in pymol and try to add H, it won't add one to the O atom with the -1 formal charge. So one way to fix it is to change the input PDB file (charge field). The other way is to use the pymol builder to tell it the formal charge is 0; then when you ask pymol to add hydrogens, it sees an empty valence and can fill it with a hydrogen for you. You just need to change from whole-molecule selection to atom selection; then when you click on the atom you want to change, there is a builder button on the top right to modify the charge etc... I also just fixed another bug where babel thinks ions can have bonds if they are too close to other atoms...
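
The first fix (editing the charge field) can be sketched in Python. In the PDB format, the formal charge sits in columns 79-80 of ATOM/HETATM records, written as a digit followed by a sign (for example 1-). The sketch below blanks that field for chosen atoms so they read as neutral. The function names and any atom names passed in are hypothetical, and whether a given PyMOL version then adds the hydrogen should be checked on the actual file:

```python
def clear_formal_charge(line, resname, atom_names):
    """Blank the charge field (columns 79-80) of matching ATOM/HETATM records."""
    if not line.startswith(("ATOM", "HETATM")):
        return line
    # PDB fixed columns: atom name 13-16, residue name 18-20.
    if line[17:20].strip() != resname or line[12:16].strip() not in atom_names:
        return line
    newline = "\n" if line.endswith("\n") else ""
    record = line.rstrip("\n").ljust(80)
    return record[:78] + "  " + record[80:] + newline


def neutralize_file(src, dst, resname, atom_names):
    """Copy src to dst, clearing the charge on the selected atoms only."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(clear_formal_charge(line, resname, atom_names))
```

All other records are written back unchanged, so the edit is limited to the oxygens that should receive a hydrogen.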

pren commented 2 years ago

For small molecules, use Marvin (ChemAxon) to evaluate the pKa and the possibility of multiple protonation states.

misterbrandonwalker commented 2 years ago

Alternatively, you may just use poltype to print out the relevant states. The only difference is that the software I'm wrapping doesn't give a probability estimate, but the 4 states generated are the four dominant states in Marvin Sketch (well, one of them only has a ~4% chance in the relevant pH range according to Marvin: the one with one H on both phosphates). The input is:

structure=FBP.xyz
genprotstatesonly

QDelobel commented 2 years ago

I see. On my side, I resolved the difference of two hydrogens with your advice on pymol.

I will test the binding with my pdb tomorrow and will also look at the probabilities of the dominant states. That could be interesting for this ligand, since it's considered an allosteric regulator at this site on my protein, so it is always good to see the different protonation states.

QDelobel commented 2 years ago

Hello, the binding progressed well, but there were some missing residues in my input pdb, so I'm installing Modeller as described in the poltype protein preparation, and I have sent a request for the license key (I don't know how long it takes). Test-binding.zip

And concerning my other small ions to parametrize, it works now with the new commit.

misterbrandonwalker commented 2 years ago

Glad to hear it. If I recall, they should send the license via email right away.

QDelobel commented 2 years ago

I just got access to Modeller and tried it on my cluster with this .ini:

pdbcode=3srd
numproc=8
maxdisk=200GB
maxmem=50GB
coresperjob=1

but it returned this error in my output:

And here are all the generated files, if you have time to look. It seems to have found my structure but hit some sort of error on it? Test-modeller.zip

misterbrandonwalker commented 2 years ago

Thanks for catching that; this represents a more "general" case than what I cooked up for the previous PDB I tried this on. There are two main issues. First, you can have repeating residue numbers between different chains (whereas sometimes in a PDB the resnum keeps going across all chains, I believe). Second, the "seq" file generated by Modeller behaved a bit differently from what I saw last time: the PDB file has chains A-D, but not all sequential, and each line in the seq file generated by Modeller using the crystal pdb as input doesn't correspond to segments of each chain (I would expect an A segment on the first line, then a B segment, ...). Anyway, it's not a problem if the program can keep track of chain and residue number concurrently. I will come back to this.
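
Keeping track of chain and residue number together amounts to using the pair as the lookup key. A minimal Python sketch of that idea, assuming standard fixed-column PDB records and ignoring insertion codes; it is an illustration with a hypothetical function name, not poltype's actual parser:

```python
def index_residues(pdb_lines):
    """Map (chain, residue number) to residue name for ATOM/HETATM records."""
    residues = {}
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # PDB fixed columns: residue name 18-20, chain 22, residue number 23-26.
        key = (line[21], int(line[22:26]))
        # Keep the first residue name seen for each key.
        residues.setdefault(key, line[17:20].strip())
    return residues
```

With the residue number alone as key, residue 5 of chain A and residue 5 of chain B would overwrite each other; with the (chain, number) pair they stay distinct.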

misterbrandonwalker commented 2 years ago

Should be fixed now

QDelobel commented 2 years ago

Hello, it finished correctly, but is it normal that the generated pdb doesn't have the atoms for the different ligands? For example, here is the pdb as downloaded from the website (with missing residues): 3srd.zip

and here are the files from poltype (this one has all the residues but is missing the ligands, like FBP): Test-missing.zip

misterbrandonwalker commented 2 years ago

Glad to hear it. Yes, Modeller just adds missing residues and minimizes the structure. If you want to add the ligands manually afterwards, you can try pymol or VMD etc...

misterbrandonwalker commented 2 years ago

On second thought, this is a nontrivial issue with many ligands, and a common one... so I added a feature to handle this. See the input file below. The output file will have _align.pdb in its name.

complexedproteinpdbname=pdb3srd.pdb
uncomplexedproteinpdbname=3srd_filled.BL00020001_final.pdb

TinkerTools / poltype2

Memory allocation error #12