Electrostatics / pdb2pqr

PDB2PQR - determining titration states, adding missing atoms, and assigning charges/radii to biomolecules.
http://www.poissonboltzmann.org/

How to handle charge and radii in mmCIF #175

Open intendo opened 3 years ago

intendo commented 3 years ago

We can use CIF files as input to PDB2PQR but how do we handle the atom charge and radii?

Using the mmcif_pdbx package, we can load PDB (atom_site) data from a CIF file using something like the following:

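# Example of how to get the atom_site category from an mmCIF file
from pathlib import Path

import pytest
from pdbx.reader import PdbxReader

DATA_DIR = Path("tests/data")  # assumed location of the test CIF files


@pytest.mark.parametrize("input_cif", ["1kip.cif", "1ffk.cif"], ids=str)
def test_data_file(input_cif):
    """Test data file input."""
    input_path = DATA_DIR / input_cif
    data_list = []
    # Parse while the file is still open; the reader pulls from the handle.
    with open(input_path, "rt") as input_file:
        reader = PdbxReader(input_file)
        reader.read(data_list)
    for item in data_list:
        # print_it() writes the category out itself, so no print() wrapper.
        item.get_object("atom_site").print_it()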

There are other dictionaries that have radius and charge.

For example, there is _chem_comp_atom.charge (integer) or _chem_comp_atom.partial_charge (float); see https://www.iucr.org/__data/iucr/cifdic_html/2/cif_mm.dic/index.html

The question might be how to tie the atom_site rows to the other dictionary sections, for example by matching _chem_comp_atom.atom_id to _atom_site.label_atom_id.
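To make the join concrete, here is a minimal sketch that attaches a charge to each atom_site row by looking it up in chem_comp_atom. It works on plain dictionaries rather than on mmcif_pdbx objects, and the rows and charge values are made up for illustration. It also assumes the residue name (comp_id) has to be part of the key, since atom names such as CA repeat across residue types.

```python
# Hypothetical rows standing in for two parsed mmCIF categories.
# All values are illustrative, not taken from a real dictionary or entry.
chem_comp_atom = [
    {"comp_id": "GLY", "atom_id": "N", "charge": 0},
    {"comp_id": "GLY", "atom_id": "CA", "charge": 0},
    {"comp_id": "LYS", "atom_id": "NZ", "charge": 1},
]
atom_site = [
    {"id": 1, "label_comp_id": "GLY", "label_atom_id": "N"},
    {"id": 2, "label_comp_id": "GLY", "label_atom_id": "CA"},
    {"id": 3, "label_comp_id": "LYS", "label_atom_id": "NZ"},
    {"id": 4, "label_comp_id": "LYS", "label_atom_id": "XX"},  # no match
]


def attach_charges(atom_site_rows, chem_comp_atom_rows):
    """Return copies of the atom_site rows with a 'charge' key added."""
    # Index chem_comp_atom by (residue name, atom name) for O(1) lookups.
    lookup = {
        (row["comp_id"], row["atom_id"]): row["charge"]
        for row in chem_comp_atom_rows
    }
    return [
        {
            **row,
            # Atoms without a chem_comp_atom entry get None.
            "charge": lookup.get((row["label_comp_id"], row["label_atom_id"])),
        }
        for row in atom_site_rows
    ]


joined = attach_charges(atom_site, chem_comp_atom)
for row in joined:
    print(row["id"], row["label_comp_id"], row["label_atom_id"], row["charge"])
```

Because the key is the residue name plus the atom name, every residue of the same type receives the same charges.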

sobolevnrm commented 3 years ago

@speleo3 and @orbeckst -- do you see any use cases where PQR-like information would be useful in mmCIF format? If not, we'll probably treat this as low priority. Thanks!

speleo3 commented 3 years ago

I'd be all for deprecating PQR and only using something mmCIF based instead. The use case would be that we could abandon PQR parsers :-)

That was my original request in https://github.com/Electrostatics/pdb2pqr/issues/34

Such a file could be a 100% valid mmCIF file with added radius and charge columns. I'm not sure, though, whether _chem_comp_atom properties are a good fit: that would require, for example, different residue names for two HIS with different charge configurations. It would be much easier to add two custom columns to the _atom_site table, and/or propose adding such columns to one of the official dictionaries.
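As a rough illustration of the two-custom-column idea, the sketch below formats an _atom_site loop with a charge and a radius column appended. The names pqr_charge and pqr_radius are placeholders invented for this example, not agreed item names, and the coordinates, charges and radii are made up. The two HIS rows share a residue name and atom name but carry different per-atom charges, which is the case a per-residue-type table cannot express.

```python
# Standard _atom_site items followed by two HYPOTHETICAL custom items.
COLUMNS = [
    "group_PDB", "id", "label_atom_id", "label_comp_id", "label_seq_id",
    "Cartn_x", "Cartn_y", "Cartn_z",
    "pqr_charge", "pqr_radius",  # placeholder names, not from any dictionary
]

# Illustrative values only.
rows = [
    {"group_PDB": "ATOM", "id": 1, "label_atom_id": "NE2",
     "label_comp_id": "HIS", "label_seq_id": 10,
     "Cartn_x": 1.0, "Cartn_y": 2.0, "Cartn_z": 3.0,
     "pqr_charge": -0.5, "pqr_radius": 1.8},
    {"group_PDB": "ATOM", "id": 2, "label_atom_id": "NE2",
     "label_comp_id": "HIS", "label_seq_id": 42,
     "Cartn_x": 4.0, "Cartn_y": 5.0, "Cartn_z": 6.0,
     "pqr_charge": -0.1, "pqr_radius": 1.8},
]


def format_atom_site_loop(atom_rows):
    """Format rows as an mmCIF loop_ block, one item name per header line."""
    lines = ["loop_"]
    lines += [f"_atom_site.{name}" for name in COLUMNS]
    for row in atom_rows:
        # Values here contain no whitespace, so no quoting is attempted.
        lines.append(" ".join(str(row[name]) for name in COLUMNS))
    return "\n".join(lines) + "\n"


text = format_atom_site_loop(rows)
print(text)
```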

sobolevnrm commented 3 years ago

Oops -- sorry about that! I re-opened the original issue.



intendo commented 3 years ago

@speleo3 I think we could add the custom fields in the _atom_site table but I didn't know if that would create non-standard mmCIF files that could not then be parsed by other mmCIF parsers like https://github.com/biopython/biopython/blob/master/Bio/PDB/MMCIFParser.py

That is why I was wondering if there is another section that could be used to hold the charge and radius that would be accessible to the mmcif_pdbx parser but not break other parsers.

My concern would be that a user would use APBS or PDB2PQR and end up creating an mmCIF output file that would be incompatible with other mmCIF parsers in their chaining/pipeline processing.
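One way to frame the compatibility question: a reader that selects columns by item name can skip items it does not know, while a reader that assumes a fixed column order or rejects unknown items cannot. The sketch below is a toy loop reader written for this illustration, not the behavior of Biopython or any other real parser, which would need to be checked separately. The pqr_charge item is a hypothetical name, and values are split on whitespace with no support for quoted strings.

```python
# A tiny loop_ block with one HYPOTHETICAL extra item (pqr_charge).
SAMPLE = """\
loop_
_atom_site.id
_atom_site.label_atom_id
_atom_site.Cartn_x
_atom_site.pqr_charge
1 N 12.5 -0.30
2 CA 13.1 0.10
"""


def read_loop(text, wanted):
    """Return one dict per data row, keeping only the items in `wanted`."""
    names, records = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "loop_":
            continue
        if line.startswith("_"):
            # Header line: remember the item name and its column position.
            names.append(line)
        else:
            # Data line: pair values with item names, then drop the rest.
            record = dict(zip(names, line.split()))
            records.append({name: record[name] for name in wanted})
    return records


atoms = read_loop(
    SAMPLE,
    ["_atom_site.id", "_atom_site.label_atom_id", "_atom_site.Cartn_x"],
)
print(atoms)
```

In this toy reader the extra column is ignored because lookups go through item names rather than positions.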

danny305 commented 3 years ago

What's the status of the CIF output file?

sobolevnrm commented 3 years ago

I am working on it as quickly as I can. Would you like to help?

danny305 commented 3 years ago

Yes. I can probably start dedicating some serious time mid next week.

Can y'all catch me up over the next few days on the status, implementation design, and what needs to be done?

sobolevnrm commented 3 years ago

Sure! I have some initial code that I'll post in a few days. The PDB -> CIF translation works well but I was holding off releasing it to get the CIF -> PDB part done. I'll remedy that shortly.

Thanks!


danny305 commented 3 years ago

Awesome. Let me know when you post the code for me to begin familiarizing myself.

I'll let you know when I finish up what I am working on and can transition over to this in the next week or so.

sobolevnrm commented 3 years ago

For clarification, this is "few days" in COVID time: I'm still working on the code. I wrote most of it and then found a better way to do it so...

danny305 commented 3 years ago

I am preparing slides/code for a talk this Friday.

I am also implementing CIF file writing in our other library dependency (freesasa).

So quite honestly, sometime next week will probably be more realistic on my end.

Writing a PQR CIF file is the last loose end in our tech stack so I definitely want to hammer this out in the near future.

Glad we are openly communicating our timelines.

danny305 commented 3 years ago

Ready to start contributing. I'm guessing it's the nathan/cif branch?

intendo commented 3 years ago

@speleo3 @sobolevnrm did we ever decide on the two custom field names in the _atom_site table for the charge and radii?

sobolevnrm commented 3 years ago

No, but we should probably address this in https://github.com/Electrostatics/pdb2cif.

@danny305 -- I was going to redirect you over there as well for this thread.

danny305 commented 3 years ago

Why don't we just use Gemmi to convert between the two?

sobolevnrm commented 3 years ago

Let's move this discussion to the other repo. Can you provide a description of what Gemmi does over there? Thanks.