Electrostatics / pdb2pqr

PDB2PQR - determining titration states, adding missing atoms, and assigning charges/radii to biomolecules.
http://www.poissonboltzmann.org/

mmCIF based PQR format #34

Open speleo3 opened 7 years ago

speleo3 commented 7 years ago

This is a feature request. It would be great if pdb2pqr and apbs could replace the PQR format with a modified mmCIF-based format which simply adds a charge column (e.g. _atom_site.pqr_partial_charge) and a radius column (e.g. _atom_site.pqr_radius).

Advantages over current PQR format:

  • no whitespace and/or column alignment issues
  • any software which supports mmCIF format could read those files

Example which demonstrates that the change could be very minimal. In fact, this file loads correctly into PyMOL both as PQR and as CIF format (the latter currently simply ignores the added columns, but it would be trivial to add support for them).

data_pqr
_pqr_header.remarks
;
REMARK   1 PQR file generated by PDB2PQR (Version master)
REMARK   1
REMARK   1 Command line used to generate this file:
REMARK   1 --chain --ff=AMBER alagly.pdb alagly.pqr
REMARK   1
REMARK   1 Forcefield Used: AMBER
REMARK   1
REMARK   5
REMARK   6 Total charge on this protein: 0.0000 e
REMARK   6
;
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.pqr_partial_charge
_atom_site.pqr_radius
ATOM      1  N   ALA A   1      -0.677  -1.230  -0.491  0.1414 1.8240
ATOM      2  CA  ALA A   1      -0.001   0.064  -0.491  0.0962 1.9080
ATOM      3  C   ALA A   1       1.499  -0.110  -0.491  0.6163 1.9080
ATOM      4  O   ALA A   1       2.030  -1.227  -0.502 -0.5722 1.6612
ATOM      5  CB  ALA A   1      -0.509   0.856   0.727 -0.0597 1.9080
ATOM      6  H2  ALA A   1      -1.253  -1.311  -1.308  0.1997 0.6000
ATOM      7  H3  ALA A   1       0.003  -1.967  -0.492  0.1997 0.6000
ATOM      8  H   ALA A   1      -1.251  -1.312   0.327  0.1997 0.6000
ATOM      9  HA  ALA A   1      -0.272   0.568  -1.322  0.0889 1.1000
ATOM     10  HB1 ALA A   1       0.003   0.575   1.535  0.0300 1.4870
ATOM     11  HB3 ALA A   1      -0.374   1.830   0.562  0.0300 1.4870
ATOM     12  HB2 ALA A   1      -1.479   0.666   0.858  0.0300 1.4870
ATOM     13  N   GLY A   2       2.250   0.939  -0.479 -0.3821 1.8240
ATOM     14  CA  GLY A   2       3.700   0.771  -0.479 -0.2493 1.9080
ATOM     15  C   GLY A   2       4.400   2.108  -0.463  0.7231 1.9080
ATOM     16  O   GLY A   2       3.775   3.173  -0.453 -0.7855 1.6612
ATOM     17  OXT GLY A   2       5.615   2.369  -0.458 -0.7855 1.6612
ATOM     18  H   GLY A   2       1.784   1.852  -0.470  0.2681 0.6000
ATOM     19  HA2 GLY A   2       3.972   0.236   0.331  0.1056 1.3870
ATOM     20  HA3 GLY A   2       3.974   0.254  -1.301  0.1056 1.3870
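
To illustrate how little a reader needs in order to pick up the two proposed columns, here is a minimal Python sketch. It is not PyMOL's or APBS's parser. It assumes a file with a single loop_ block, unquoted values, and the column names proposed above (_atom_site.pqr_partial_charge and _atom_site.pqr_radius), which are not part of any accepted mmCIF dictionary. The function name parse_atom_site_loop is made up for this example.

```python
def parse_atom_site_loop(cif_text):
    """Return one dict per data row of the single loop_ block in cif_text."""
    columns, rows = [], []
    in_loop = False
    in_text_field = False
    for line in cif_text.splitlines():
        if line.startswith(";"):
            # A ';' in column 1 opens or closes a multi-line text field
            # (here: the REMARK block), so toggle and skip the line.
            in_text_field = not in_text_field
            continue
        stripped = line.strip()
        if in_text_field or not stripped:
            continue
        if stripped == "loop_":
            in_loop = True
        elif in_loop and stripped.startswith("_"):
            columns.append(stripped)  # column name declared by the loop
        elif in_loop:
            # Values are whitespace-separated, so column alignment is irrelevant.
            rows.append(dict(zip(columns, stripped.split())))
    return rows


EXAMPLE = """\
data_pqr
_pqr_header.remarks
;
REMARK   6 Total charge on this protein: 0.0000 e
;
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.pqr_partial_charge
_atom_site.pqr_radius
ATOM      1  N   ALA A   1      -0.677  -1.230  -0.491  0.1414 1.8240
ATOM      2  CA  ALA A   1      -0.001   0.064  -0.491  0.0962 1.9080
"""

atoms = parse_atom_site_loop(EXAMPLE)
for atom in atoms:
    print(atom["_atom_site.label_atom_id"],
          float(atom["_atom_site.pqr_partial_charge"]),
          float(atom["_atom_site.pqr_radius"]))
```

Because values are looked up by column name, a reader that does not know the two extra columns can simply ignore them, which is the behaviour described for PyMOL above.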

sobolevnrm commented 7 years ago

This is a feature we've been contemplating for a while but have delayed because of the slow rate of mmCIF adoption by visualization and simulation programs. However, since you represent one of the major visualization programs, I think we should escalate this in priority.

Thanks!

danny305 commented 4 years ago

Has this feature been implemented yet?

sobolevnrm commented 4 years ago

Hi Danny -

I apologize but it hasn't been implemented. Can you say more about the use cases that need this feature? We haven't gotten many other requests for mmCIF-format support.

Thanks,

Nathan

danny305 commented 4 years ago

We use the partial charges and hydrogen atoms added by pdb2pqr to engineer a data structure for training CNN models. However, we are trying to switch our data engineering infrastructure to mmCIF files instead of PDB files because they are more generic and the PDB format is being deprecated. Switching our infrastructure from PDB to mmCIF files is contingent on generating the partial charges and the SASA values for each atom.

sobolevnrm commented 4 years ago

Hi Danny -

We will continue to track this issue but feel that it will be best addressed after we move APBS/PDB2PQR to the cloud (in progress).

Thank you,

Nathan

danny305 commented 4 years ago

Nathan,

I totally understand! How long do you believe this cloud migration will take?

Danny

sobolevnrm commented 4 years ago

We are hoping to wrap it up in the next month or so. Thanks.

sobolevnrm commented 4 years ago

@danny305 - we're more-or-less moved into the cloud now. Can you please provide a minimal example of the functionality you're hoping for with mmCIF?

danny305 commented 4 years ago

We would like to run your software by feeding in either a PDB-REDO file or an mmCIF/cif file of a protein and receive back an mmCIF/cif file with the hydrogen atom rows appended and with a pqr (partial charge) column and a radius column added. Essentially what @speleo3 eloquently summarized.

Additionally, we are having problems obtaining the pqr values for atom records that do not begin with ATOM in the current pdb pipeline: atoms under HETATM do not get their hydrogen atoms, radius, and pqr value filled in. This includes all ligands and sugars. Could this be addressed in the new mmCIF/cif functionality?
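
One way to pin down which HETATM atoms are affected is to compare the input PDB file against the PQR output. The Python sketch below is only a diagnostic, not part of pdb2pqr. It assumes standard fixed-column PDB records on the input side and a whitespace-separated PQR file that includes the chain ID column (11 fields per atom row) on the output side. The function names are made up for this example.

```python
def pdb_hetatm_keys(pdb_text):
    """Collect (resname, chain, resseq, atom name) for every HETATM record."""
    keys = set()
    for line in pdb_text.splitlines():
        if line.startswith("HETATM"):
            # Fixed PDB columns: atom name 13-16, residue name 18-20,
            # chain ID 22, residue number 23-26 (1-based).
            keys.add((line[17:20].strip(), line[21],
                      int(line[22:26]), line[12:16].strip()))
    return keys


def pqr_atom_keys(pqr_text):
    """Collect the same key for every atom row of a PQR file with chain IDs."""
    keys = set()
    for line in pqr_text.splitlines():
        fields = line.split()
        if fields and fields[0] in ("ATOM", "HETATM") and len(fields) == 11:
            keys.add((fields[3], fields[4], int(fields[5]), fields[2]))
    return keys


def missing_hetatms(pdb_text, pqr_text):
    """Return HETATM atoms of the input that have no row in the PQR output."""
    return sorted(pdb_hetatm_keys(pdb_text) - pqr_atom_keys(pqr_text))


SAMPLE_PDB = (
    "ATOM      1  N   ALA A   1      -0.677  -1.230  -0.491\n"
    "HETATM   21  O   HOH A   3       1.000   2.000   3.000\n"
)
SAMPLE_PQR = (
    "ATOM      1  N   ALA A   1      -0.677  -1.230  -0.491  0.1414 1.8240\n"
)

missing = missing_hetatms(SAMPLE_PDB, SAMPLE_PQR)
print(missing)
```

The sample coordinates for the water molecule are invented purely to exercise the code.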

Does this answer your minimal example question?

Danny

sobolevnrm commented 4 years ago

Yes, that works. Thanks for the update.

danny305 commented 4 years ago

Will you guys extend the pqr functionality to non-protein atoms (water, ligands, sugars, chemical modifications, etc.)? Or should we prioritize obtaining partial charges for non-protein atoms elsewhere?

I am asking because we are currently trying to address this issue and would like to divert our efforts elsewhere if pqr (you guys) could solve this. Please let me know so we can reprioritize our short-term development goals.

Danny

sobolevnrm commented 4 years ago

Hi Danny -

This isn't an immediate priority and we'd appreciate any help you can provide.

We think we have protein mmCIF parsing working in the code now but would appreciate it if you could help provide a few test cases that we could use to check the accuracy.

Thanks!

jamesmloy commented 4 years ago

Hi @sobolevnrm

I'm working with Danny on the same project, and I'm happy to help with providing some test cases. I've attached (https://github.com/Electrostatics/apbs-pdb2pqr/files/4602512/pqr2sasa-tests.tar.gz) a few cif files that were converted from PDB-REDO files (also included are the source pdb files). The tool used for the conversion is the Gemmi library (found here: https://gemmi.readthedocs.io/en/latest/index.html).

Please let me know if this is what you're looking for. If it is not, I will gladly provide you with what will help.

Thanks, James

sobolevnrm commented 4 years ago

Hi James --

This is great! @intendo -- could you or Mark work with this?

Thanks!

danny305 commented 3 years ago

Has this been approved/resolved?

Danny

sobolevnrm commented 3 years ago

Yes, thank you.

sobolevnrm commented 3 years ago

Oops... @speleo3 just told me I totally misunderstood this issue and the original intent was to have PDB2PQR produce mmCIF that APBS and other programs could consume.
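
For reference, the output side of the original request can be sketched in a few lines of Python. This is not PDB2PQR code and does not reflect any existing option. It only re-wraps atom rows that are already present in a PQR file, assuming whitespace-separated fields with a chain ID column, and it uses the column names proposed at the top of this issue. The names pqr_to_mmcif and ATOM_SITE_COLUMNS are made up for this example.

```python
ATOM_SITE_COLUMNS = [
    "_atom_site.group_PDB",
    "_atom_site.id",
    "_atom_site.label_atom_id",
    "_atom_site.label_comp_id",
    "_atom_site.label_asym_id",
    "_atom_site.label_seq_id",
    "_atom_site.Cartn_x",
    "_atom_site.Cartn_y",
    "_atom_site.Cartn_z",
    "_atom_site.pqr_partial_charge",  # proposed column, not in the mmCIF dictionary
    "_atom_site.pqr_radius",          # proposed column, not in the mmCIF dictionary
]


def pqr_to_mmcif(pqr_text, block_name="pqr"):
    """Wrap the atom rows of a PQR file in an mmCIF data block with one loop_."""
    out = [f"data_{block_name}", "loop_"]
    out.extend(ATOM_SITE_COLUMNS)
    for line in pqr_text.splitlines():
        fields = line.split()
        if not fields or fields[0] not in ("ATOM", "HETATM"):
            continue  # skip REMARK and any other non-atom records
        if len(fields) != len(ATOM_SITE_COLUMNS):
            raise ValueError(f"unexpected number of fields: {line!r}")
        out.append(" ".join(fields))
    return "\n".join(out) + "\n"


PQR = """\
REMARK   1 PQR file generated by PDB2PQR (Version master)
ATOM      1  N   ALA A   1      -0.677  -1.230  -0.491  0.1414 1.8240
ATOM      2  CA  ALA A   1      -0.001   0.064  -0.491  0.0962 1.9080
"""

cif = pqr_to_mmcif(PQR)
print(cif)
```

The sketch drops the REMARK header rather than carrying it over into a text field as in the example at the top of the issue. A real implementation would also have to handle PQR files without a chain ID column.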