LBC-LNBio / pyKVFinder

pyKVFinder: Python-C parallel KVFinder
https://lbc-lnbio.github.io/pyKVFinder/
GNU General Public License v3.0

[ENH] Support reading mmCIF files in pyKVFinder #107

Open jvsguerra opened 1 month ago

jvsguerra commented 1 month ago

Problem to solve

PDB entries with extended Chemical Component Dictionary (CCD) or PDB IDs will be distributed in PDBx/mmCIF format only, as announced by the wwPDB in collaboration with the PDBx/mmCIF Working Group. PDB entries containing these extended IDs will not be supported by the legacy PDB file format. (see previous announcement)

Reference: https://www.rcsb.org/news/feature/63ff72ccc031758bf1c30ff7

Proposal

To ensure that pyKVFinder can continue to support PDB entries with extended CCD or PDB IDs, we propose adding support for reading mmCIF files. Specifically, we propose a new function that reads an mmCIF file and returns the atomic data in the same format as the existing read_pdb and read_xyz functions.

This would let users keep analyzing PDB entries with extended CCD or PDB IDs in pyKVFinder, without manually converting or preprocessing the data.

Further details

The proposed function for reading mmCIF files will need to be developed in accordance with the PDBx/mmCIF format specifications. We will also need to update the documentation to reflect the new functionality.

jvsguerra commented 1 month ago

A reader for mmCIF files (read_mmcif) was implemented in commit c0b501f32f7a304127a8ee41b98371bbf7f5d845 on branch v1.0.0. It still needs testing, code comments, documentation, and optimization.