ACRMGroup / bioplib

The bioplib library
http://www.bioinf.org.uk/software/bioplib/

Reading of PDBML isn't mapping header info (species, compound) to chain labels #5

AndrewCRMartin closed this issue 9 years ago

AndrewCRMartin commented 9 years ago

Need to get the chain label information from the PDBML equivalent of SEQRES to create a mapping from chain labels to entity_id, and thence to the COMPND and SPECIES data. Use the pdbheader.c program to test.

CraigPorter commented 9 years ago

The entity_id is a mandatory part of the atom sites (ATOM) data in the PDBML format. When I originally implemented PDBML parsing, I had the entity_id default to 1. We now read and write entity_id for PDBML files.

For PDBML, I parse entity_id from the atom site data and map entity_id to chain labels, allowing chain labels to be added to the COMPND data. The pdbheader.c program now works for PDBML files.
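A minimal sketch of that mapping, assuming the atom_site rows have already been read into a simple array. `AtomSite`, `EntityChains`, `MapEntitiesToChains` and `FormatChainList` are hypothetical names used only for illustration; they are not the bioplib API. The idea is a single pass over the atoms that collects the distinct chain labels seen for each entity_id, which can then be written out as a COMPND `CHAIN:` list.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CHAINS_PER_ENTITY 32
#define CHAIN_LEN 8

/* Hypothetical minimal view of one PDBML atom_site row (not bioplib's). */
typedef struct {
    char chain[CHAIN_LEN]; /* chain label, e.g. "A" */
    int  entity_id;        /* entity the atom belongs to */
} AtomSite;

/* Hypothetical: chain labels collected for one entity (one COMPND MOL_ID). */
typedef struct {
    int  entity_id;
    int  nchains;
    char chains[MAX_CHAINS_PER_ENTITY][CHAIN_LEN];
} EntityChains;

/* Walk the atoms once, recording every distinct chain label per entity_id.
   Atoms of new entities are skipped once `out` holds `maxout` entries.
   Returns the number of entities found. */
int MapEntitiesToChains(const AtomSite *atoms, int natoms,
                        EntityChains *out, int maxout)
{
    int nent = 0;
    assert(atoms != NULL || natoms == 0);
    for (int i = 0; i < natoms; i++) {
        EntityChains *e = NULL;
        /* Look for an existing entry for this entity_id */
        for (int j = 0; j < nent; j++) {
            if (out[j].entity_id == atoms[i].entity_id) { e = &out[j]; break; }
        }
        if (e == NULL) {
            if (nent == maxout) continue; /* table full */
            e = &out[nent++];
            e->entity_id = atoms[i].entity_id;
            e->nchains = 0;
        }
        /* Add the chain label only if it is not already listed */
        int seen = 0;
        for (int k = 0; k < e->nchains; k++) {
            if (strcmp(e->chains[k], atoms[i].chain) == 0) { seen = 1; break; }
        }
        if (!seen && e->nchains < MAX_CHAINS_PER_ENTITY) {
            strncpy(e->chains[e->nchains], atoms[i].chain, CHAIN_LEN - 1);
            e->chains[e->nchains][CHAIN_LEN - 1] = '\0';
            e->nchains++;
        }
    }
    return nent;
}

/* Format an entity's chains as for a COMPND "CHAIN:" field, e.g. "A, B".
   `buflen` must be at least 1; output is truncated if the buffer is short. */
void FormatChainList(const EntityChains *e, char *buf, size_t buflen)
{
    size_t used = 0;
    buf[0] = '\0';
    for (int k = 0; k < e->nchains && used < buflen; k++) {
        int n = snprintf(buf + used, buflen - used, "%s%s",
                         k ? ", " : "", e->chains[k]);
        if (n < 0) break;
        used += (size_t)n;
    }
}
```

With atoms in chains A and B belonging to entity 1 and chain C to entity 2, this gives the lists "A, B" and "C".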

For translating from PDB to PDBML, I map the chain labels to entity_id using the COMPND records and set the entity_id based on the chain label. We can now read and write between PDB and PDBML while preserving the MOL_ID to CHAIN mapping for COMPND data.
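For the PDB to PDBML direction, a minimal sketch of the chain-to-entity lookup might look like the following. It assumes each COMPND MOL_ID has already been paired with its `CHAIN:` text (e.g. "A, B;"), and it falls back to a caller-supplied default when no COMPND entry lists the chain; that fallback is an assumption, not necessarily what the library does. `CompndEntry` and `EntityIdForChain` are hypothetical names, not the bioplib API.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical minimal COMPND entry: MOL_ID and its CHAIN: field text. */
typedef struct {
    int         mol_id; /* COMPND MOL_ID, written out as the PDBML entity_id */
    const char *chains; /* e.g. "A, B;" as read from "CHAIN: A, B;" */
} CompndEntry;

/* Separators in a CHAIN: list: commas, the trailing semicolon, whitespace */
static int IsSep(char c)
{
    return c == ',' || c == ';' || isspace((unsigned char)c);
}

/* Return 1 if `label` appears as a whole token in the chain list. */
static int ChainInList(const char *list, const char *label)
{
    size_t len = strlen(label);
    const char *p = list;
    while (*p) {
        while (*p && IsSep(*p)) p++;      /* skip separators */
        const char *start = p;
        while (*p && !IsSep(*p)) p++;     /* scan one token */
        if (len > 0 && (size_t)(p - start) == len &&
            strncmp(start, label, len) == 0)
            return 1;
    }
    return 0;
}

/* Entity id to write for atoms in chain `label`; returns `def` when no
   COMPND entry lists that chain (assumed fallback behaviour). */
int EntityIdForChain(const CompndEntry *compnd, int ncompnd,
                     const char *label, int def)
{
    assert(label != NULL);
    for (int i = 0; i < ncompnd; i++)
        if (ChainInList(compnd[i].chains, label))
            return compnd[i].mol_id;
    return def;
}
```

Matching whole tokens rather than using `strstr` avoids a label such as "A" accidentally matching inside a longer multi-character chain label.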

I need to add unit test files.

AndrewCRMartin commented 9 years ago

Excellent! Does that mean this issue can be closed?

CraigPorter commented 9 years ago

I'd like to make sure that ligands (e.g. metal ions or non-protein antigens) are handled in a sensible manner. I used files from the PDB for the tests while I wrote the code, so I'll need to add smaller files to the unit tests.

CraigPorter commented 9 years ago

I've restricted the compound type to "polymer" for PDBML-format files, which prevents metal ions etc. from appearing in the COMPND records. I've added small (two-residue) structure files to the unit tests.
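A minimal sketch of that restriction, assuming each PDBML entity has been read into a small record holding its type string (in PDBML/mmCIF the entity type takes values such as "polymer", "non-polymer" and "water"). `EntityInfo` and `SelectPolymerEntities` are hypothetical names for illustration, not the bioplib API; only entities that pass the filter would go on to become COMPND MOL_ID entries.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical minimal view of a PDBML entity record (not bioplib's). */
typedef struct {
    int         id;          /* entity_id / COMPND MOL_ID */
    const char *type;        /* e.g. "polymer", "non-polymer", "water" */
    const char *description; /* text that would become COMPND MOLECULE: */
} EntityInfo;

/* Store pointers to polymer entities only, so that metal ions, waters and
   other non-polymer entities never reach COMPND. Returns the number kept. */
int SelectPolymerEntities(const EntityInfo *ents, int nents,
                          const EntityInfo **out, int maxout)
{
    int n = 0;
    assert(ents != NULL || nents == 0);
    for (int i = 0; i < nents && n < maxout; i++) {
        if (ents[i].type != NULL && strcmp(ents[i].type, "polymer") == 0)
            out[n++] = &ents[i];
    }
    return n;
}
```

Filtering on the entity type, rather than on residue names, keeps the rule independent of any list of known ligands.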