AndrewCRMartin closed this issue 9 years ago
The entity_id is a mandatory part of the atom site (ATOM) data in the PDBML format. When I originally implemented PDBML parsing, I had entity_id default to 1. We now read and write entity_id for PDBML files.
For PDBML, I parse entity_id from the atom site data and map entity_id to chain labels, which allows chain labels to be added to the COMPND data. The pdbheader.c program now works for PDBML files.
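To illustrate the entity_id to chain label mapping described above, here is a minimal sketch. The `AtomSite` struct and the functions `chains_for_entity` and `join_chains` are hypothetical names invented for this example; they are not the actual structures or functions used in the library, and the "A, B" output format is only assumed to match what goes into the COMPND CHAIN field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified atom site record (not the library's PDB struct). */
typedef struct {
    int  entity_id;   /* entity_id read from the PDBML atom site entry */
    char chain[8];    /* chain label of the same atom                  */
} AtomSite;

/* Collect the unique chain labels of all atoms with the given entity_id,
   in order of first appearance. Returns the number of labels stored. */
size_t chains_for_entity(const AtomSite *atoms, size_t n_atoms, int entity_id,
                         char chains[][8], size_t max_chains)
{
    size_t n_found = 0;
    for (size_t i = 0; i < n_atoms; i++) {
        if (atoms[i].entity_id != entity_id)
            continue;

        /* Skip labels we have already recorded for this entity. */
        int seen = 0;
        for (size_t j = 0; j < n_found; j++) {
            if (strcmp(chains[j], atoms[i].chain) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen && n_found < max_chains) {
            snprintf(chains[n_found], 8, "%s", atoms[i].chain);
            n_found++;
        }
    }
    return n_found;
}

/* Join the labels into a comma-separated list such as "A, B". */
void join_chains(char chains[][8], size_t n, char *out, size_t outsize)
{
    assert(out != NULL && outsize > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        size_t used = strlen(out);
        snprintf(out + used, outsize - used, "%s%s", i ? ", " : "", chains[i]);
    }
}
```

Calling `chains_for_entity` once per entity and passing the result to `join_chains` gives one chain list per entity, which is the information the COMPND data needs.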
For translating from PDB to PDBML, I map the chain labels to entity_id using the COMPND records and set the entity_id based on the chain label. We can now read and write between PDB and PDBML while preserving the MOL_ID to CHAIN mapping for COMPND data.
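The reverse lookup for the PDB to PDBML direction could look roughly like the sketch below. It assumes that entity_id is taken directly from the COMPND MOL_ID, which is my reading of the description above rather than a confirmed detail. The `Compound` struct and `entity_id_for_chain` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified view of one COMPND entry. */
typedef struct {
    int  mol_id;       /* MOL_ID from the COMPND records           */
    char chains[64];   /* CHAIN list as written, e.g. "A, B"       */
} Compound;

/* Return the MOL_ID (assumed here to be used as entity_id) of the
   compound whose CHAIN list contains the given label, or -1 if no
   compound lists that chain. */
int entity_id_for_chain(const Compound *compnd, size_t n_compnd,
                        const char *chain)
{
    assert(chain != NULL);
    for (size_t i = 0; i < n_compnd; i++) {
        /* strtok modifies its input, so work on a local copy. */
        char buf[64];
        snprintf(buf, sizeof buf, "%s", compnd[i].chains);

        for (char *tok = strtok(buf, ", ;"); tok != NULL;
             tok = strtok(NULL, ", ;")) {
            if (strcmp(tok, chain) == 0)
                return compnd[i].mol_id;
        }
    }
    return -1;  /* caller decides on a fallback, e.g. a default of 1 */
}
```

Returning -1 instead of silently defaulting keeps the "chain not found in COMPND" case visible to the caller, who can then choose whether to fall back to a default entity_id.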
I need to add unit test files.
Excellent! Does that mean this issue can be closed?
I'd like to make sure that ligands (e.g. metal ions or non-protein antigens) are handled in a sensible manner. I used files from the PDB database for the tests while I wrote the code, so I'll need to add smaller files to the unit tests.
I've restricted compound type to "polymer" for PDBML-format files, which will prevent metal ions etc. from appearing in the COMPND records. I've added small (two-residue) structure files to the unit tests.
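As a rough sketch of the "polymer" restriction, the filter below keeps only entities whose type string is exactly "polymer". The `Entity` struct and `polymer_entity_ids` are hypothetical names for this example, and the real code may apply the check at a different point in the parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified entity record from a PDBML file. */
typedef struct {
    int  id;         /* entity id                                    */
    char type[16];   /* entity type, e.g. "polymer" or "non-polymer" */
} Entity;

/* Copy the ids of all entities of type "polymer" into out_ids.
   Returns the number of ids stored (at most max_ids). */
size_t polymer_entity_ids(const Entity *entities, size_t n_entities,
                          int *out_ids, size_t max_ids)
{
    assert(out_ids != NULL);
    size_t n_kept = 0;
    for (size_t i = 0; i < n_entities && n_kept < max_ids; i++) {
        /* Exact match, so "non-polymer" and "water" are skipped. */
        if (strcmp(entities[i].type, "polymer") == 0)
            out_ids[n_kept++] = entities[i].id;
    }
    return n_kept;
}
```

Only the ids returned here would then be turned into COMPND entries, so metal ions and waters never reach the header.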
Need to get the chain label information from the PDBML equivalent of SEQRES to create a mapping from chain labels to entity_id and thence to the COMPND and SPECIES data. Use the pdbheader.c program to test.