ACRMGroup / bioplib

The bioplib library
http://www.bioinf.org.uk/software/bioplib/

PDBML header reading needs to include SEQRES records #7

Closed AndrewCRMartin closed 9 years ago

AndrewCRMartin commented 9 years ago

The PDBML reader should be picking up the equivalent of SEQRES and reconstructing the PDB-style SEQRES records in the WPDB->header stringlist. This is breaking pdb2pir -s.

CraigPorter commented 9 years ago

The PDBML parser is picking up sequence information and constructing SEQRES records for the WHOLEPDB header.
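For readers unfamiliar with the record layout, here is a minimal sketch of how one PDB-style SEQRES line could be built from a list of residue names. `FormatSeqresRecord()` is a hypothetical helper written for this illustration; it is not a BiopLib function and is not the code used in the PDBML parser. It assumes the standard PDB layout of 13 residue names per record, with the serial number, chain ID and total residue count in fixed columns.

```c
#include <stdio.h>
#include <string.h>

#define RES_PER_LINE    13   /* a PDB SEQRES record holds 13 residue names */
#define SEQRES_MIN_BUF  80   /* enough for one full record plus terminator */

/* Hypothetical helper (NOT part of BiopLib): write SEQRES record number
   'serial' (1-based) for a chain into buf.
   residues[] holds the nRes three-letter residue names of the whole chain.
   Returns the number of residues written, or 0 if there is nothing to
   write or the buffer is too small.
*/
int FormatSeqresRecord(char *buf, size_t bufSize, char chain,
                       const char *const *residues, int nRes, int serial)
{
   int first = (serial - 1) * RES_PER_LINE;
   int i, len, written = 0;

   if((serial < 1) || (first >= nRes) || (bufSize < SEQRES_MIN_BUF))
      return(0);

   /* Columns 1-18: record name, serial number, chain ID, residue count */
   len = snprintf(buf, bufSize, "SEQRES %3d %c %4d ", serial, chain, nRes);

   /* Columns 19 onwards: up to 13 residue names, 4 columns each */
   for(i = first; (i < nRes) && (i < first + RES_PER_LINE); i++)
   {
      len += snprintf(buf + len, bufSize - (size_t)len, " %3s", residues[i]);
      written++;
   }

   return(written);
}
```

Calling this with increasing `serial` values until it returns 0 would yield every record for a chain; each resulting string could then be appended to a header string list.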

The -s and -c options for pdb2pir are working. (See 1CTP for an example where coordinate data is absent for the first few residues in the SEQRES records.)

AndrewCRMartin commented 9 years ago

Brilliant! Less important question... Does it write back the PDBML format version of SEQRES properly? A.

CraigPorter commented 9 years ago

It doesn't write back pdbml format. :(

But it's a question that I've been thinking about.

The PDBML sequence records have more information than the PDB SEQRES records: entity_id (the mol_id from the COMPND record), sequence numbers and an insertion code. So, storing the sequence as PDB-format SEQRES records and then adding data back to write PDBML seems to be impractical.

WHOLEPDB handles coordinate data very well. I think the solution for additional data will have to be along the lines of looking at what additional data we need to store (e.g. compound, species, sequence, modified residues) and adding additional linked lists to WHOLEPDB to handle them. (As I recall, you had an idea along these lines when we last met.)
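To make the linked-list idea concrete, here is a rough sketch of what a sequence list might look like. Every name in it (`SEQREC`, `WHOLEPDB_EXT`, `AppendSeqrec()`, `FreeSeqrecs()`) is hypothetical and invented for this illustration; none of it is existing BiopLib code, and `WHOLEPDB_EXT` is only a stand-in for an extended WHOLEPDB, not the real structure. The node fields are the extra PDBML items mentioned above: the entity ID, the sequence number and the insertion code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node: one residue from a PDBML sequence record */
typedef struct _seqrec
{
   struct _seqrec *next;
   int  entityID;     /* entity_id (the mol_id from the COMPND record) */
   int  seqNum;       /* sequence number                               */
   char insert;       /* insertion code, ' ' if there is none          */
   char resnam[8];    /* residue name                                  */
}  SEQREC;

/* Hypothetical stand-in for an extended WHOLEPDB; only the proposed
   extra list is shown here
*/
typedef struct
{
   SEQREC *sequence;
}  WHOLEPDB_EXT;

/* Append a residue to the end of the list; returns the new node,
   or NULL if memory allocation fails
*/
SEQREC *AppendSeqrec(SEQREC **head, int entityID, int seqNum,
                     char insert, const char *resnam)
{
   SEQREC *p;
   SEQREC *node = (SEQREC *)malloc(sizeof(SEQREC));

   if(node == NULL)
      return(NULL);

   node->next     = NULL;
   node->entityID = entityID;
   node->seqNum   = seqNum;
   node->insert   = insert;
   strncpy(node->resnam, resnam, sizeof(node->resnam) - 1);
   node->resnam[sizeof(node->resnam) - 1] = '\0';

   if(*head == NULL)
   {
      *head = node;
   }
   else
   {
      for(p = *head; p->next != NULL; p = p->next) ;
      p->next = node;
   }

   return(node);
}

/* Free the whole list */
void FreeSeqrecs(SEQREC *head)
{
   while(head != NULL)
   {
      SEQREC *next = head->next;
      free(head);
      head = next;
   }
}
```

With a list like this, PDB-format SEQRES lines could be generated on output when needed, while the PDBML-only fields stay available for writing PDBML back out. Similar lists could hold compound, species and modified-residue data.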

Once the paper's out of the way, we could look at the pdbml format, decide what data we need to store for lossless reading/writing of pdbml and decide what to add to WHOLEPDB. It could be the next phase of development for BiopLib.

C.

P.S. I'll have the MODRES records finished today.