JudeWells / chainsaw

MIT License

Output chopping by residue labels (as well as sequential numbering) #27

Closed sillitoe closed 10 months ago

sillitoe commented 10 months ago

We have started using chainsaw to identify domain boundaries in PDB files.

The residue labels in AlphaFold structures are numeric and sequential (1-n). The residue labels in PDB files are much less predictable: they do not always start at 1, and they can skip numbers, include negative numbers, carry an optional "insertion code" character, and so on.
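To make the labelling concrete, here is a minimal sketch of reading residue labels straight from the fixed-width `ATOM` records of a PDB file (chain ID in column 22, residue number in columns 23-26, insertion code in column 27). The function name `residue_labels` is hypothetical and is not part of chainsaw. It only looks at `ATOM` records, so modified residues stored as `HETATM` would be missed.

```python
def residue_labels(pdb_lines, chain_id):
    """Return the residue labels of one chain, in file order.

    A label is the residue number plus the optional insertion code,
    e.g. '-1', '0', '1', '1A', '5'.
    """
    labels = []
    seen = set()
    for line in pdb_lines:
        # Only ATOM records long enough to hold the insertion code column.
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[21] != chain_id:  # column 22: chain identifier
            continue
        # Columns 23-26: residue number; column 27: insertion code.
        label = line[22:26].strip() + line[26].strip()
        if label not in seen:  # each residue has many atoms; keep one label
            seen.add(label)
            labels.append(label)
    return labels
```

The position of a label in the returned list (counting from 1) is the sequential number of that residue.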

At the moment, chainsaw outputs chopping boundaries that correspond to the sequential numbering of the amino acid sequence in the PDB file, i.e. it completely ignores the residue labels in the PDB file. This makes it difficult to map domain boundaries back onto the structural data.

However, when parsing the PDB file, chainsaw has access to both the residue labels and the sequential numbering, so it should be relatively easy to output the same chopping in two forms: one using sequential numbering and one using PDB residue labels.
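As a rough sketch of what the second form could look like: given the list of residue labels in sequence order, a chopping expressed in sequential numbers can be translated by simple lookup. This assumes a chopping string where domains are separated by `,` and the segments of a discontinuous domain by `_` (e.g. `1-2_5-6,3-4`); the function name `chopping_to_labels` is hypothetical and this is not how chainsaw currently works. It returns (start, end) label pairs rather than a `start-end` string, because negative residue numbers would make a `-` separator ambiguous.

```python
def chopping_to_labels(chopping, labels):
    """Translate a sequential-number chopping into PDB residue labels.

    chopping: assumed format, e.g. '1-2_5-6,3-4' (1-based, inclusive).
    labels:   residue labels in sequence order, e.g. ['-1', '0', '1', '1A'].
    Returns one list per domain, each holding (start_label, end_label) pairs.
    """
    domains = []
    for domain in chopping.split(","):
        segments = []
        for segment in domain.split("_"):
            # Sequential numbers are positive, so splitting on '-' is safe here.
            start, end = (int(x) for x in segment.split("-"))
            segments.append((labels[start - 1], labels[end - 1]))
        domains.append(segments)
    return domains


labels = ["-1", "0", "1", "1A", "5", "6"]
print(chopping_to_labels("1-2_5-6,3-4", labels))
```

With the example labels, the first domain covers residues labelled -1 to 0 and 5 to 6, and the second covers 1 to 1A.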