UCLOrengoGroup / cath-tools

Protein structure comparison tools such as SSAP and SNAP
http://cath-tools.readthedocs.io
GNU General Public License v3.0

Alignment lines in supn for 4tsvA vs 4tswB are clearly wrong #52

Closed tonyelewis closed 6 years ago

tonyelewis commented 6 years ago

The superposition shows the wrong residues being aligned, whether or not refining is applied.

This may be a genuine issue with the alignment or just something in the superposition.

To reproduce (note 4tsw is superseded):

cath-ssap                                                     4tsv              4tsw --align-regions 'D[4tsvA]:A' --align-regions 'D[4tswB]:B'
cath-superpose --ssap-aln-infile 4tsvA4tswB.list --pdb-infile 4tsv --pdb-infile 4tsw --align-regions 'D[4tsvA]:A' --align-regions 'D[4tswB]:B' --grad --align-refining NO
cath-superpose --ssap-aln-infile 4tsvA4tswB.list --pdb-infile 4tsv --pdb-infile 4tsw --align-regions 'D[4tsvA]:A' --align-regions 'D[4tswB]:B' --grad --align-refining LIGHT
cath-superpose --ssap-aln-infile 4tsvA4tswB.list --pdb-infile 4tsv --pdb-infile 4tsw --align-regions 'D[4tsvA]:A' --align-regions 'D[4tswB]:B' --grad --align-refining HEAVY

This may be caused by the input PDBs having conflicting entries for the same residue name, or it may be related to #51 (residues without complete backbones).

Before closing the issue, ensure the problem is fixed in other forms of alignment output (if appropriate).

tonyelewis commented 6 years ago

A little bit more diagnosis (and updating the title accordingly)...

The alignment and superposition are correct; only the alignment lines in the superposition are incorrect.

Inspecting the superposition in PyMOL, it's clear that the residues should be aligned against their namesakes (e.g. 101 aligned against 101).

The alignment lines are correct up to and including residue 102 but then appear to become offset by one residue:

distance alignment, /"4tsvA"//A/102/CA, /"4tswB"//B/102/CA
distance alignment, /"4tsvA"//A/103/CA, /"4tswB"//B/103/CA
distance alignment, /"4tsvA"//A/105/CA, /"4tswB"//B/103/CA
distance alignment, /"4tsvA"//A/106/CA, /"4tswB"//B/105/CA
distance alignment, /"4tsvA"//A/107/CA, /"4tswB"//B/106/CA
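For anyone wanting to check a longer superposition script for the same symptom, here is a minimal Python sketch that scans PyMOL `distance` lines of the form shown above and reports pairs whose residue numbers differ. It assumes (as in this case) that the two structures share a numbering scheme, so namesakes should pair up; the function name `mismatched_pairs` and the regular expression are illustrative and not part of cath-tools.

```python
import re

# Matches selections such as /"4tsvA"//A/102/CA and captures chain and residue number.
# An optional trailing insert-code character after the number is tolerated but ignored.
SELECTION_RE = re.compile(r'/"[^"]+"//(\w)/(-?\d+)\w?/CA')

def mismatched_pairs(lines):
    """Return (res_a, res_b) for each distance line whose residue numbers differ."""
    mismatches = []
    for line in lines:
        found = SELECTION_RE.findall(line)
        if len(found) != 2:
            continue  # not a two-selection distance line
        (_, res_a), (_, res_b) = found
        if res_a != res_b:
            mismatches.append((int(res_a), int(res_b)))
    return mismatches

lines = [
    'distance alignment, /"4tsvA"//A/102/CA, /"4tswB"//B/102/CA',
    'distance alignment, /"4tsvA"//A/103/CA, /"4tswB"//B/103/CA',
    'distance alignment, /"4tsvA"//A/105/CA, /"4tswB"//B/103/CA',
    'distance alignment, /"4tsvA"//A/106/CA, /"4tswB"//B/105/CA',
    'distance alignment, /"4tsvA"//A/107/CA, /"4tswB"//B/106/CA',
]
print(mismatched_pairs(lines))
```

On the five lines quoted above, the first reported mismatch is 105 against 103.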

It looks as though this is caused by two different residues in 4tswB having the name 103:

ATOM   2239 1HE2 GLN B 102      36.066  16.734  30.275  1.00  0.00           H  
ATOM   2240 2HE2 GLN B 102      34.695  15.859  29.646  1.00  0.00           H  
ATOM   2241  N   ARG B 103      30.036  20.481  30.582  1.00 67.41           N  
ATOM   2242  CA  ARG B 103      28.981  21.363  30.110  1.00 69.06           C  
ATOM   2243  C   ARG B 103      28.668  22.406  31.174  1.00 68.80           C  
ATOM   2244  O   ARG B 103      29.483  22.638  32.071  1.00 69.14           O  
ATOM   2245  CB  ARG B 103      27.725  20.557  29.768  1.00 71.39           C  
ATOM   2246  H   ARG B 103      29.950  20.114  31.491  1.00  0.00           H  
ATOM   2247  N   GLU B 103      27.492  23.017  31.036  1.00 69.00           N  
ATOM   2248  CA  GLU B 103      26.946  24.055  31.918  1.00 69.65           C  
ATOM   2249  C   GLU B 103      26.058  24.961  31.066  1.00 71.54           C  
ATOM   2250  O   GLU B 103      26.334  25.177  29.877  1.00 72.21           O  
ATOM   2251  CB  GLU B 103      28.051  24.882  32.590  1.00 67.34           C  
ATOM   2252  H   GLU B 103      26.896  22.772  30.305  1.00  0.00           H  
ATOM   2253  N   THR B 105      24.903  25.407  31.312  1.00 71.90           N  
ATOM   2254  CA  THR B 105      24.124  26.292  30.435  1.00 69.49           C
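
A quick way to spot this kind of conflict in other PDB files is to group ATOM records by chain and residue identifier and flag any identifier that carries more than one residue name. The sketch below is an illustration only, not how cath-tools handles it: `conflicting_residue_ids` is a hypothetical helper, it relies on the fixed-column layout of PDB ATOM records, and it does not try to distinguish genuine alternate-location heterogeneity from a numbering error like the one here.

```python
from collections import defaultdict

def conflicting_residue_ids(pdb_lines):
    """Map (chain, residue id) to its residue names, keeping only ids with more than one name."""
    names_by_id = defaultdict(list)
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        res_name = line[17:20].strip()  # columns 18-20: residue name
        chain = line[21:22]             # column 22: chain identifier
        res_id = line[22:27].strip()    # columns 23-27: sequence number plus insertion code
        if res_name not in names_by_id[(chain, res_id)]:
            names_by_id[(chain, res_id)].append(res_name)
    return {key: names for key, names in names_by_id.items() if len(names) > 1}

# Truncated copies of records from the 4tsw excerpt above (coordinates omitted).
sample = [
    "ATOM   2240 2HE2 GLN B 102",
    "ATOM   2241  N   ARG B 103",
    "ATOM   2247  N   GLU B 103",
    "ATOM   2253  N   THR B 105",
]
print(conflicting_residue_ids(sample))
```

For the sample records this flags chain B residue 103 as carrying both ARG and GLU.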