UCLOrengoGroup / cath-tools

Protein structure comparison tools such as SSAP and SNAP
http://cath-tools.readthedocs.io
GNU General Public License v3.0

Some pairs fail under cath-ssap with PDB_DSSP that work with PDB_DSSP_SEC #49

Open tonyelewis opened 6 years ago

tonyelewis commented 6 years ago

Note that these don't work with --prot-src-files of PDB_DSSP either.

Examples where both proteins have ≥ 30 residues:

1dleB02  2hntE00  142   67  39.85   55   38    7  11.26
2hntE00  4lk4A01   67  125  45.83   44   35    1  10.01
1l1jA01  2hntE00  118   67  35.48   59   50    1  14.44
2hntE00  3gdvC01   67  116  49.33   51   43    7  13.62
2hntE00  3tloA02   67   99  27.21   45   45    1  11.12
3wcyA01  4jqiL02   53   76  51.25   21   27    0   9.71
1hdlA00  4sgbI00   55   51  40.26   27   49    3  11.60
1ktkF02  1smoB00   47  110  57.71   46   41    6   7.62
1ktkF02  2dm3A00   47  110  42.51   26   23    0   4.93
1ktkF02  3irzA03   47   99  46.07   42   42    6   6.73
2hntE00  4fvdA01   67   42  52.51   32   47    7  10.35
3tbxB00  3wcyA01   41   53  57.29   26   49    2   8.93
2zuxA01  3tvmE02   88   32  55.38   30   34   12   4.95
1dx5I02  1yukB02   33   31  70.54   18   54   12   5.90

tonyelewis commented 6 years ago

To be more precise... these all work under PDB_DSSP_SEC, eg:

cath-ssap --prot-src-files PDB_DSSP_SEC 1dleB02 2hntE00

...but fail under PDB_DSSP, eg:

cath-ssap --prot-src-files PDB_DSSP     1dleB02 2hntE00

I will change the title accordingly.
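To check several pairs under both settings in one go, something like the following could be used. It is only a minimal sketch: the helper names (`build_command`, `run_exit_code`, `compare_sources`) are hypothetical, and it assumes that a "failure" shows up as a non-zero exit code from `cath-ssap`, which may not match how these pairs actually fail. It also assumes `cath-ssap` is on the `PATH` and can already find the input files.

```python
import subprocess
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# The two --prot-src-files values compared in this issue.
SOURCES = ("PDB_DSSP_SEC", "PDB_DSSP")

Pair = Tuple[str, str]


def build_command(source: str, id_a: str, id_b: str) -> List[str]:
    """Build the same cath-ssap command line as shown in the comment above."""
    return ["cath-ssap", "--prot-src-files", source, id_a, id_b]


def run_exit_code(cmd: Sequence[str]) -> int:
    """Run a command, discard its output and return its exit code."""
    completed = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return completed.returncode


def compare_sources(
    pairs: Iterable[Pair],
    runner: Callable[[Sequence[str]], int] = run_exit_code,
) -> Dict[Pair, Dict[str, bool]]:
    """Return {(id_a, id_b): {source: succeeded}} for every pair.

    ASSUMPTION: "succeeded" means exit code 0. If the failure is instead a
    poor or missing alignment, the runner would need to inspect the output.
    """
    results: Dict[Pair, Dict[str, bool]] = {}
    for id_a, id_b in pairs:
        results[(id_a, id_b)] = {
            source: runner(build_command(source, id_a, id_b)) == 0
            for source in SOURCES
        }
    return results
```

Calling `compare_sources([("1dleB02", "2hntE00"), ("1dx5I02", "1yukB02")])` would then give one `PDB_DSSP_SEC` / `PDB_DSSP` result per pair. The `runner` argument is there so the comparison logic can be exercised without the real binary.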

tonyelewis commented 6 years ago

I've looked a little bit into 1dx5I02 versus 1yukB02 and found that:

So this could probably be fixed by more work to improve the code's ability to replicate the behaviour in prosec / secmake. That should be done with (a) test cases to demonstrate the specific improvement and (b) large-scale comparisons to ensure that specific changes are reducing overall errors, not increasing them.
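For point (b), one rough way to judge a change is to record which pairs succeed before and after it, then count the pairs that were fixed against the pairs that regressed. The sketch below assumes the per-pair outcomes are already available as simple pass/fail flags; the function name and the dictionary layout are hypothetical and not part of cath-tools.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def summarise_change(
    before: Dict[Pair, bool], after: Dict[Pair, bool]
) -> Dict[str, object]:
    """Compare pass/fail outcomes for the pairs present in both runs."""
    shared = before.keys() & after.keys()
    # Pairs that failed before the change and succeed after it.
    fixed: Set[Pair] = {p for p in shared if not before[p] and after[p]}
    # Pairs that succeeded before the change and fail after it.
    regressed: Set[Pair] = {p for p in shared if before[p] and not after[p]}
    return {
        "fixed": fixed,
        "regressed": regressed,
        # Positive means the change removed more failures than it introduced.
        "net_fewer_failures": len(fixed) - len(regressed),
    }
```

A change would only count as an overall improvement under this measure if `net_fewer_failures` is positive, and any pair in `regressed` would still deserve a look on its own.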

But it's also worth noting that all the versions are failing to achieve anything in the slow ssap here, and that's probably the more significant problem.