UCLOrengoGroup / cath-tools

Protein structure comparison tools such as SSAP and SNAP
http://cath-tools.readthedocs.io
GNU General Public License v3.0

Strategy for domain-based superposition of full PDB structures (with ligands) #21

Closed sillitoe closed 7 years ago

sillitoe commented 7 years ago

@toluadeyelu had a query that I said I would add here for future documentation.

He has generated a multiple structure superposition of CATH domains.

He would like to add ligands and binding sites back into the structure.

There may be a more elegant solution in the pipeline (e.g. #3), however my suggested approach in the meantime was something like the following:

Does that sound reasonable?

(edit: removed the mentions of 'foreach' for clarity)

tonyelewis commented 7 years ago

Yes - that's what I'd do.

The only thing is I'm not 100% sure what the "for each"s in the bullet points mean. Once you've run cath-superpose-multi-temp-script, you should only need to run two cath-superpose commands...

First, run the command you're already running but output JSON via --sup-to-json-file, eg:

cath-superpose --ssap-scores-infile my_ssap_scores --pdb-infile $PDBDIR/1aldA00 --pdb-infile $PDBDIR/1b57A00 --pdb-infile $PDBDIR/1fq0A00 --pdb-infile $PDBDIR/1ok4A00 --sup-to-json-file my_sup.json

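Second, run a similar command but read the superposition from the JSON file (--json-sup-infile instead of --ssap-scores-infile), point each --pdb-infile at the full PDB file rather than the domain file, and write the PyMOL output: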

cath-superpose --json-sup-infile    my_sup.json    --pdb-infile $PDBDIR/1ald    --pdb-infile $PDBDIR/1b57    --pdb-infile $PDBDIR/1fq0    --pdb-infile $PDBDIR/1ok4    --sup-to-pymol-file my_sup.pml
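
For reference, the two steps above can be sketched as a pair of small shell functions that print the commands for a list of domain IDs. This is only an illustration, not part of cath-tools: the function names are made up, and it assumes domain files are named like 1aldA00, full PDB files are named by the first four characters (1ald), and paths contain no spaces.

```shell
# Sketch only: print (not run) the two cath-superpose commands.
# Assumption: the full PDB code is the first four characters of the domain ID.

# full_pdb_id DOMAIN_ID: print the PDB code, e.g. 1aldA00 gives 1ald.
full_pdb_id() {
    printf '%s\n' "$1" | cut -c1-4
}

# domain_step_cmd SCORES_FILE JSON_FILE PDBDIR DOMAIN_ID...
# Step 1: superpose the domains and save the superposition as JSON.
domain_step_cmd() {
    scores=$1; json=$2; dir=$3; shift 3
    cmd="cath-superpose --ssap-scores-infile $scores"
    for id in "$@"; do
        cmd="$cmd --pdb-infile $dir/$id"
    done
    printf '%s\n' "$cmd --sup-to-json-file $json"
}

# full_pdb_step_cmd JSON_FILE PML_FILE PDBDIR DOMAIN_ID...
# Step 2: apply the saved superposition to the full PDB files, in the same order.
full_pdb_step_cmd() {
    json=$1; pml=$2; dir=$3; shift 3
    cmd="cath-superpose --json-sup-infile $json"
    for id in "$@"; do
        cmd="$cmd --pdb-infile $dir/$(full_pdb_id "$id")"
    done
    printf '%s\n' "$cmd --sup-to-pymol-file $pml"
}

# '$PDBDIR' is single-quoted so that it is printed literally.
domain_step_cmd my_ssap_scores my_sup.json '$PDBDIR' 1aldA00 1b57A00 1fq0A00 1ok4A00
full_pdb_step_cmd my_sup.json my_sup.pml '$PDBDIR' 1aldA00 1b57A00 1fq0A00 1ok4A00
```

Passing the same ID list to both functions keeps the full PDB files in the same order as the domains, which is what lets the saved superposition be applied to them.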

Let me know how you get on. Shout if you get stuck.

toluadeyelu commented 7 years ago

@tonyelewis, @sillitoe - Thank you so much, but I got this error while trying to parse the file: "Error in parsing program options (from the command line): unrecognised option '--sup-to-json-file'". What can I do about this? Thank you.

tonyelewis commented 7 years ago

It sounds like you're running an old version of cath-superpose. Check that by running cath-superpose --version. You'll need v0.12.15 to read in JSON files.
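
A rough way to script that check, assuming the version banner always contains a line of the form "cath-superpose vX.Y.Z..." (as in the --version output shown further down this thread). The helper names are hypothetical and the comparison only looks at the three numeric fields.

```shell
# version_from_banner: read `cath-superpose --version` output on stdin and
# print the X.Y.Z part of the first "cath-superpose vX.Y.Z..." line.
version_from_banner() {
    sed -n 's/^cath-superpose v\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# version_at_least HAVE WANT: succeed if HAVE >= WANT, comparing the three
# dot-separated fields numerically from left to right.
version_at_least() {
    printf '%s %s\n' "$1" "$2" | awk '{
        split($1, h, "."); split($2, w, ".")
        for (i = 1; i <= 3; i++) {
            if (h[i] + 0 > w[i] + 0) exit 0
            if (h[i] + 0 < w[i] + 0) exit 1
        }
        exit 0
    }'
}

# Example use (not run here, since it needs the binary):
#   have=$(./cath-superpose --version | version_from_banner)
#   version_at_least "$have" 0.12.15 || echo "cath-superpose is too old for JSON" >&2
```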

To get the latest, download it from here and remember to make it executable (chmod +x cath-superpose). If you're running on CentOS 6/7 rather than Ubuntu, use /opt/local/apps/CentOS6-x86_64/bin/cath-superpose instead.

tonyelewis commented 7 years ago

Any joy?

toluadeyelu commented 7 years ago

I guess I cannot currently write to a .json file, as I got an output stating "Whilst converting a superposition_context to JSON, its alignment will be ignored because that is not currently supported". I hope the joy will come soon if this can be sorted.

tonyelewis commented 7 years ago

That should just be a warning message, which is indicated by [cath-superpose|warning], eg:

2016-11-23 13:22:53.144267 [cath-superpose|warning] Whilst converting a superposition_context to JSON, its alignment will be ignored because that is not currently supported

Please check: is the superposition JSON file there?
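
If it helps when scanning logs, the severity can be pulled out of the "[program|severity]" tag with a one-line sed. This is a sketch that assumes every tagged message follows the layout of the example line above; severity_of is a made-up helper name.

```shell
# severity_of LINE: print the severity from a "[program|severity]" tag,
# or nothing if the line carries no such tag.
severity_of() {
    printf '%s\n' "$1" | sed -n 's/^[^[]*\[[^]|]*|\([^]]*\)].*/\1/p'
}

# Example use: treat a run as fine if the message is only a warning and the
# JSON file exists and is non-empty.
#   [ "$(severity_of "$line")" = "warning" ] && [ -s my_sup.json ] && echo "OK"
```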

toluadeyelu commented 7 years ago

Yes I have it in my folder.

tonyelewis commented 7 years ago

Great. Then you can try using that file in the second command (ie cath-superpose --json-sup-infile [...] above). Let me know how you get on.

toluadeyelu commented 7 years ago

Am running into an error. I think it comes down to the --ssap-scores-infile option

Using the latest binary:

$ ./cath-superpose --version
============
cath-superpose v0.12.15-0-gc6003f7 [2016-11-18]
============

Superpose protein structures using an existing alignment

Build
-----
   Nov 18 2016 18:43:53
   Clang version 3.6.2 (branches/release_36)
   GNU libstdc++ version 20160726
   Boost 1_57

SSAP scores for 1jd0B00 / 1kopA00 exist in the scores:

$ grep 1jd0B00 superpositions/ssap_scores.fb5c4352b69d0f36976e8114e2653da7 | grep 1kopA00
1jd0B00  1kopA00  259  223  87.41  214   82   30   1.47

Running superpose generates exception:

$ ./cath-superpose --ssap-scores-infile ./superpositions/ssap_scores.fb5c4352b69d0f36976e8114e2653da7 --pdb-infile 1jd0B00 --pdb-infile 1kopA00 --sup-to-pymol-file tmp.pml
Whilst running program ./cath-superpose (via a program_exception_wrapper with typeid: "N4cath40cath_superpose_program_exception_wrapperE"), caught a std::exception:
vector::reserve

Note this works with the pairwise alignment:

$ ./cath-superpose --pdb-infile 1jd0B00 --pdb-infile 1kopA00 --ssap-aln-infile superpositions/1jd0B001kopA00.list  --sup-to-pymol-file tmp.pml
Standard RMSD is : 1.47086
Superposed using select_best_score_percent[70].ca_atoms and actual full RMSD is : 1.48231

Any ideas?

tonyelewis commented 7 years ago

I think your problem is...

For now, the --ssap-scores-infile option is brittle: the list of PDBs that you specify with --pdb-infile must exactly correspond to the list of IDs in the scores file and must appear in the same order that the IDs first appear in this file.

Looked at another way: cath-superpose-multi-temp-script makes it easy for you by providing a command with the correct --ssap-scores-infile and --pdb-infile options; don't change them. If you want to superpose a subset, just run another cath-superpose-multi-temp-script to generate a new scores file and get a new command. If you run that in the same temporary directory as before, it'll re-use your existing SSAP results so should be really quick.
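
A quick pre-flight check for the ordering rule could look something like the sketch below. It assumes the scores file is whitespace-separated with the two IDs in the first two columns (as in the grep output above) and that each domain file's basename is its ID. The function names are hypothetical, and the check only makes sense for the --ssap-scores-infile step, not when substituting full PDB files.

```shell
# ids_in_first_appearance_order SCORES_FILE: print the IDs from columns 1 and 2,
# deduplicated, in the order in which they first appear in the file.
ids_in_first_appearance_order() {
    awk 'NF >= 2 { for (i = 1; i <= 2; i++) if (!seen[$i]++) print $i }' "$1"
}

# pdb_list_matches_scores SCORES_FILE PDB_PATH...: succeed only if the
# basenames of the given paths are exactly the IDs above, in the same order.
pdb_list_matches_scores() {
    scores=$1; shift
    expected=$(ids_in_first_appearance_order "$scores")
    given=$(for p in "$@"; do basename "$p"; done)
    [ "$expected" = "$given" ]
}

# Example use:
#   pdb_list_matches_scores my_ssap_scores "$PDBDIR"/1aldA00 "$PDBDIR"/1b57A00 \
#       || echo "--pdb-infile list does not match the scores file" >&2
```

Passing only two of the domains from a larger scores file, as in the failing command above, would make this check fail before cath-superpose is run.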

In some ways the fact that cath-superpose isn't mapping between the scores file's IDs and the filenames is a bit rubbish but then that's what's giving us the flexibility for you to substitute in completely different PDB files to superpose whole PDBs.

The error message you got is very unhelpful - I'll have a look at improving that.

sillitoe commented 7 years ago

Thanks.

We also managed to get a segmentation fault - but having more difficulty reproducing that one.

toluadeyelu commented 7 years ago

Seems to work but it has a problem with the colouring scheme without an alignment?

$ ./cath-superpose --json-sup-infile    aceta.json    --pdb-infile $PDBDIR/1keq    --pdb-infile $PDBDIR/1v9e    --pdb-infile $PDBDIR/3iai    --pdb-infile $PDBDIR/3ks3   --pdb-infile $PDBDIR/3ml5 --pdb-infile $PDBDIR/4k13     --sup-to-pymol-file aceta.pml
2016-11-24 18:01:06.667486 [cath-superpose|warning] Unable to apply a alignment-based coluring scheme to the superposition because it doesn't contain an alignment

Is it possible to turn off the coloring?

tonyelewis commented 7 years ago

@sillitoe : OK - please do open an issue if you do manage to pin the segfault down. Ta.

@toluadeyelu : Great - so is that now doing what you want (superposing whole PDBs, including any ligands etc, based on domains)? The message about colouring is just a warning that you can ignore (in your case it's irrelevant because you haven't requested an alignment-based colouring, but I'd rather spend time on adding the alignment than making this warning smarter).

tonyelewis commented 7 years ago

@toluadeyelu : BTW, I've found commands like the following useful for viewing ligands in PyMOL before:

bg_color white;
show_as sticks, hetatm;
colour black, hetatm;

sillitoe commented 7 years ago

Sorry, that was my fault - didn't notice this was only a warning.

Is it worth having a "done" line at the end of the output to make it really obvious that everything ran okay (and that the warnings are just warnings)? Or would that screw up other output options?

Or something like:

Writing PyMOL output file 'tmp.pml' ... done

toluadeyelu commented 7 years ago

It seems the ligand is being stripped off after superposition, as it does not appear in the PyMOL output (No hetatm found)

$ grep HET $PDBDIR/3ml5|head
REMARK   3   HETEROGEN ATOMS          : 14                                      
REMARK 290 THE FOLLOWING TRANSFORMATIONS OPERATE ON THE ATOM/HETATM             
HET     ZN  A 263       1                                                       
HET    AZM  A 264      13                                                       
HETNAM      ZN ZINC ION                                                         
HETNAM     AZM 5-ACETAMIDO-1,3,4-THIADIAZOLE-2-SULFONAMIDE                      
HETATM 2101 ZN    ZN A 263       1.215  -0.903  18.853  1.00  6.67          ZN  
HETATM 2102  C1  AZM A 264      -2.546  -2.796  19.609  1.00  9.73           C  
HETATM 2103  C2  AZM A 264      -4.519  -2.424  20.815  1.00 12.71           C  
HETATM 2104  C3  AZM A 264      -6.604  -1.110  20.909  1.00 13.86           C  
$ ./cath-superpose --json-sup-infile    aceta.json    --pdb-infile $PDBDIR/1keq    --pdb-infile $PDBDIR/1v9e    --pdb-infile $PDBDIR/3iai    --pdb-infile $PDBDIR/3ks3   --pdb-infile $PDBDIR/3ml5 --pdb-infile $PDBDIR/4k13     --sup-to-pymol-file aceta.pml
2016-11-24 18:27:17.470788 [cath-superpose|warning] Unable to apply a alignment-based coluring scheme to the superposition because it doesn't contain an alignment
$ ./cath-superpose --json-sup-infile    aceta.json    --pdb-infile $PDBDIR/1keq    --pdb-infile $PDBDIR/1v9e    --pdb-infile $PDBDIR/3iai    --pdb-infile $PDBDIR/3ks3   --pdb-infile $PDBDIR/3ml5 --pdb-infile $PDBDIR/4k13     --sup-to-pymol-file aceta.pml
2016-11-24 18:27:53.058817 [cath-superpose|warning] Unable to apply a alignment-based coluring scheme to the superposition because it doesn't contain an alignment
$ grep HET aceta.pml 
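
Note that a plain grep for HET also matches REMARK, HET and HETNAM lines in a PDB file, so counting only lines that start with the HETATM record name gives a tighter check. The sketch below assumes the coordinates appear one record per line, starting in the first column, in whichever file is checked; whether that holds for the generated .pml file is an assumption. count_hetatm is a made-up helper name.

```shell
# count_hetatm FILE: print how many lines begin with the HETATM record name.
# awk is used rather than `grep -c` so that a count of zero still exits 0.
count_hetatm() {
    awk '/^HETATM/ { n++ } END { print n + 0 }' "$1"
}

# Example use: compare the input structure with the superposition output.
#   count_hetatm "$PDBDIR"/3ml5
#   count_hetatm aceta.pml
```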

tonyelewis commented 7 years ago

Hmm. Not sure. I can see the value to making it clearer that warnings are only warnings.

My concern about adding something like a trailing done is that it adds noise, which makes the program that little bit more annoying to use, especially when running batches etc, and it also makes genuine warnings/errors that little bit less obvious.

(Of course, the tools currently have too much noise but I hope for that to slowly improve.)

Alternatives could be to make the warnings more clearly warnings somehow. Move the warning before the cath_superpose? Make it upper case? Use colour (eg yellow for warning; red for error)? I think colour is tricky to do portably (and, who knows, we might choose to build on Windows in the future).

Any thoughts?

I'm unsure. I'd like to mull that one for a while.

tonyelewis commented 7 years ago

@toluadeyelu

No hetatm found

Great - that's a really clear issue - thanks very much for highlighting. I know I've looked at this before and generated a superposition including HETATM records but I'm not sure what state the code got left in.

Please can you open this as a separate GitHub issue (to distinguish the bug from this question issue) and include the list of CATH domains you're trying to superpose so I can look into it?

Thanks very much.

sillitoe commented 7 years ago

Not a big deal either way, but I would lean towards solving this by log levels (not sure how the code is currently handling logs).

In order of increasing severity (decreasing verbosity):

  1. trace
  2. debug (e.g. reading from the file system)
  3. info (e.g. writing to the file system)
  4. warn
  5. error

Then it's a case of which level of verbosity you want to use, e.g. the default would be to display info and above. If you don't want output, then --quiet would raise the minimum notification bar to only show warn and above, and --qq would only show error (i.e. completely quiet unless something has gone horribly wrong). Obviously vice versa with -v.
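
As a rough sketch of that idea (not how cath-tools currently handles logging): rank the five levels, pick a threshold from the number of "quiet" flags, and show a message only if its rank meets the threshold. The function names are hypothetical, and --quiet / --qq are the proposed flags from the paragraph above, not existing options.

```shell
# level_rank NAME: print a numeric rank (higher is more severe); 0 if unknown.
level_rank() {
    case $1 in
        trace) echo 1 ;;
        debug) echo 2 ;;
        info)  echo 3 ;;
        warn)  echo 4 ;;
        error) echo 5 ;;
        *)     echo 0 ;;
    esac
}

# min_level_for QUIET_COUNT: the default threshold is info; one "quiet"
# raises it to warn, two or more raise it to error.
min_level_for() {
    case $1 in
        0) echo info ;;
        1) echo warn ;;
        *) echo error ;;
    esac
}

# should_show MESSAGE_LEVEL MIN_LEVEL: succeed if the message should be shown.
should_show() {
    [ "$(level_rank "$1")" -ge "$(level_rank "$2")" ]
}

# Example use: with one "quiet", warnings are still shown but info is not.
#   should_show warn "$(min_level_for 1)" && echo "shown"
```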

sillitoe commented 7 years ago

As you say, I think it's slightly unclear just because the string specifying warning is a bit hidden - I saw the timestamp and jumped straight to the end to figure out the problem.

I would hesitate to add colour (very nice, but a considerable time sink).

tonyelewis commented 7 years ago

I've improved the error message for not specifying the full list of PDBs for the SSAP scores file in 2a10d5edd3ba0e0708effb61788ac35589948125.

@toluadeyelu I've fixed the HETATM-stripping problem in 6304bc86ff2029d26456e5b722f60f97b0b5e30d. It was just a stupid mistake of using the wrong PDB data in the code for handling the JSON input. I should add a testcase but haven't got time right now.

@sillitoe Yes - I already do logging at various levels (eg here; albeit not rigorously systematically) but (if I interpret your point correctly) I still think the Done is that little bit more annoying, even if users can then look up the usage to find there's an option to silence it.

Of course, it may prove useful later on, particularly if the tools' final destination is to be frequently used for long, multi-part jobs. But less so if it's typically quick and simple. I don't usually want my ls or grep or SSAP to tell me it's finished; I just want it to: do what I ask, alert me to problems ("Permission denied") and stop.

Anyway, for now, since the actual problem we're trying to fix is that the warning-iness of the warnings has been insufficiently clear, I've just added a simple bold to the severity part of the log message (c91df091be10fcf88704a440f04d1547adefc1fb).

tonyelewis commented 7 years ago

Sorry: I should have explicitly said... It works! After the above fix, I was able to use the method we've discussed to superpose the whole PDBs mentioned above on their ...A00 domains and view their ligands in the superposition. Nice!

sillitoe commented 7 years ago

@tonyelewis - sounds great, many thanks for fixing so quickly.

@toluadeyelu - thanks for reporting and chasing up - your efforts have improved our group's software! Great work

sillitoe commented 7 years ago

@toluadeyelu - you will need to download the latest version of the binary, then try the command again.

toluadeyelu commented 7 years ago

@sillitoe @tonyelewis Thank you so much. I will give feedback once I get it done this morning.

toluadeyelu commented 7 years ago

@tonyelewis Thank you so much. I have used this and it works brilliantly well. Now I am happy. Thanks @sillitoe. In the conversion of the JSON to PyMOL, I only used the full PDB for the one with the ligand, while I used the domain structure of the others, as suggested by @sillitoe.

tonyelewis commented 7 years ago

@toluadeyelu Great - I'm really glad to hear this is working for you now. If you think you'd benefit from not having to use this workaround, feel free to "Add your reaction" → "+1" to the initial comment on issues #1 and/or #3.

Is everyone happy for me to close this issue? Please shout if not.

tonyelewis commented 7 years ago

Ta.