VuisterLab / cing

Automated Validation of NMR Structures
http://nmr.le.ac.uk

WHAT IF check values incorrect due to erroneous WHAT IF handling #341

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
WHAT IF validation of ensembles is not properly carried out at the moment. 
Currently, a single script for the validation of all models in an ensemble is 
run in a single instance of WHAT IF. As a result, WHAT IF writes out 
*WRONG* check values for every model after the first.

http://code.google.com/p/cing/source/browse/trunk/cing/python/cing/PluginCode/Whatif.py

This issue causes, for example, the strange packing quality Z-scores in 
NRG-CING and NMR_REDO.

The issue has been confirmed by Gert Vriend. Starting a new WHAT IF instance 
for the validation of each model in an ensemble (and exiting WHAT IF thereafter) 
will fix the issue.
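
A minimal sketch of the one-instance-per-model idea, not the actual CING plugin code. The function name run_per_model and the build_command hook are hypothetical; how WHAT IF is really invoked (executable name, script input) is not shown in this report and is left to the caller.

```python
import subprocess


def run_per_model(model_count, build_command, run=subprocess.run):
    """Start one fresh process per model and wait for each to exit.

    build_command(model_index) must return the argument list for that model.
    It is a hypothetical hook: the real WHAT IF command line is an assumption
    the caller has to supply.
    """
    return_codes = []
    for model_index in range(model_count):
        # A new process per model, so no state carries over from the
        # previous model's validation run.
        completed = run(build_command(model_index), check=False)
        return_codes.append(completed.returncode)
    return return_codes
```

The run parameter defaults to subprocess.run and can be swapped for a stub, so the loop can be exercised without WHAT IF installed.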

Original issue reported on code.google.com by WGTouw on 14 Feb 2013 at 5:49

GoogleCodeExporter commented 9 years ago
Nice, and good catch! I missed that. 

Curious: can you put specific numbers to an example, say 1brv ;-) ?

WHAT IF (WI for intimi) is pretty resource intensive, so I see no choice but to 
rerun the full batch, unless you're willing to commit a happy few hours to 
recoding the integration, in which case the CPU requirement could be quartered 
or so. Depending on how many nodes are used, it should take a cloud no more than 
a few weeks to complete. Best to do this when other items need updating/fixing 
as well.

Original comment by jurge...@gmail.com on 20 Feb 2013 at 9:51

GoogleCodeExporter commented 9 years ago
============================
Example 1brv from NRG-CING:
============================

CING> project.molecule["Whatif"].NQACHK.valueList.average()
(4.364083333333333, 1.061782360063419, 48)

CING> project.molecule["Whatif"].NQACHK.valueList
NTlist(0.264, 3.297, 3.989, 3.85, 5.531, 3.311, 3.904, 5.338, 3.404, 4.515, 
4.044, 3.936, 3.707, 4.259, 4.916, 5.086, 5.199, 3.685, 5.31, 5.093, 3.928, 
4.842, 4.047, 4.703, 2.855, 5.316, 5.33, 4.696, 3.921, 2.588, 5.522, 5.369, 
5.731, 3.989, 4.179, 5.048, 3.994, 5.225, 4.208, 4.412, 4.839, 5.622, 5.319, 
5.737, 2.972, 4.001, 2.606, 5.839)

CING> project.molecule["Whatif"].NQACHK.valueList.min()
0.264

CING> project.molecule["Whatif"].NQACHK.valueList.max()
5.839

# DO_WHATIF.out0 contains all the WHAT IF output for the script
# and was parsed to fill the WHAT IF summary in CING:
# project.molecule["Whatif"].summary
VC> grep "  2nd generation packing quality" DO_WHATIF.out0 
  2nd generation packing quality :   0.264
  2nd generation packing quality :   3.297
  2nd generation packing quality :   3.989
  2nd generation packing quality :   5.531
  2nd generation packing quality :   3.311
...
  2nd generation packing quality :   5.737
  2nd generation packing quality :   2.972
  2nd generation packing quality :   4.001
  2nd generation packing quality :   2.606
  2nd generation packing quality :   5.839

============================
I have modified the Whatif plugin code so
that for each model a new WHAT IF instance
is started. The pdbout.txt output is now
saved as pdbout_MODELNUMBER.txt. At the end
of the WHAT IF runs, all pdbout files are 
concatenated to pdbout.txt, which is then parsed
by CING (e.g. to create the WHAT IF summary). 
The results are shown below. Note that the value
of the first model is unchanged, as was noted
in my previous comment.
============================
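
The concatenation step described above could look roughly like the sketch below. It assumes the per-model files use three-digit zero-padded numbers (as in pdbout_000.txt) and that the model count is known; the function name concatenate_model_outputs is illustrative and is not taken from the plugin.

```python
from pathlib import Path


def concatenate_model_outputs(directory, model_count, combined_name="pdbout.txt"):
    """Join pdbout_000.txt, pdbout_001.txt, ... into one file, in model order."""
    directory = Path(directory)
    combined_path = directory / combined_name
    with combined_path.open("w") as combined:
        for model_index in range(model_count):
            # Assumed naming: three-digit, zero-padded model number.
            part = directory / f"pdbout_{model_index:03d}.txt"
            combined.write(part.read_text())
    return combined_path
```

Iterating over explicit model indices rather than a directory listing keeps the model order fixed and raises an error if a per-model file is missing.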

CING> project.runWhatif()
CING> project.runWhatif(parseOnly=True)

CING>  project.molecule["Whatif"].NQACHK.valueList.average()
(-0.2732083333333332, 0.5887524480080409, 48)

CING>  project.molecule["Whatif"].NQACHK.valueList
NTlist(0.264, -0.827, -0.24, -0.21, 0.247, -0.58, -0.719, 0.186, -0.05, -0.346, 
-1.096, -0.791, -0.808, -0.58, -0.635, 0.007, -0.016, -0.077, 0.327, 0.27, 
-0.351, -0.292, -0.241, -0.114, -1.606, 0.589, -0.274, -0.192, -0.732, -1.677, 
0.283, 0.417, 0.358, 0.099, -1.062, 0.275, -0.876, -0.397, 0.462, -0.317, 
-0.642, 0.619, -0.256, 0.64, -1.084, -0.271, -1.449, 0.651)

CING>  project.molecule["Whatif"].NQACHK.valueList.min()
-1.677

CING>  project.molecule["Whatif"].NQACHK.valueList.max()
0.651

VC> grep "  2nd generation packing quality" pdbout_*.txt
pdbout_000.txt:  2nd generation packing quality :   0.264
pdbout_001.txt:  2nd generation packing quality :  -0.827
pdbout_002.txt:  2nd generation packing quality :  -0.240
pdbout_003.txt:  2nd generation packing quality :  -0.210
...
pdbout_043.txt:  2nd generation packing quality :   0.640
pdbout_044.txt:  2nd generation packing quality :  -1.084
pdbout_045.txt:  2nd generation packing quality :  -0.271
pdbout_046.txt:  2nd generation packing quality :  -1.449
pdbout_047.txt:  2nd generation packing quality :   0.651
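
For checking such numbers outside CING, the grep step can be sketched in Python as below. The regular expression only assumes the line format shown above; packing_quality_values is a hypothetical helper and not part of CING or WHAT IF.

```python
import re

# Matches e.g. "  2nd generation packing quality :  -0.827",
# with or without a leading "pdbout_001.txt:" prefix from grep.
PACKING_LINE = re.compile(
    r"2nd generation packing quality\s*:\s*(-?\d+(?:\.\d+)?)"
)


def packing_quality_values(lines):
    """Return the packing quality value of every matching line, in order."""
    values = []
    for line in lines:
        match = PACKING_LINE.search(line)
        if match:
            values.append(float(match.group(1)))
    return values
```

Applied to the lines of the per-model files in model order, this gives one value per model, which can then be compared against the NQACHK value list.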

============================
I will commit the code after some tests. 
Note that I will comment out the code that restores the 
WHAT IF results as project.runWhatif(parseOnly=True) will
now expect pdbout_000.txt, pdbout_001.txt etc. to be present.
These files will only be present when the new WHAT IF
plugin is run. It makes no sense to restore the results now
anyway, as they are incorrect. Enable restoring again when new
WHAT IF results are present for all NRG-CING entries.
============================
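
Before restoring is enabled again, a check along these lines could tell whether an entry already has the per-model files. This is only a sketch: missing_model_outputs is a hypothetical helper, and the three-digit file naming is assumed from the examples above.

```python
from pathlib import Path


def missing_model_outputs(directory, model_count):
    """List the expected pdbout_NNN.txt files that are not present."""
    directory = Path(directory)
    expected = [f"pdbout_{i:03d}.txt" for i in range(model_count)]
    return [name for name in expected if not (directory / name).is_file()]
```

An empty list means all per-model files are there; anything else indicates the entry was last run with the old plugin or the run was incomplete.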

Original comment by WGTouw on 15 Mar 2013 at 4:56

GoogleCodeExporter commented 9 years ago
Fixed by r1210 (http://code.google.com/p/cing/source/detail?r=1210)

Original comment by WGTouw on 15 Mar 2013 at 5:10

GoogleCodeExporter commented 9 years ago
Regarding the WHAT IF summary, the error seems to have affected only packing 
qualities and inside/outside distributions. As a result, ROG scores are 
unchanged.

Original comment by WGTouw on 18 Mar 2013 at 8:41

GoogleCodeExporter commented 9 years ago

Original comment by WGTouw on 19 Mar 2013 at 10:48