google-code-export / cing

Automatically exported from code.google.com/p/cing

Debugging of unfinished NMR_REDO dirs for failed entries #333

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Suppose the NMR_REDO protocol for a certain entry fails. At the moment, only 
the refineEntry log files are sent from the VM to nmr's log and log2 dirs. The 
unfinished project dir ????.cing and the NRG-CING ????.cing.tgz file are kept in 
the VM.

1. Proper debugging of the cause of failure often requires access to the 
specific VM that tried to calculate the entry. The Xplor logs may have to be 
inspected to understand and solve problems.
2. Inspecting logs in VMs costs cloud time and obviously we have to keep the 
VMs running in order to be able to access them...
3. If many entries in the same VM fail, there will not be enough disk space 
left in the VM.

I suggest that we tgz the unfinished dirs (e.g. ????.cing.unf.<stamp>.tgz), send 
them to NMR (e.g. to $D/tmp/<bla>/unfinished/), and remove the temporary 
NMR_REDO cing dir and the tgz files from the VM to save VM disk space.
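
A minimal sketch of the archive-and-remove step, assuming plain Python standard library calls rather than any existing CING helper. The function name `archive_unfinished`, its parameters, and the time-stamp format are hypothetical; transferring the resulting file to NMR is deliberately left out because the transfer mechanism is not specified here.

```python
import shutil
import tarfile
import time
from pathlib import Path


def archive_unfinished(project_dir, dest_dir, stamp=None):
    """Pack an unfinished project dir into <name>.unf.<stamp>.tgz, then delete the dir.

    Hypothetical helper for illustration; not part of CING.
    """
    project_dir = Path(project_dir)
    dest_dir = Path(dest_dir)
    if stamp is None:
        # Assumed stamp format; any unique, sortable string would do.
        stamp = time.strftime("%Y%m%d-%H%M%S")
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / f"{project_dir.name}.unf.{stamp}.tgz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # Store paths relative to the project dir name, e.g. 1abc.cing/...
        tar.add(str(project_dir), arcname=project_dir.name)
    # Remove the directory only after the archive has been written and closed.
    shutil.rmtree(str(project_dir))
    return archive
```

Calling `archive_unfinished("1abc.cing", "unfinished")` would leave a single `1abc.cing.unf.<stamp>.tgz` in `unfinished/` and free the space used by the project dir.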

We have to make sure there is enough disk space at nmr:$D. As the tgz files at 
$D/tmp/<bla>/unfinished/ will be manually inspected anyway, it is not much 
effort to periodically remove tgz files and free disk space.
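
If the periodic cleanup were ever scripted instead of done by hand, it could look roughly like the sketch below. The function name `prune_old_archives` and the 30-day default are assumptions for illustration only; nothing in this issue fixes a retention period.

```python
import time
from pathlib import Path


def prune_old_archives(unfinished_dir, max_age_days=30, now=None):
    """Delete *.tgz files older than max_age_days and return the removed paths.

    Hypothetical helper; the retention period is an assumed default.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # age limit in seconds
    removed = []
    for path in sorted(Path(unfinished_dir).glob("*.tgz")):
        # Compare the file's modification time against the cutoff.
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Only files directly inside the given directory are considered, so archives that are still being inspected can be moved to a subdirectory to keep them out of reach.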

Original issue reported on code.google.com by WGTouw on 28 Nov 2012 at 11:06

GoogleCodeExporter commented 9 years ago
I agree. The decision not to do that was made because, depending on the stage, 
there might not even be a .tgz. In many cases there is one, and it will be of 
great help to have it locally.

Original comment by jurge...@gmail.com on 28 Nov 2012 at 1:22

GoogleCodeExporter commented 9 years ago
The issue was fixed by revision 1199. We now create a ????.cing.unf.tgz file in 
the slave if fullAnnealAndRefine fails. The cing dir and old .tgz file are 
removed from the slave to save space, and after we have ended the slave, we 
check if the .unf.tgz file exists and send it to the master, followed by 
removal of the .unf.tgz file from the master. I verified all functionality for 
correct refinements is still OK (added some methods).

Original comment by WGTouw on 6 Dec 2012 at 3:30