astroumd / admit

ADMIT: ALMA Data Mining Toolkit
http://admit.astro.umd.edu

ADMIT fails to close Xvfb windows #20

Open alipnicky opened 5 years ago

alipnicky commented 5 years ago

When running ADMIT recipes 1 and/or 2 sequentially on many images, ADMIT eventually starts failing with the following error:

2018-11-05 21:51:16 SEVERE  imview::::  failed to find a viewertool...
ERROR : Admit.py : Project run() failed; (<type 'exceptions.Exception'>, Exception('failed to find a viewertool...',), <traceback object at 0x7fedc60dbb90>) : saving state...
SFind2D failed on the CSM map... Continuing on
INFO : AT.py : Setting 'csub' = [0, 0] for LineSegment_AT
INFO : Admit.py : ADMIT run() called [flowcount 1]
INFO : 
INFO : 
INFO :    Executing CubeSum_AT - '' (V1.1.0)
INFO : 
INFO : 
INFO :   Run using the following settings:
INFO :     linesum :  True
INFO :     numsigma :  4.0
INFO :     zoom :  1
INFO :     pad :  5
INFO :     sigma :  99.0
INFO : 
TIMING : CubeSum ADMIT [  1.64900000e+01   1.54145468e+09]
TIMING : CubeSum BEGIN [ 0.  0.]
INFO : CubeSum_AT.py : Using constant sigma = 0.003151
TIMING : CubeSum start  [  1.10000000e-01   1.17056131e-01   1.39737500e+03   2.17832031e+02]
*** Error ***: Output file, /lustre/naasc/sciops/qa2/alipnick/tmp/admit_tests/2018.1.00922.S/science_goal.uid___A001_X1354_Xad/group.uid___A001_X1354_Xae/member.uid___A001_X1354_Xaf/product/member.uid___A001_X1354_Xaf.helms45_sci.spw16.cube.I.pbcor.admit/member.uid___A001_X1354_Xaf.helms45_sci.spw16.cube.csm exists. immoment can not proceed, please
remove it or change the output file name.
ERROR : Admit.py : Project run() failed; (<type 'exceptions.UnboundLocalError'>, UnboundLocalError("local variable 'outia' referenced before assignment",), <traceback object at 0x7fedc5e82cf8>) : saving state...
Traceback (most recent call last):
  File "/home/casa/packages/RHEL6/release/casa-release-5.4.0-68/lib/python2.7/init_welcome.py", line 30, in <module>
    execfile(__candidates[0])
  File "/users/pteuben/admit/admit/test/admit1.py", line 438, in <module>
    a.run()
  File "/users/pteuben/admit/admit/Admit.py", line 797, in run
    self.fm.run()
  File "/users/pteuben/admit/admit/FlowManager.py", line 367, in run
    task.execute(args)
  File "/users/pteuben/admit/admit/AT.py", line 1790, in execute
    self.run()
  File "/users/pteuben/admit/admit/at/CubeSum_AT.py", line 326, in run
    casa.immoments(**args)
  File "/home/casa/packages/RHEL6/release/casa-release-5.4.0-68/lib/python2.7/immoments.py", line 134, in immoments
    result = task_immoments.immoments(imagename, moments, axis, region, box, chans, stokes, mask, includepix, excludepix, outfile, stretch)
  File "/home/casa/packages/RHEL6/release/casa-release-5.4.0-68/lib/python2.7/task_immoments.py", line 143, in immoments
    if outia:
UnboundLocalError: local variable 'outia' referenced before assignment

The cause is a pile-up of Xvfb processes that are not closed after ADMIT finishes a recipe: running ps aux | grep -i Xvfb reveals many open (but idle) Xvfb processes. ADMIT opens a new virtual window with each recipe call and does not close it when the recipe is finished, so eventually no free virtual displays are left. From that point on, the above error appears until the windows are closed manually. As a workaround, I now kill all sleeping Xvfb processes before executing another recipe, using the following bit of code:

import subprocess

# List all processes and keep only the Xvfb entries whose state is "Sl" (sleeping, multi-threaded)
child = subprocess.Popen('ps aux | grep Xvfb | grep "Sl "', stdout=subprocess.PIPE,
                         shell=True, universal_newlines=True)
output = child.communicate()[0].split('\n')
# Kill every sleeping Xvfb process, skipping the grep processes themselves
for line in output:
    if line and "grep" not in line:  # the last line of the output is empty
        # In `ps aux` output the PID is the second column (index 1)
        subprocess.call(['kill', line.split()[1]])
This workaround still results in some failures: when multiple ADMIT sessions run at the same time, they kill each other's still-active Xvfb processes.
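A possible refinement, as a minimal sketch only: restrict the kill to Xvfb processes that descend from the current process, so that concurrent sessions leave each other's displays alone. This assumes the leaked Xvfb processes are still descendants of the calling Python/CASA process (i.e. they have not been reparented to init), which I have not verified for ADMIT. The function names `parse_ps`, `xvfb_descendants` and `kill_own_xvfb` are made up for this illustration.

```python
import os
import signal
import subprocess


def parse_ps(text):
    """Parse `ps -A -o pid,ppid,comm` output into (pid, ppid, command) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        # The header row is skipped because its first fields are not numeric
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            rows.append((int(parts[0]), int(parts[1]), parts[2].strip()))
    return rows


def xvfb_descendants(rows, root_pid):
    """Return the sorted PIDs of Xvfb processes that descend from root_pid."""
    children = {}
    names = {}
    for pid, ppid, comm in rows:
        children.setdefault(ppid, []).append(pid)
        names[pid] = os.path.basename(comm)
    found, stack, seen = [], [root_pid], set()
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        for child in children.get(pid, []):
            if names[child] == "Xvfb":
                found.append(child)
            stack.append(child)
    return sorted(found)


def kill_own_xvfb():
    """Send SIGTERM only to Xvfb processes started under the current process."""
    text = subprocess.check_output(["ps", "-A", "-o", "pid,ppid,comm"],
                                   universal_newlines=True)
    for pid in xvfb_descendants(parse_ps(text), os.getpid()):
        try:
            os.kill(pid, signal.SIGTERM)
        except OSError:
            pass  # the process has already exited
```

Calling `kill_own_xvfb()` after each recipe, from inside the same session that ran the recipe, would replace the `ps aux | grep` loop above. Xvfb processes whose parent has already exited would be missed by this approach.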

When performing similar tasks in CASA (i.e. running imview in --nogui mode), a virtual window is created through Xvfb for the duration of the task and is closed when the task exits. For some reason that isn't obvious to me, the window isn't closed when ADMIT calls this task. Perhaps it is a subtle difference between using the imview "task" versus the imview "tool"?
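One way to test the task-versus-tool hypothesis would be to count Xvfb processes before and after each kind of call. The following is only a sketch: `count_xvfb`, `current_xvfb_count` and `xvfb_leak_check` are hypothetical helpers, not part of ADMIT or CASA, and the counts are only meaningful when no other session is starting or stopping Xvfb on the same machine at the same time.

```python
import subprocess
from contextlib import contextmanager


def count_xvfb(ps_text):
    """Count lines of `ps -A -o comm` output whose command name is Xvfb."""
    return sum(1 for line in ps_text.splitlines()
               if line.strip().split("/")[-1] == "Xvfb")


def current_xvfb_count():
    """Count the Xvfb processes currently running on this machine."""
    text = subprocess.check_output(["ps", "-A", "-o", "comm"],
                                   universal_newlines=True)
    return count_xvfb(text)


@contextmanager
def xvfb_leak_check(label, counter=current_xvfb_count):
    """Report how the number of Xvfb processes changed across a block."""
    before = counter()
    result = {"label": label, "leaked": None}
    try:
        yield result
    finally:
        result["leaked"] = counter() - before
        print("%s: Xvfb count changed by %+d" % (label, result["leaked"]))


# Usage (the two callables are placeholders for the imview task and tool calls):
# with xvfb_leak_check("imview task"):
#     call_imview_task()
# with xvfb_leak_check("imview tool"):
#     call_imview_tool()
```

A positive change after one kind of call but not the other would point to where the window is left open.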

This happens both with the newest version of ADMIT downloaded from this GitHub repository and with the one installed on the NAASC lustre under ~pteuben/admit.

teuben commented 5 years ago

The casaclean script has an option to remove all existing Xvfb processes. Unclear if this works on a Mac.

teuben commented 5 years ago

There is an xvfb-run command (on Ubuntu, the xvfb package installs it) that can run CASA scripts as follows: xvfb-run casa -c script.py. I wonder whether running it that way would guarantee that our zombie gets killed naturally.

alipnicky commented 5 years ago

Hi @teuben, just to note, that is how I run all the ADMIT recipes and how they end up hanging in the first place. Unless you're talking about using xvfb-run for your individual CASA calls?

teuben commented 4 years ago

JAO people were mentioning the -a flag, but I didn't have much luck with that
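For reference, a sketch of how a script could be launched with that flag from Python. In xvfb-run, -a is short for --auto-servernum, which makes xvfb-run look for a free server number instead of failing when the default one is taken; by itself it does not remove Xvfb processes that are already left over. The helper names `xvfb_run_command` and `run_script` are made up here, and the --nogui option is taken from the CASA usage mentioned earlier in this thread.

```python
import subprocess


def xvfb_run_command(script, auto_servernum=True):
    """Build the command line for running a CASA script under xvfb-run."""
    cmd = ["xvfb-run"]
    if auto_servernum:
        cmd.append("-a")  # short for --auto-servernum
    cmd += ["casa", "--nogui", "-c", script]
    return cmd


def run_script(script):
    """Run one CASA script in its own xvfb-run wrapper; return the exit code."""
    return subprocess.call(xvfb_run_command(script))
```

For example, `xvfb_run_command("script.py")` builds the equivalent of xvfb-run -a casa --nogui -c script.py.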