cms-gem-daq-project / gem-plotting-tools

Repository for GEM commissioning plotting tools
GNU General Public License v3.0

Feature Request: Get Stack Trace of child processes from ana_scans.py #242

Closed bdorney closed 4 years ago

bdorney commented 5 years ago

Brief summary of issue

There are a few rare cases where ana_scans.py fails to produce output properly. Each time, this has been traced to problems with the input files (e.g. they are bad).

However, the stack trace that is reported is always the Python multiprocessing stack trace, while the error message is the one from the exception that was actually caught:

$ ana_scans.py scurve -s 2019.08.09.13.58 --chamberConfig --medium -c
Launching scurve analysis processes, this may take some time, please be patient
Caught <type 'exceptions.ValueError'>: can only convert an array of size 1 to a Python scalar, terminating workers
Traceback (most recent call last):
  File "/opt/cmsgemos/bin/ana_scans.py", line 597, in scurveMultiProcessing
    ).get(7200) # wait at most 2 hours
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 554, in get
    raise self._value
ValueError: can only convert an array of size 1 to a Python scalar

Here the message of the ValueError is the error that is actually raised (see below), but the stack trace is the one from the multiprocessing call to map_async.

Expected Behavior

It would be good if the stack trace matched the actual error rather than the multiprocessing call.

Current Behavior

Described above.

Steps to Reproduce (for bugs)

In the particular example above the "real" stack trace was determined to be:

% anaUltraScurve.py SCurveData.root long -c -d
Traceback (most recent call last):
  File "/afs/cern.ch/user/d/dorney/scratch0/CMS_GEM/CMS_GEM_DAQ/venv/cc7/py2.7/lib/python2.7/site-packages/gempython/scripts/anaUltraScurve.py", line 58, in <module>
    args.func(args,args.infilename,args.calFile,args.GEBtype,filePath,vfatList)
  File "/afs/cern.ch/user/d/dorney/scratch0/CMS_GEM/CMS_GEM_DAQ/venv/cc7/py2.7/lib/python2.7/site-packages/gempython/gemplotting/utils/scurveAlgos.py", line 157, in anaUltraScurve
    nevts = np.asscalar(np.unique(rp.tree2array(scurveTree, branches = [ 'Nev' ] )))[0] # for some reason numpy returns this as a tuple...
  File "/afs/cern.ch/user/d/dorney/scratch0/CMS_GEM/CMS_GEM_DAQ/venv/cc7/py2.7/lib/python2.7/site-packages/numpy/lib/type_check.py", line 482, in asscalar
    return a.item()
ValueError: can only convert an array of size 1 to a Python scalar

Possible Solution (for bugs)

In this particular case a new failure mode was observed: this time not all VFATs in one file had 100 triggered events. The fix for this case would be straightforward, but because the multiprocessing stack trace was reported instead of the "actual" one of interest, it took longer to debug. I think the strategy would be to keep this issue open and add problems and solutions as we go.
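For the failure mode described here, one possible guard is to check that the event counts really collapse to a single value before converting to a scalar, and to say so in the error message if they do not. The sketch below is only an illustration: `unique_event_count` is a hypothetical helper, and it assumes the 'Nev' values are already available as a plain 1-D sequence rather than the structured array returned by `rp.tree2array`.

```python
import numpy as np


def unique_event_count(nev_values):
    """Return the single event count shared by all entries.

    Hypothetical helper: raises a descriptive ValueError when the input
    does not contain exactly one distinct value.
    """
    unique_vals = np.unique(np.asarray(nev_values))
    if unique_vals.size != 1:
        raise ValueError(
            "expected one unique 'Nev' value, found {0}: {1}".format(
                unique_vals.size, unique_vals.tolist()))
    # Safe now: the array holds exactly one element.
    return unique_vals.item()
```

With a check like this, a file where some VFATs have a different number of events would fail with a message that names the offending values instead of the generic NumPy conversion error.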

Context (for feature requests)

It's hard to debug failures without the relevant stack trace.

lpetre-ulb commented 5 years ago

This issue should be fixed with Python 3: the original traceback is now included in the exception: https://bugs.python.org/issue13831

Since Python 2 compatibility is requested while (and after) moving to Python 3, we could add a function wrapper which appends the original traceback to the exception message (see example in the Python issue).
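A minimal sketch of such a wrapper, written to run under both Python 2.7 and Python 3, could look like the following. The names `WorkerError`, `with_traceback`, `analyze` and `run_all` are hypothetical and not part of gem-plotting-tools; the wrapper re-raises a new exception type whose message contains the traceback formatted inside the child process, which is one of several ways to do this.

```python
import functools
import multiprocessing
import traceback


class WorkerError(Exception):
    """Hypothetical exception carrying the child's formatted traceback."""


def with_traceback(func):
    """Decorator: re-raise any exception with the original traceback in the message."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # format_exc() is evaluated in the child, while the original
            # traceback still exists; only the resulting string is pickled.
            raise WorkerError("{0}: {1}\n\nOriginal traceback:\n{2}".format(
                type(exc).__name__, exc, traceback.format_exc()))
    return wrapper


@with_traceback
def analyze(filename):
    """Hypothetical stand-in for a per-file analysis function."""
    if filename.startswith("bad"):
        raise ValueError("can only convert an array of size 1 to a Python scalar")
    return filename


def run_all(filenames):
    """Run analyze() over filenames in a pool; failures carry the child traceback."""
    pool = multiprocessing.Pool(processes=2)
    try:
        return pool.map_async(analyze, filenames).get(7200)  # wait at most 2 hours
    finally:
        pool.close()
        pool.join()
```

The trade-off in this sketch is that the caller sees `WorkerError` rather than the original exception type, so code that catches `ValueError` specifically would need to be adapted. The decorated function must stay at module level so that it can be pickled for the pool.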

lpetre-ulb commented 4 years ago

QOL debugging feature which won't be fixed in legacy.