cms-gem-daq-project / gem-plotting-tools

Repository for GEM commissioning plotting tools
GNU General Public License v3.0
1 stars 26 forks source link

32-bits unsigned integer TTree branches (`i`) are incorrecly interpreted by `PyROOT` =< 6.18 #244

Open lpetre-ulb opened 5 years ago

lpetre-ulb commented 5 years ago

(This is a follow-up on https://github.com/cms-gem-daq-project/cmsgemos/issues/297)

Brief summary of issue

When analysing scans (SCurves, but possibly others) with non Reed-Muller encoded chipID's, the analysis routine can fail with the following error: OverflowError: unsigned long is less than minimum.

Types of issue

Expected Behavior

SCurves analysis can fail with non Reed-Muller encoded chipID.

Current Behavior

SCurves analysis should perform seamlessly even with non Reed-Muller encoded chipID.

Steps to Reproduce (for bugs)

  1. Launch anaUltraScurve.py on a scan with non Reed-Muller encoded chipID's.
  2. Fails if the chipID MSB is incorrectly readback as 1 for at least one VFAT.

Possible Solution (for bugs)

The root issue comes from the PyROOT type converters, more precisely a hack for the enum types. This is going to be fixed in ROOT 6.19.

Indeed, UInt_t (which is the vfatID branch type) are currently interpreted as signed integers and therefore can represent a negative number if the MSB is set to 1.

Applying this commit on top of ROOT 6.18 (with its dependent commit and a missing include) fixes the issue. Patched RPM's and SRPM can be found here.

Really fixing the issue requires to either deploy a patched version of ROOT or wait for ROOT 6.19. One drawback being that root-numpy is not actively maintained and might not quickly (if at all) support new releases of ROOT.

One hack could be to add 2**32 to negative numbers when the TBranch is supposed to contain 32-bits unsigned integers. This would have to be done everywhere... Another hack could be to store the chipID's as 64-bits unsigned intergers (l) so that the conversion should work as expected.

In the end, I'm not really sure how to best fix the issue. In the meantime, we are working with the patched ROOT version at ULB.

Your Environment

jsturdy commented 5 years ago

One drawback being that root-numpy is not actively maintained and might not quickly (if at all) support new releases of ROOT.

I don't think this matters, as installing root-numpy does a compilation of any necessary dependencies against the installed version of ROOT, so if the base ROOT version has the fix, that's all that should matter.

There was another issue (@bdorney can maybe recall the number) where a proposal was made for a change at the core (e.g., when the read from HW actually happens), and where I suggested implementing the workaround at the highest possible level, which I guess would be in any piece of code that is reading a stored chipID (i.e., not from hardware) and then doing an operation with it (DB lookup). At this level, it is completely safe to mask off the 16 MSBs (properly decoded values will always have only the 16 LSBs set, and bit-flips on non-RM encoded values don't have reliable values anyway, so masking off the MSB or even the 16 MSBs won't break things any more than they already would be)

jsturdy commented 4 years ago

I have a version of root_numpy ready to work with root-6-18, so we can consider trying to fully test the code base against this release. If it really resolves this issue, and doesn't introduce any additional major problems, i think we could make the decision to migrate production machines to 6.18 as the stated supported version (these updated packages are installed on gem904daq02)

lpetre-ulb commented 4 years ago

The bug is still present in the patch version 00 of root-6.18, but fixed starting from 6.18.02.

At ULB, we are successfully using root-6.18.04-1.el7 since weeks (months?). Moreover we exclusively use VFAT3 hybrid v2 with Reed-Muller encoded chipID, so I think this issue can be considered resolved with a newer version of ROOT.

However, I've never tried the gem-light-dqm which is probably the most sensitive piece of software regarding ROOT upgrades...

jsturdy commented 4 years ago

OK, in production the light DQM is running on a dedicated machine, independently of any of the DAQ machines (in my ideal world, our DAQ SW would drop all dependencies on ROOT altogether, but this would be a longer term restructuring...)

To proceed, I think we should consider upgrading one integration machine and then one QC7/8 machine, (downgrading ROOT RPMs is a PITA). Following the success of this, we can migrate the rest of the machines

lpetre-ulb commented 4 years ago

in my ideal world, our DAQ SW would drop all dependencies on ROOT altogether, but this would be a longer term restructuring...

:100:

I guess that if ROOT compatibility is required, uproot would work very well. However, if there is no need for ROOT compatibility, HDF5 is maybe more compatible. And for the analysis and plotting, numpy, scipy and pyplot are probably more than enough.

To proceed, I think we should consider upgrading one integration machine and then one QC7/8 machine, (downgrading ROOT RPMs is a PITA). Following the success of this, we can migrate the rest of the machines

:+1: I don't foresee any compatibility issue, but the sotware broke too many times these last few days...