cms-gem-daq-project / ctp7_modules

0 stars 13 forks source link

[Bugfix] Fix channel masking in the per-channel Sbit rate scans #165

Closed lpetre-ulb closed 4 years ago

lpetre-ulb commented 4 years ago

This PR fixes an issue in the per-channel Sbit rate vs THR_ARM_DAC scan reported by @mexanick and discussed in a separate email thread.

Here is the relevant quote:

I have been digging the per-channel Sbit rate scan issue and I believe to have found the issue. Indeed, the VFAT channels are not actually masked. For example, these are the VFAT channel masks set during a per-channel Sbit rate scan:

eagle63 > rwc GEM_AMC.OH.OH10.GEB.VFAT0.VFAT_CHANNELS.CHANNEL*.MASK 0x6568019c rw GEM_AMC.OH.OH10.GEB.VFAT0.VFAT_CHANNELS.CHANNEL103.MASK 0x00000000 0x656801a0 rw GEM_AMC.OH.OH10.GEB.VFAT0.VFAT_CHANNELS.CHANNEL104.MASK 0x00000000 0x656801a4 rw GEM_AMC.OH.OH10.GEB.VFAT0.VFAT_CHANNELS.CHANNEL105.MASK 0x00000000 0x656801a8 rw GEM_AMC.OH.OH10.GEB.VFAT0.VFAT_CHANNELS.CHANNEL106.MASK 0x00000000 0x656801ac rw GEM_AMC.OH.OH10.GEB.VFAT0.VFAT_CHANNELS.CHANNEL107.MASK 0x00000000 [...]

i.e. the channels are not masked. In fact, this is even worse. The scan routine corrupts the VFAT per-channel registers:

eagle63 > kw GEM_AMC.OH.OH10.GEB.VFAT0.VFAT_CHANNELS.CHANNEL100 0x65680190 rw GEM_AMC.OH.OH10.GEB.VFAT0.VFAT_CHANNELS.CHANNEL100 0x00000001 0x65680190 rw GEM_AMC.OH.OH10.GEB.VFAT0.VFAT_CHANNELS.CHANNEL100.CALPULSE_ENABLE 0x00000000 0x65680190 rw GEM_AMC.OH.OH10.GEB.VFAT0.VFAT_CHANNELS.CHANNEL100.MASK 0x00000000 0x65680190 rw GEM_AMC.OH.OH10.GEB.VFAT0.VFAT_CHANNELS.CHANNEL100.ZCC_TRIM_POLARITY 0x00000000 0x65680190 rw GEM_AMC.OH.OH10.GEB.VFAT0.VFAT_CHANNELS.CHANNEL100.ZCC_TRIM_AMPLITUDE 0x00000000 0x65680190 rw GEM_AMC.OH.OH10.GEB.VFAT0.VFAT_CHANNELS.CHANNEL100.ARM_TRIM_POLARITY 0x00000000 0x65680190 rw GEM_AMC.OH.OH10.GEB.VFAT0.VFAT_CHANNELS.CHANNEL100.ARM_TRIM_AMPLITUDE 0x00000001

As you have probably understood by now, this is a register mask issue in the software. After a quick and dirty fix, I get the desired behavior:

eagle63 > rwc GEM_AMC.OH.OH10.GEB.VFAT0.VFAT_CHANNELS.CHANNEL*.MASK 0x6568019c rw GEM_AMC.OH.OH10.GEB.VFAT0.VFAT_CHANNELS.CHANNEL103.MASK 0x00000001 0x656801a0 rw GEM_AMC.OH.OH10.GEB.VFAT0.VFAT_CHANNELS.CHANNEL104.MASK 0x00000001 0x656801a4 rw GEM_AMC.OH.OH10.GEB.VFAT0.VFAT_CHANNELS.CHANNEL105.MASK 0x00000001 0x656801a8 rw GEM_AMC.OH.OH10.GEB.VFAT0.VFAT_CHANNELS.CHANNEL106.MASK 0x00000001 0x656801ac rw GEM_AMC.OH.OH10.GEB.VFAT0.VFAT_CHANNELS.CHANNEL107.MASK 0x00000001 [...]

Description

Before this PR, the setSingleChanMask and applyChanMask functions were using the raw register write method writeRawAddress which does perform an unmasked write. Since the mask bit is not the LSB in the per-channel VFAT configuration, the channels were not masked. As a side effect, the per-channel configuration was corrupted (even after the channel mask restoration).

The fix uses the method writeReg which performs a masked write which the appropriate mask (from the LMDB). The setSingleChanMask and applyChanMask signatures had to be changed to use a std::map indexed by std::strings rather than std::uin32_ts. All the calls in the code base have been adapted.

The bugfix targets only the release/v1.1.X branch. Indeed, the behavior is not exception-safe so that a different, and more versatile method, should be used in the feature/remplated-rpc-methods branch.

Types of changes

Motivation and Context

See first paragraphs.

How Has This Been Tested?

The per-channel Sbit rate vs THR_ARM_DAC scan runs seamlessly and shows varying behavior across the VFAT channels (see the screenshots).

Screenshots (if appropriate):

Before the fix all the channels present the same Sbit rate vs THR_ARM_DAC behavior:

canv_Summary_Rate2D_vs_THR_ARM_DAC_GE11-TestStand10

After the fix the channels present varying Sbit rate vs THR_ARM_DAC behaviors:

canv_Summary_Rate2D_vs_THR_ARM_DAC_GE11-TestStand10

Checklist:

jsturdy commented 4 years ago

This seems like it introduces an API change in addition to a bugfix. Is there any way we can factor out the bugfix part alone? Or is the impact of the API change internal only to the(this) module space?

lpetre-ulb commented 4 years ago

Indeed, the API does change. I didn't find any way around it without using an hard-coded mask which would break the LMDB registers abstraction.

If the function could be used in different RPC modules since it is declared in the header, in practice, it is not used outside calibration_routine. It is safe to replace only the calibration_routine module.

mexanick commented 4 years ago

Actually in this particular case hard-coded mask won't really break the LMDB abstraction because the channel registers in SW/FW mirror their hardware implementation in VFAT3 chips, i.e. this mask will never change unless we replace VFAT3 with something else.

lpetre-ulb commented 4 years ago

Well, the abstraction would still be broken. ;-) But I agree it should be safe to do so in this case.

Following that direction, it should be possible to generate the VFAT3 registers algorithmically instead of relying on the LMDB. The LMDB database size would shrink by much and the performances should also improve.

jsturdy commented 4 years ago

Well, the abstraction would still be broken. ;-) But I agree it should be safe to do so in this case.

Following that direction, it should be possible to generate the VFAT3 registers algorithmically instead of relying on the LMDB. The LMDB database size would shrink by much and the performances should also improve.

This has been a longstanding back-burner idea, and once we have a minimally functional suite with the templated framework operational, I'd say it could make sense to pursue it. My primary concern in this particular case was to make it simple to transport this change into the templated branch (since a lot of work was done recently to rebase that on top of all the recent changes in the "legacy" code)

lpetre-ulb commented 4 years ago

This has been a longstanding back-burner idea, and once we have a minimally functional suite with the templated framework operational, I'd say it could make sense to pursue it.

:+1:

My primary concern in this particular case was to make it simple to transport this change into the templated branch (since a lot of work was done recently to rebase that on top of all the recent changes in the "legacy" code)

In any case, this bugfix does not translate well into the templated branch since it is not exception safe at all. Indeed, if an exception is thrown and not caught at the right place within the RPC method, the system state won't be restored. The safe way to do fix this (and any other state restoration) would be to restore the system state in a object destructor, ignoring any write errors (best effort strategy). Unless we don't want to give any guarantee in case an error is reported to the DAQ software...

lpetre-ulb commented 4 years ago

Provided this requires no downstream changes (which I believe is the case)

Indeed, this is a bug-fix (despite the slight API change). All the software was designed as if the ctp7_modules code was working properly.