Closed AndrewLevin closed 4 years ago
Thank you for the prompt PR. Have you tested tested that this new cut is still stringent enough to catch the bad DAC scan fits? In your previous PR you are writing that bad DAC scan fits have a chi² around 1400. If I assume your were talking about the CFG_HYST
DAC, it would mean a chi²/NDF just above 21.
It is CFG_HYST
, and the chisquared/ndof = 1457.5165/58 = 25.13
I have also checked the only other case that we are aware of (https://mattermost.web.cern.ch/cms-gem-ops/pl/xxspysnb3bgzzjkfozr8dwpwdh), and found that the chisquared/ndof value is even larger.
I have just updated the cut value to 15. Have you found any good-looking fits that exceed this?
It is
CFG_HYST
, and the chisquared/ndof = 1457.5165/58 = 25.13
Right... the fifth order polynomial...
I have also checked the only other case that we are aware of (https://mattermost.web.cern.ch/cms-gem-ops/pl/xxspysnb3bgzzjkfozr8dwpwdh), and found that the chisquared/ndof value is even larger.
I have just updated the cut value to 15. Have you found any good-looking fits that exceed this?
Not in the few examples I've seen. I'll have 6 setups where the VFATs are know to work ready tomorrow; I could probably test your new cut on all of them at once in the evening.
So, I just launched the DAC scan analysis on 5 setups (or 120 VFATs). A large χ² is rightfully reported for a few fits:
Warning: large chisquare for VFAT14 of chamber TestStand10 (Shelf1,Slot4,OH4) DAC CFG_BIAS_CFD_DAC_2.
Warning: large chisquare for VFAT14 of chamber TestStand10 (Shelf1,Slot4,OH4) DAC CFG_BIAS_PRE_I_BIT.
Warning: large chisquare for VFAT14 of chamber TestStand10 (Shelf1,Slot4,OH4) DAC CFG_BIAS_PRE_I_BLCC.
Warning: large chisquare for VFAT14 of chamber TestStand10 (Shelf1,Slot4,OH4) DAC CFG_BIAS_PRE_I_BSF.
Warning: large chisquare for VFAT11 of chamber TestStand10 (Shelf1,Slot4,OH0) DAC CFG_BIAS_SD_I_BDIFF.
Warning: large chisquare for VFAT14 of chamber TestStand10 (Shelf1,Slot4,OH4) DAC CFG_BIAS_SD_I_BDIFF.
Warning: large chisquare for VFAT14 of chamber TestStand10 (Shelf1,Slot4,OH4) DAC CFG_BIAS_SD_I_BFCAS.
Warning: large chisquare for VFAT14 of chamber TestStand10 (Shelf1,Slot4,OH4) DAC CFG_BIAS_SD_I_BSF.
Warning: large chisquare for VFAT14 of chamber TestStand10 (Shelf1,Slot4,OH4) DAC CFG_BIAS_SH_I_BDIFF.
Warning: large chisquare for VFAT16 of chamber TestStand10 (Shelf1,Slot4,OH10) DAC CFG_BIAS_SH_I_BDIFF.
Warning: large chisquare for VFAT14 of chamber TestStand10 (Shelf1,Slot4,OH4) DAC CFG_BIAS_SH_I_BFCAS.
Warning: large chisquare for VFAT14 of chamber TestStand10 (Shelf1,Slot4,OH4) DAC CFG_CAL_DAC.
Warning: large chisquare for VFAT14 of chamber TestStand10 (Shelf1,Slot4,OH4) DAC CFG_HYST.
[Excluding THR_ARM_DAC and THR_ZCC_DAC.]
In such cases, restricting the fit range really helps. I'm not sure the current criteria should be relaxed to allow this kind of (bad) fit.
Please find the root files here.
For the script to succeed, I also had to patch anautilities.py
.
ohKey
here:
https://github.com/cms-gem-daq-project/gem-plotting-tools/blob/f38674504fd96af320e881a50b04e8564b260345/utils/anautilities.py#L252-L258Is it expected?
Also, the chamber name in the warning messages in always the same. I think that detName
should be updated in the loop lines 310 and that there is a typo (get**D**ame
) in line 351:
https://github.com/cms-gem-daq-project/gem-plotting-tools/blob/f38674504fd96af320e881a50b04e8564b260345/utils/anautilities.py#L351
More generally, why do we scan the whole DAC range then fit them for regular operation? I agree this is useful for QC and debugging. However, this is a slow process which doesn't allow to find the right value for the IREF
DAC. That DAC could also change with the total integrated dose. Shouldn't we (also ?) have an RPC method which find the DAC working point as described in the VFAT3 manual for the IREF
DAC (i.e. changing the value around the default one)? This procedure would be faster, would avoid any kind of fit failures and would allow to properly scan the IREF
(and eventually the SLVS driver DACs).
More generally, why do we scan the whole DAC range then fit them for regular operation? I agree this is useful for QC and debugging. However, this is a slow process which doesn't allow to find the right value for the
IREF
DAC. That DAC could also change with the total integrated dose. Shouldn't we (also ?) have an RPC method which find the DAC working point as described in the VFAT3 manual for theIREF
DAC (i.e. changing the value around the default one)? This procedure would be faster, would avoid any kind of fit failures and would allow to properly scan theIREF
(and eventually the SLVS driver DACs).
I think the motivation for the current system is that these DAC scans aren't going to be done regularly, rather, only during QC and eventually as a way to monitor the performance over time (TID), and in these cases I think having the full curve is useful. At the same time, I have no problem making a routine that can give more granular control over the scan itself...
Sorry, I just fixed the three bugs @lpetre-ulb found.
I am not sure I would call the DAC scan and analysis a slow process. It takes like 7 minutes, right?
I would propose that we use the TGraphError's built-in Eval function to determine the register setting and we use the fit just as a mechanism to flag possibly bad VFATs for a human to look at.
Regarding the IREF setting, maybe we should ask the VFAT developers about this. My understanding is that now they provide us with an IREF setting in a calibration file, just like the ADC0 slope and intercept, but I don't understand why this register is different from the other ones and why we cannot or should not scan it also.
Regarding the IREF setting, maybe we should ask the VFAT developers about this. My understanding is that now they provide us with an IREF setting in a calibration file, just like the ADC0 slope and intercept, but I don't understand why this register is different from the other ones and why we cannot or should not scan it also.
If I recall correctly, it is because it is possible to write values that would cause damage to the ASIC itself. So we decided to not scan IREF
(rather than simply only scanning within the valid range)
(from email)
Regarding the comments yesterday you stated that when performing a dac scan on CFG_IREF the register should only be scanned within 0x20 and 0x32 (inclusive).
Regarding the IREF setting, maybe we should ask the VFAT developers about this. My understanding is that now they provide us with an IREF setting in a calibration file, just like the ADC0 slope and intercept, but I don't understand why this register is different from the other ones and why we cannot or should not scan it also.
If I recall correctly, it is because it is possible to write values that would cause damage to the ASIC itself. So we decided to not scan
IREF
(rather than simply only scanning within the valid range) (from email)Regarding the comments yesterday you stated that when performing a dac scan on CFG_IREF the register should only be scanned within 0x20 and 0x32 (inclusive).
OK, then it seems to me like we should just scan the register in that range, so it is still not clear to me why they provide us with a value for the IREF register in the calibration file.
Why was this closed?
oops, i must have hit some keyboard shortcut :-(
Sorry, I just fixed the three bugs @lpetre-ulb found.
Thanks! Is there a reason why you didn't also update the for-loop line 311 (missing detName
update)?
I am not sure I would call the DAC scan and analysis a slow process. It takes like 7 minutes, right?
Something like that, maybe a bit less. However, it cannot be parallelized and the tracking data scan developments will not help.
I would propose that we use the TGraphError's built-in Eval function to determine the register setting and we use the fit just as a mechanism to flag possibly bad VFATs for a human to look at.
:+1: The error could even be smaller than with the current method since the useful DAC range is very linear and the data points are very close to each other. The fit might report some false positives, but that's better than false negatives.
Regarding the IREF setting, maybe we should ask the VFAT developers about this. My understanding is that now they provide us with an IREF setting in a calibration file, just like the ADC0 slope and intercept, but I don't understand why this register is different from the other ones and why we cannot or should not scan it also.
Regarding the IREF setting, maybe we should ask the VFAT developers about this. My understanding is that now they provide us with an IREF setting in a calibration file, just like the ADC0 slope and intercept, but I don't understand why this register is different from the other ones and why we cannot or should not scan it also.
If I recall correctly, it is because it is possible to write values that would cause damage to the ASIC itself. So we decided to not scan
IREF
(rather than simply only scanning within the valid range) (from email)Regarding the comments yesterday you stated that when performing a dac scan on CFG_IREF the register should only be scanned within 0x20 and 0x32 (inclusive).
OK, then it seems to me like we should just scan the register in that range, so it is still not clear to me why they provide us with a value for the IREF register in the calibration file.
In addition to damaging the ASIC, you can also loose the communication with the VFAT. I'm also not sure why the IREF
value is provided; probably because it is crucial to set it right. You can find the calibration procedure in the VFAT3 manual, page 42.
From my understanding, only the VREF_ADC
value and ADC0 calibration constants are absolutely required (ADC1 is probably not usable on GE1/1 because the external reference voltage changes with the FEAST and VFAT position). We should be able to re-calibrate all other DACs. And, if the DACs suffer from the TID, there is no reason why the IREF
DAC wouldn't be affected.
I think the motivation for the current system is that these DAC scans aren't going to be done regularly, rather, only during QC and eventually as a way to monitor the performance over time (TID), and in these cases I think having the full curve is useful. At the same time, I have no problem making a routine that can give more granular control over the scan itself...
Okay. So, complete DAC scans will be performed on a regular basis. The only issue is that some DACs cannot really be scanned.... Therefore, we may need to have two methods (or a single, more generic, method):
Sorry, the last detName
bug is fixed now.
I thought the DAC scan is already parallelized.
So, I will make a separate pull request to enable the scan of IREF in a restricted range. But I don't see any information about setting IREF on page 42 of this link:
https://espace.cern.ch/cms-project-GEMElectronics/VFAT3/VFAT3%20User%20Manual%20v2.2.pdf
So, the only DAC that cannot be scanned is VREF_ADC
, right?
Sorry, the last
detName
bug is fixed now.
Thank you!
I thought the DAC scan is already parallelized.
Well, the RPC method is indeed "mutilink". But... the DAC scans are internally serialized for each unmasked OH. It cannot be be truly parallelized since the VFAT slow control transactions can only be done one by one. I personally think that these kind of "multilink" scans should be deleted when migrating to the new framework; the name is only misleading and does not bring anything.
So, I will make a separate pull request to enable the scan of IREF in a restricted range. But I don't see any information about setting IREF on page 42 of this link:
https://espace.cern.ch/cms-project-GEMElectronics/VFAT3/VFAT3%20User%20Manual%20v2.2.pdf
It's on page 42 of the manual, so page 60 of the PDF. The calibration procedure does not scan a restricted range, but iterates around the default DAC value until the nominal current is reached.
Since, this "scan" is not currently required for the QC steps, I would implement it directly in the new framework. The RPC method would take the DAC to be calibrated and the ADC calibration constants as inputs are return the found working point for that DAC. If it cannot be found, an exception can be thrown. Of course, the ranges should be restricted to keep the ASICs safe.
It is probably worth opening an issue in the ctp7_modules
and have a discussion with the VFAT3 designers though.
So, the only DAC that cannot be scanned is
VREF_ADC
, right?
Yes. It could be scanned, but then the calibration constants for the ADC would be different and we don't have any possibility to re-calibrate it (requires external tools).
I thought the DAC scan is already parallelized.
Nothing that is done in software is truly "parallelized", simply because it requires reading/writing a specific register.
What can probably be done, is a reoptimization of the loops for the "multi-link" mode, as it currently just loops over the links and performs the dacScanLocal
function, in which there is a 1s sleep for every DAC...
I've mentioned this to @bregnery, but for anyone who wants to look at how the loops are done, and figure out the optimization, that's something we should address in the post templated framework (not before)
So, I will make a separate pull request to enable the scan of IREF in a restricted range. But I don't see any information about setting IREF on page 42 of this link:
https://espace.cern.ch/cms-project-GEMElectronics/VFAT3/VFAT3%20User%20Manual%20v2.2.pdf
On page 42, the procedure is described, but it cannot be performed in situ in a closed chamber... There was also a promise from the VFAT3 team that the manual would be updated, but as yet it hasn't happened. We could ping Aamir or Franceso to see if this is still in the works, or if it has been done (though they may be waiting on the results from the irradiation studies)
So, the only DAC that cannot be scanned is
VREF_ADC
, right?
See page 43 for a description on how to tune ADC0 (probably this is the value that is provided in the DB)
Apparently there are two page 42s of the VFAT3 manual, one on page 44 of the pdf and one on page 60 of the pdf...
I think probably we can already merge this pull request since it is at least an improvement from what we have now, and continue the IREF discussion elsewhere, and I will make a separate pull request to use the TGraphError's evaluation.
I thought the DAC scan is already parallelized.
Nothing that is done in software is truly "parallelized", simply because it requires reading/writing a specific register. What can probably be done, is a reoptimization of the loops for the "multi-link" mode, as it currently just loops over the links and performs the
dacScanLocal
function, in which there is a 1s sleep for every DAC... I've mentioned this to @bregnery, but for anyone who wants to look at how the loops are done, and figure out the optimization, that's something we should address in the post templated framework (not before)
Right, it should be possible to merge the 1s sleep, but I'm not convinced that it will be a real improvement. The scans are intrinsically much longer.
Also, the 1s sleep should probably be completely removed. It is only there because this RPC method switches the VFATs into run mode at the begin and switches them out of run mode at the end. I believe this should be done somewhere else; the RPC method shouldn't "turn on/off" the VFATs in an uncontrolled/unconsistent way.
Apparently there are two page 42s of the VFAT3 manual, one on page 44 of the pdf and one on page 60 of the pdf...
... that manual... we really need a new version...
I think probably we can already merge this pull request since it is at least an improvement from what we have now, and continue the IREF discussion elsewhere, and I will make a separate pull request to use the TGraphError's evaluation.
:+1:
may as well rebase
while you're at it
The cut on the chisquared of the DAC scan fit before an exception is thrown is changed to a cut on the chisquared divided by number of degrees of freedom. The default value of the cut is set to 20.
Description
The cut on the value of chisquared of DAC vs ADC fits was too strict, and did not take into consideration that different DACs have different ranges.
Types of changes
Motivation and Context
Partially resolves issue https://github.com/cms-gem-daq-project/gem-plotting-tools/issues/262
How Has This Been Tested?
Yes, I tested that non of the DAC scan fits in the example provided by @lpetre-ulb have chisquared/ndof larger than 10, then I added a safety factor of around 100%.
Screenshots (if appropriate):
Checklist: