Open bdorney opened 5 years ago
May I suggest another solution for aligning the sbits? I think we could find a firmware/software solution which is more robust (and faster) than a pure software routine.
Here is how I see things :
Instead of using fixed delay taps, one could dynamically configure the delays in order to always sample the signal in the middle of the eye. We could implement a system similar to XAPP585, particularly per-bit deskew. It would be aligned in near real time and could correct for voltage and temperature variations.
Once we can reliably sample the time-multiplexed sbits, it would be possible to align all of them with a training phase. More specifically, configure the VFAT as follows: mask all channels except 0&1, 16&17, ... so the enabled sbits would be 0, 8, ... Also set THR_ARM_DAC to a very low value (e.g. 0x1) in order to constantly measure noise. Each sbit differential pair (and SOT also, I think) would then carry 10000000. The signal is then aligned using a simple bitslip. This is also where we see if there is an inversion in the polarity and correct for it.
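To make the training idea concrete, here is a minimal software sketch of the bitslip search, assuming each pair delivers one 8-bit word per clock and that the expected training word is 10000000. The names (Alignment, rotl8, findAlignment) are hypothetical, and a real implementation would live in firmware rather than C++.

```cpp
#include <cstdint>

// Result of the search (hypothetical type, for illustration only).
struct Alignment {
    bool found;         // false if no rotation reproduces the training word
    unsigned bitslips;  // number of single-bit rotations needed
    bool inverted;      // true if the pair polarity appears to be swapped
};

// Rotate an 8-bit word left by n positions (n taken modulo 8).
inline uint8_t rotl8(uint8_t word, unsigned n) {
    n &= 7u;
    return static_cast<uint8_t>((word << n) | (word >> ((8u - n) & 7u)));
}

// Try all 8 bitslips; for each one, accept either the training word itself
// or its bitwise complement (which would indicate a polarity inversion).
Alignment findAlignment(uint8_t received) {
    const uint8_t kTraining = 0x80;  // 10000000, as described in the text
    for (unsigned n = 0; n < 8; ++n) {
        const uint8_t slipped = rotl8(received, n);
        if (slipped == kTraining) {
            return Alignment{true, n, false};
        }
        if (static_cast<uint8_t>(~slipped) == kTraining) {
            return Alignment{true, n, true};
        }
    }
    return Alignment{false, 0, false};
}
```

A word such as 00010000 would need three bitslips, while 01111111 would be reported as already aligned but inverted. Words with more than one bit set (or cleared) are rejected, which is how noise-free training data would be distinguished from garbage.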
Once the training phase is done, the VFAT can return to "normal" operating mode. The alignment can be continuously checked by looking at the SOT frame.
I also think that such a solution can easily be ported to GE2/1 & ME0. In the absence of an FPGA on the OH, step 1 should be done by the LpGBT (the GBTX already phase aligns the data, if I'm not mistaken). Step 2 should be done in the backend firmware.
As the correct sbit mapping is required for the new TDC with full granularity, I would not be able to reliably test the new TDC firmware before the sbit mapping issue is solved. I could try to implement the previously described solution next week.
Hi Laurent,
Thanks for the feedback. I have some comments inline below:
On Thu, Oct 25, 2018 at 2:43 AM lpetre-ulb notifications@github.com wrote:
May I suggest another solution for aligning the sbits? I think we could find a firmware/software solution which is more robust (and faster) than a pure software routine.
Here is how I see things: Instead of using fixed delay taps, one could dynamically configure the delays in order to always sample the signal in the middle of the eye. We could implement a system similar to XAPP585 https://www.xilinx.com/support/documentation/application_notes/xapp585-lvds-source-synch-serdes-clock-multiplication.pdf, particularly per-bit deskew. It would be aligned in near real time and could correct for voltage and temperature variations.
The firmware right now uses dynamic configuration of the delays (based on https://www.xilinx.com/support/documentation/application_notes/xapp881_V6_4X_Asynch_OverSampling.pdf )
It centers the data inside the eye automatically, based on the SOT pulse which is received every clock. We wanted to phase length match the different S-bits coming from a VFAT so that in principle the same alignment state machine could be used for all 9 pairs coming from a single VFAT. The SOT would determine the timing and the corresponding S-bit traces would be automatically aligned to it because they have the same timing.
The phase alignment that was done on the PCBs, however, is not good, so there is some skew from channel to channel, and we hoped to just correct that with fixed delays that simply align the S-bits coming from a single VFAT so that they are completely in sync with each other. Temperature drift and so on should affect all 9 pairs equally (at least within the tolerance of the very large 3.125ns eye).
So the process of timing in these delays should just need to be done once in the lab and we are over with it for the whole detector, and do not need special routines at the beginning of every hard reset. We did it already on v3a by hand and it worked well but nobody ever repeated the exercise on v3b and v3c where some positions have changed.
The belief underlying this is that there may be VFAT to VFAT variation, GEB to GEB variation, but that the variation within a single VFAT should be small and this mechanism just needs to keep them in phase +- a nanosecond or so (using 78ps tap delays), so there is actually a lot of slosh for things to be out of time. The big requirement of this system is that the different output channels of a single VFAT should be consistently timed in with each other when coming from the VFAT, which I really hope is true, and that the IODelays work more-or-less correctly within the slack acceptable by the sampling window (which they should, they are calibrated by the chip).
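As a rough illustration of the numbers quoted above (78ps per tap, a 3.125ns eye, a tolerance of about a nanosecond), the arithmetic can be sketched as follows. The function names and the 1000ps default tolerance are assumptions made for this example, not values taken from the firmware.

```cpp
#include <cmath>

// Values quoted in the discussion above.
const double kTapDelayPs = 78.0;    // one IODelay tap
const double kEyeWidthPs = 3125.0;  // 3.125 ns eye

// Nearest whole number of taps that compensates a measured skew.
int skewToTaps(double skewPs) {
    return static_cast<int>(std::lround(skewPs / kTapDelayPs));
}

// Skew left over after applying the nearest whole number of taps.
double residualAfterCompensationPs(double skewPs) {
    return skewPs - skewToTaps(skewPs) * kTapDelayPs;
}

// Hypothetical acceptance check: is the remaining skew inside the tolerance?
bool withinTolerance(double residualPs, double tolerancePs = 1000.0) {
    return std::fabs(residualPs) <= tolerancePs;
}
```

With these numbers the full eye corresponds to about 40 taps and a 1ns skew to about 13 taps, so the quantization error of a single tap is small compared with the tolerance being discussed.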
Once we can reliably sample the time-multiplexed sbits, it would be possible to align all of them with a training phase. More specifically, configure the VFAT as follows: mask all channels except 0&1, 16&17, ... so the enabled sbits would be 0, 8, ... Also set THR_ARM_DAC to a very low value (e.g. 0x1) in order to constantly measure noise. Each sbit differential pair (and SOT also, I think) would then carry 10000000. The signal is then aligned using a simple bitslip. This is also where we see if there is an inversion in the polarity and correct for it.
This is basically just what we are trying to do right now with the script that Brian described, except not as an automatic routine but just something to derive constants for the firmware. This is a possibility of course, to have automatic alignment using some calpulses but I wanted to try to get this working on the boards without co-dependent CTP7 firmware and software routines that I have no control over. It seemed to work fine but if we run into problems perhaps we reconsider whether something like this is needed.
Once the training phase is done, the VFAT can return to "normal" operating mode. The alignment can be continuously checked by looking at the SOT frame.
I also think that such a solution can easily be ported to GE2/1 & ME0. In the absence of an FPGA on the OH, step 1 should be done by the LpGBT (the GBTX already phase aligns the data, if I'm not mistaken). Step 2 should be done in the backend firmware.
GBT does not phase align data. It has a similar fixed delay, and we do a hand-scan of phase values to find the window and then fuse a hard-coded sampling phase into the chip.
As the correct sbit mapping is required for the new TDC with full granularity, I would not be able to reliably test the new TDC firmware before the sbit mapping issue is solved. I could try to implement the previously described solution next week.
S-bit "mapping" only creates a rotation of the S-bits so that 01234567 becomes 7123456. The TDC just uses the OR of the entire VFAT, correct? In which case you should be able to proceed as is, right?
Fyi, there are several unrelated problems that are sometimes referred to as "S-bit mapping" but many of them seem to perhaps be unrelated to the timing/mapping but do fall under the umbrella of problems with S-bits.
In the plots shown previously by Brian, for example, of GEB v3b:
None of these problems seem to be what I would expect from timing issues. Either the VFAT or the OH seems to just be broken in slots 16, 17, 22, 14, or bad solder joints, etc.. Polarity inversion could explain the issue on VFAT14.
On the GEB v3c, you can see perhaps a timing issue in VFAT18, VFAT8, VFAT0, VFAT11, VFAT3 but all of these problems would not be an issue if you are just using the OR of the VFAT for timing measurement
VFAT14 has something else very wrong that could not be explained by timing or inversion.
All the issues with calpulses showing up in the wrong VFAT also should have nothing to do with mapping or timing and could be indicative of something like crosstalk (which we know exists, since we see S-bits coming from disconnected VFATs).
We will be working on the timing question in the next few weeks and should have an idea soon how well things work, how consistent the parameters are across time, temperature, etc and hopefully should be able to fix some of these issues through firmware (but certainly not all of them).
Best wishes,
Andrew
Hi Andrew,
Thank you very much for your very detailed reply.
GBT does not phase align data. It has a similar fixed delay, and we do a hand-scan of phase values to find the window and then fuse a hard-coded sampling phase into the chip.
I had seen in some slides the possibility for the GBTX to automatically choose the correct phase. Reading the manual more carefully, I see that this method is not resistant to SEUs. Too bad...
S-bit "mapping" only creates a rotation of the S-bits so that 01234567 becomes 7123456. The TDC just uses the OR of the entire VFAT, correct? In which case you should be able to proceed as is, right?
Indeed, the current version of the TDC uses the OR of an entire VFAT. This is how we made the first measurement with the v3 electronics (see this elog). You can see that we still have a lot of improvement to do, both on the setup and on the detector configuration.
However, the final aim is to measure the time resolution with the full Sbit granularity, that is by using the "Sbits word" coming from the "Sbits cluster packer". Modifying the TDC module is not difficult, but it is nearly impossible to test it without the proper Sbit mapping.
Regarding the rotation of the Sbits, are you sure that it is not possible that one Sbit is not correctly associated to the correct BX ? If you look at the histogram slide 7 of this presentation, it looks like there are three peaks. The leftmost one is roughly separated from the main one by 25ns, that is 1 BX. While working on the v2a with an old firmware which did not time align the Sbits, I observed a similar behavior. The fix (for the timing measurement) was to OR the Sbits on the VFAT2 itself and use only 1 Sbit transmission line.
On the GEB v3c, you can see perhaps a timing issue in VFAT18, VFAT8, VFAT0, VFAT11, VFAT3 but all of these problems would not be an issue if you are just using the OR of the VFAT for timing measurement
VFAT14 has something else very wrong that could not be explained by timing or inversion.
One remark about this plot: I don't know if it is written somewhere, but on this GEBv3c plot posted by Brian, the firmware uses the v3b taps configuration. More precisely, this is the first TDC firmware, based on version 3.1.2B. So that configuration has mixed hardware/firmware. It might explain behavior different from that observed on other GEBv3c.
We will be working on the timing question in the next few weeks and should have an idea soon how well things work, how consistent the parameters are across time, temperature, etc and hopefully should be able to fix some of these issues through firmware (but certainly not all of them).
Let me know if I can help you in any way with this issue. I also think we received one long GEB at ULB.
Best regards, Laurent
However, the final aim is to measure the time resolution with the full Sbit granularity, that is by using the "Sbits word" coming from the "Sbits cluster packer". Modifying the TDC module is not difficult, but it is nearly impossible to test it without the proper Sbit mapping.
This is both alarmist and also not true. You are able to see which sbits are mapped correctly using the checkSbitMappingAndRate.py tool. For those VFATs that have mismapped sbits, mask them from the trigger block in the OH using the instructions here. This enables you to make tests of your FW module seamlessly.
It's possible I didn't understand the conversation above due to ignorance. But it should be explicitly clear that we will not make design choices to the optohybrid firmware just to accommodate this TDC module. If you are interested in working on solving this sbit mismapping issue, which is a critical path issue for P5, please use the RPC module approach I've outlined above. Also, since I think @andrewpeck has targeted this for his student, you should discuss with him how to contribute so we don't have two different people trying to solve the same problem (as that would be inefficient).
However, the final aim is to measure the time resolution with the full Sbit granularity, that is by using the "Sbits word" coming from the "Sbits cluster packer". Modifying the TDC module is not difficult, but it is nearly impossible to test it without the proper Sbit mapping.
This is both alarmist and also not true. You are able to see which sbits are mapped correctly using the checkSbitMappingAndRate.py tool. For those VFATs that have mismapped sbits, mask them from the trigger block in the OH using the instructions here. This enables you to make tests of your FW module seamlessly.
Sure, I can mask the unused VFATs. However, there is only one position where it is possible to connect a VFAT on the GEM chamber at ULB, and if the mapping of that VFAT is wrong, masking it won't help. Masking VFATs also will not allow us to test how the TDC behaves with the full detector: will slow control sustain the acquisition rate? Won't the (small amount of) noise mask the signal, since noise will be picked up from the full detector? ...
It's possible I didn't understand the conversation above due to ignorance. But it should be explicitly clear that we will not make design choices to the optohybrid firmware just to accommodate this TDC module. If you are interested in working on solving this sbit mismapping issue, which is a critical path issue for P5, please use the RPC module approach I've outlined above. Also, since I think @andrewpeck has targeted this for his student, you should discuss with him how to contribute so we don't have two different people trying to solve the same problem (as that would be inefficient).
Of course, it is not to accommodate the TDC module. The conversation above was about the Sbit mapping issue due to bad Sbit timing parameters, in full generality. Yes, it is best if we collaborate on fixing the issue; that is the meaning of the last sentence of my previous post.
Indeed, the current version of the TDC uses the OR of an entire VFAT. This is how we made the first measurement with the v3 electronics (see this elog). You can see that we still have a lot of improvement to do, both on the setup and on the detector configuration.
However, the final aim is to measure the time resolution with the full Sbit granularity, that is by using the "Sbits word" coming from the "Sbits cluster packer". Modifying the TDC module is not difficult, but it is nearly impossible to test it without the proper Sbit mapping.
Regarding the rotation of the Sbits, are you sure that it is not possible that one Sbit is not correctly associated to the correct BX ? If you look at the histogram slide 7 of this presentation, it looks like there are three peaks. The leftmost one is roughly separated from the main one by 25ns, that is 1 BX. While working on the v2a with an old firmware which did not time align the Sbits, I observed a similar behavior. The fix (for the timing measurement) was to OR the Sbits on the VFAT2 itself and use only 1 Sbit transmission line.
Yes, the timing is a whole separate issue that will need to be addressed as well.
The bx is determined by the alignment of the SoT relative to the 40MHz clock.
But right now the 40MHz clock phase is completely arbitrary, so depending on the phase the S-bits will end up split randomly into different bunches. We need to phase shift the 40MHz clock (done on the GBTx) to center the data, so that the S-bits that are supposed to be synchronous fall in the same bx. Nobody has ever done this step (you are the first person besides me to even mention it... :(
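A toy model may help show why the clock phase matters. The sketch below assumes a 25ns bunch-crossing period (40MHz) and assigns each arrival time to a bx relative to an adjustable clock phase; the function names and the idea of counting distinct bx values are illustrative assumptions, not part of the firmware.

```cpp
#include <cmath>
#include <cstddef>
#include <set>
#include <vector>

// Bunch-crossing index of a hit arriving at arrivalNs, for a 40 MHz clock
// whose rising edge is shifted by clockPhaseNs.
long bxIndex(double arrivalNs, double clockPhaseNs) {
    const double kBxPeriodNs = 25.0;  // 1 / 40 MHz
    return static_cast<long>(std::floor((arrivalNs - clockPhaseNs) / kBxPeriodNs));
}

// Number of different bunch crossings that a set of nominally synchronous
// hits ends up in for a given clock phase. The ideal value is 1.
std::size_t distinctBx(const std::vector<double>& arrivalsNs, double clockPhaseNs) {
    std::set<long> seen;
    for (double t : arrivalsNs) {
        seen.insert(bxIndex(t, clockPhaseNs));
    }
    return seen.size();
}
```

In this model, two hits arriving at 24ns and 26ns land in different bunch crossings with a phase of 0ns, but in the same one when the phase is shifted by 12.5ns, which is the effect a phase scan would be looking for.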
One remark about this plot: I don't know if it is written somewhere, but on this GEBv3c plot posted by Brian, the firmware uses the v3b taps configuration. More precisely, this is the first TDC firmware, based on version 3.1.2B. So that configuration has mixed hardware/firmware. It might explain behavior different from that observed on other GEBv3c.
Supposedly, based on the design files, the v3c and v3b should be the same, but this doesn't seem to be the case in reality :( So we need to figure out what it is supposed to be.. but naively, to first order, they should be the same to the best of our knowledge, hence why Brian was using the v3b config on v3c electronics.
Our student is starting today with getting things setup.. hopefully it won't take very long to get some working config for the v3c
Brief summary of issue
So we have seen that we have an issue with the sbit mapping in V3 electronics. This issue persists when using GEBv3c+OHv3c hardware:
While the situation with complete v3c hardware is improved, it is still not as desired. Additionally, this is just for the short detector and we will also need a set of parameters for a long detector. Then for GE2/1 there will be 8 sets of parameters, and ME0 will contribute another set. So we need a software routine that can automatically determine the correct set of timing registers.
The registers of interest are:
GEM_AMC.OH.OHX.FPGA.TRIG.CTRL.SOT_INVERT
GEM_AMC.OH.OHX.FPGA.TRIG.TIMING.SOT_TAP_DELAY_VFATY
GEM_AMC.OH.OHX.FPGA.TRIG.TIMING.TAP_DELAY_VFATY_BITZ
GEM_AMC.OH.OHX.FPGA.TRIG.CTRL.VFATY_TU_INVERT
According to @andrewpeck
The 100-pin panasonic connector looks like:
The convention for the trigger unit that Tuomas has explained (@andrewpeck's email above) is shown as:
So GEM_AMC.OH.OHX.FPGA.TRIG.TIMING.TAP_DELAY_VFATY_BITZ follows the hardware.
We already have one tool that checks the mapping:
https://github.com/cms-gem-daq-project/ctp7_modules/blob/e1d9d0c52a9bd5ffae96e99706ffd12b6ba2809b/src/calibration_routines.cpp#L963
I would be against modifying this tool to try to correct the mapping, since if you modify the 4 registers above incorrectly you can affect not just the Z^th bit but all 8 SBits, due to how the OH is expecting them. So what I would propose is the following procedure:
checkSbitMappingWithCalPulseLocal() function should be used to check that the sbit mapping is correct (this is easily done with checkSbitMappingAndRate.py),
correctSBitMappingErrorsLocal(...),
correctSBitMappingErrorsLocal(...),
The correctSBitMappingErrorsLocal(...) would call checkSbitMappingWithCalPulseLocal(), with a small event count, after making modifications, so this could eliminate step 5 above.
Types of issue
Expected Behavior
How I expect correctSBitMappingErrors(...) and correctSBitMappingErrorsLocal(...) to function. General flow is shown below.
Unless otherwise noted, the code that will be added to ctp7_modules should be placed in calibration_routines.h and calibration_routines.cc.
The calling function on the DAQ Machine
This is a new development and I'm not sure if the calling function should be created in the legacy xhal branch, or if it should be placed in cmsgemos (this in my eye is a calibration routine so it doesn't really fit in a HwDevice; some input from @mexanick and @jsturdy would be appreciated here).
However, the general overview should be something like:
N_Mismatches > cutVal the sbit is assumed to have inverted polarity
VFATN is in the table set the key in the following cases:
MappingVFATN and this should set a std::vector<uint32_t> to this key),
InvertedVFATN and this should set a std::vector<uint32_t> to this key),
SOT_INVERT, one integer
SOT_TAP_DELAY_VFATY, 24 integers
TAP_DELAY_VFATY_BITS, 24*8 = 192 integers
VFATY_TU_INVERT, 24 integers
Input parameters should be:
N_Mismatches exceeds the cut value the sbit is assumed to have inverted polarity.
Example table format is something like:
Here any sbit with N_Mismatches beyond 25k can be assumed to have an inverted polarity, but this strongly depends on the event count used when checkSbitMappingAndRate.py was called.
Outline of correctSBitMappingErrors(...)
Here we are getting the information from the RPC request, and it falls in two categories:
For the first case (wrong timing) we will construct a std::map<std::string,std::vector<uint32_t> > from the input MappingVFATN keys:
vfatN check if a key "MappingVFATN" exists in the rpc message,
std::vector of mis-mapped sbits from the get_word_array function,
std::map where the key is "MappingVFATN" or just "VFATN" for simplicity
Similarly we should construct a second map (as above) from the "InvertedVFATN" keys.
This should then get the vfatmask using:
https://github.com/cms-gem-daq-project/ctp7_modules/blob/e1d9d0c52a9bd5ffae96e99706ffd12b6ba2809b/src/amc.cpp#L42
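The classification by N_Mismatches and the two per-VFAT maps described above could be sketched roughly as follows. This is only an illustration: the SbitRow struct and its fields are hypothetical (the example table did not survive in this thread), the key names are assumed to be formed by substituting the VFAT number for N, and no RPC message API is used here.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical row of the mapping table: which VFAT, which sbit, and how
// many mismatches were counted for that sbit.
struct SbitRow {
    uint32_t vfatN;
    uint32_t sbit;
    uint32_t nMismatches;
};

typedef std::map<std::string, std::vector<uint32_t> > SbitMap;

// Sort problematic sbits into two maps keyed per VFAT:
//   "InvertedVFAT<N>" when nMismatches > cutVal (assumed inverted polarity),
//   "MappingVFAT<N>"  for any other non-zero mismatch count (assumed timing).
void classifyRows(const std::vector<SbitRow>& rows, uint32_t cutVal,
                  SbitMap& mapping, SbitMap& inverted) {
    for (const SbitRow& row : rows) {
        if (row.nMismatches == 0) {
            continue;  // correctly mapped, nothing to record
        }
        const std::string suffix = "VFAT" + std::to_string(row.vfatN);
        if (row.nMismatches > cutVal) {
            inverted["Inverted" + suffix].push_back(row.sbit);
        } else {
            mapping["Mapping" + suffix].push_back(row.sbit);
        }
    }
}
```

Keeping the two categories in separate maps mirrors the two sets of keys in the RPC request, so a VFAT with no problems simply has no key in either map.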
It should then loop over all unmasked vfats and for each iteration it should call the local function and use the constructed maps as input. The local function correctSBitMappingErrorsLocal() which should take the following input parameters:
ohN,
vfatN
vfat is in the constructed maps but also is in the vfatmask this should probably raise either an error or a warning (by setting the "error" key or "warning" key in the RPC response), this probably means the hardware isn't configured correctly (e.g. VFATs out of sync and user needs to be told)
std::vector stored in the std::map for the "MappingVFATN" key,
std::vector stored in the std::map for the "InvertedVFATN" key, and
The local function could then return for this vfatN an std::map<std::string, std::vector<uint32_t> > whose keys are:
SOT_TAP_DELAY_VFATY, this is a vector of one element
TAP_DELAY_VFATY_BITS, this is a vector of 8 elements
VFATY_TU_INVERT, this is a vector of one element
These three should then be added to three maps (that should be initialized before calling the loop above) and after all unmasked VFATs are looped over will contain all timing and inverted registers which give the correct configuration, e.g.:
After everything is said and done there should be a read of SOT_INVERT and this should be placed in the RPC response as a data word.
Then the three final maps (map_sotTapDelay, map_vfatTapDelay, map_sotTapDelay) should be looped over (they will all have the same keys so one loop is sufficient) and stored in the RPC response, e.g.:
The function on the DAQ machine now has the correct configuration for this link.
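The outer loop described in this section could look roughly like the sketch below. It is deliberately decoupled from the real ctp7_modules code: the local routine is passed in as a callback with a hypothetical signature, a set bit in vfatMask is assumed to mean "masked", and the third map name (tuInvert) is an assumption, since only the SOT and VFAT tap-delay map names are given above.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::vector<uint32_t> > RegMap;

// Hypothetical stand-in for correctSBitMappingErrorsLocal(...): returns the
// three register vectors for one VFAT.
typedef std::function<RegMap(uint32_t ohN, uint32_t vfatN)> LocalFn;

// Three per-VFAT maps, initialized before the loop and filled inside it.
struct LinkConfig {
    std::map<uint32_t, std::vector<uint32_t> > sotTapDelay;   // 1 element each
    std::map<uint32_t, std::vector<uint32_t> > vfatTapDelay;  // 8 elements each
    std::map<uint32_t, std::vector<uint32_t> > tuInvert;      // 1 element each
};

LinkConfig collectLinkConfig(uint32_t ohN, uint32_t vfatMask,
                             const LocalFn& correctLocal) {
    LinkConfig cfg;
    for (uint32_t vfatN = 0; vfatN < 24; ++vfatN) {
        if ((vfatMask >> vfatN) & 0x1) {
            continue;  // assumption: bit set means this VFAT is masked
        }
        RegMap regs = correctLocal(ohN, vfatN);
        cfg.sotTapDelay[vfatN] = regs["SOT_TAP_DELAY_VFATY"];
        cfg.vfatTapDelay[vfatN] = regs["TAP_DELAY_VFATY_BITS"];
        cfg.tuInvert[vfatN] = regs["VFATY_TU_INVERT"];
    }
    return cfg;
}
```

Because all three maps are filled in the same iteration, they end up with identical keys, which is what allows a single loop when copying them into the RPC response.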
Outline of correctSBitMappingErrorsLocal(...)
The local function will then be where the actual "meat" of the algorithm is done. This function should look like:
This function at the end should always read the following registers:
SOT_TAP_DELAY_VFATY,
TAP_DELAY_VFATY_BITZ, and
VFATY_TU_INVERT
This could be done by having a dedicated RPC method in vfat3.h/vfat3.cc (for reading one VFAT) and optohybrid.h/optohybrid.cc (for reading all VFATs) and the one in vfat3.h should be called by correctSBitMappingErrorsLocal.
It should only try to correct the mapping if correctMapping is true.
First we should loop over those members of invertedSBITs and write the corresponding bits in VFATY_TU_INVERT. This should be done by:
24TU_TXD_P<N> and 24TU_TXD_N<N> pair the i^th element of invertedSBITs refers to using the convention @andrewpeck illustrates above.
VFATY_TU_INVERT that corresponds to this 24TU_TXD_P<N> and 24TU_TXD_N<N> pair,
invertedSBITs that will share this pair and all need to be flipped, so once you flip the bit the first time, any other elements of invertedSBITs that correspond to this pair should not cause the bit to be flipped again
invertedSBITs.
After this you should call:
https://github.com/cms-gem-daq-project/ctp7_modules/blob/e1d9d0c52a9bd5ffae96e99706ffd12b6ba2809b/src/calibration_routines.cpp#L946-L963
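Before that call, the polarity-flip step described above could be sketched as below. Two assumptions are made here and should be checked against the actual convention: that sbit s travels on pair s / 8 (consistent with sbits 0, 8, ... being the first bit on each pair), and that VFATY_TU_INVERT holds one bit per pair. The function name is hypothetical and no register access is shown.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Return the new VFATY_TU_INVERT value after flipping, exactly once, the bit
// of every differential pair referenced by invertedSBITs.
uint32_t applyInversions(uint32_t tuInvert,
                         const std::vector<uint32_t>& invertedSBITs) {
    std::set<uint32_t> flippedPairs;  // pairs already handled
    for (uint32_t sbit : invertedSBITs) {
        const uint32_t pair = sbit / 8;  // ASSUMPTION: 8 sbits per pair
        if (pair >= 8) {
            continue;  // outside the 8 pairs of one VFAT, ignore
        }
        if (!flippedPairs.insert(pair).second) {
            continue;  // this pair was already flipped by an earlier sbit
        }
        tuInvert ^= (1u << pair);
    }
    return tuInvert;
}
```

Tracking the flipped pairs in a set is what prevents several sbits on the same pair from toggling the bit back and forth.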
Care should be taken to construct the input arguments properly (see function documentation). Additionally you don't need a lot of events (nevts=10 is probably sufficient). Also using the calpulse in voltageStepPulse mode should be fine (e.g. useCurrentPulse = false). You then should analyze the outData container to see if any of the bits you flipped suffer from mis-mapping. To do this see this example:
For any new mismatches you find you should add these to mismappedSBits, e.g.:
Now here is where the hard part is. For each element of mismappedSBits the delays should be such that:
TU_SOT_P_24_TU_SOT_P_24 arrives first in the FPGA, followed by:
24TU_TXD_P<0>, followed by
24TU_TXD_P<1>, followed by
24TU_TXD_P<7>
Note I've suppressed the negative part of the pair. To ensure this you need to manipulate:
SOT_TAP_DELAY_VFATY, and
TAP_DELAY_VFATY_BITZ
To accomplish this for the element of mismappedSBits. However, other elements of mismappedSBits may share the same pair (e.g. 24TU_TXD_P<N> and 24TU_TXD_N<N> as the current element). So you should track which pairs you've already modified to prevent subsequent modification. Additionally, and more importantly, an element in mismappedSBits that is later on in the VFAT may be affected by your modification of an earlier bit. I would propose the following:
mismappedSBits comes from, this determines TAP_DELAY_VFATY_BITZ.
1 to this TAP_DELAY_VFATY_BITZ register, (not sure the size of this register, but you should stop at the max...),
TAP_DELAY_VFATY_BITZ registers where Z_prime > Z also add 1.
checkSbitMappingWithCalPulseLocal(...) with a low event count,
outData and remove any element from mismappedSBits which is now correctly mapped, add an sbit that is now incorrectly mapped, and
Some input here from @andrewpeck is needed to see if the above makes sense, particularly steps 2 & 3. For Step 5 I would suggest using the Erase-Remove Idiom; you can find examples on stackoverflow.
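For illustration, steps 2, 3 and 5 could be sketched as follows. This is not a tested algorithm: the register width is unknown here, so the maximum tap value is left as a caller-supplied parameter, and both function names are hypothetical. The second function is a plain instance of the Erase-Remove Idiom using std::remove_if.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Steps 2 and 3: add one tap to bit Z and to every later bit (Z_prime > Z),
// without exceeding maxTap. maxTap must come from the register definition.
void bumpTapDelays(std::array<uint32_t, 8>& tapDelays, unsigned z,
                   uint32_t maxTap) {
    for (unsigned i = z; i < tapDelays.size(); ++i) {
        if (tapDelays[i] < maxTap) {
            ++tapDelays[i];
        }
    }
}

// Step 5: drop every sbit for which isNowCorrect(sbit) returns true.
// std::remove_if moves the kept elements to the front and returns the new
// logical end; erase() then trims the leftover tail.
template <typename Pred>
void dropFixedSbits(std::vector<uint32_t>& mismappedSBits, Pred isNowCorrect) {
    mismappedSBits.erase(
        std::remove_if(mismappedSBits.begin(), mismappedSBits.end(),
                       isNowCorrect),
        mismappedSBits.end());
}
```

In a real routine the predicate would be built from the outData of the latest checkSbitMappingWithCalPulseLocal(...) call, and the loop would need a termination condition for bits that never align before reaching the maximum tap value.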
Then afterward this function should read the following registers:
SOT_TAP_DELAY_VFATY,
TAP_DELAY_VFATY_BITZ, and
VFATY_TU_INVERT
Store these in an std::map<std::string,std::vector<uint32_t> > and return it. The mapping should now be correct.
Current Behavior
You have to do the above by hand using gem_reg.py (bad).
Context (for feature requests)
The sbit mapping is wrong. We need a software solution to correct this for both GE1/1 and future upgrades (GE2/1 & ME0).