Feature: time history analyzer

lmoureaux commented 6 years ago

Description

Add a new macro, plotTimeSeries.py, as requested in #88.

The macro works in three steps:

Bad scan removal: Scans that failed to produce consistent results are removed.
Range detection: The time evolution of each channel is searched for successive scans with consistent behavior. A set of such scans is called a (time) range. What kind of behavior is searched for is user-defined.
Analysis: The properties of "ranges" are computed and printed.

Bad scan removal

Scans that pass any of the following cuts are removed:

The average noise over the entire detector is lower than 0.1 fC (or --minScanAvgNoise). This cuts scans in which no or very few channels respond (see the distribution of noise for all scans of GEMINIm30L2). 0.1 was chosen as the default instead of, say, 0.22 because other detectors may have lower noise, and we may yet manage to lower the noise. It's not a problem if some "bad" scans remain, as long as most of them are removed.
The fraction of masked channels is above 7% (or --maxScanMaskedFrac)
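
As a rough illustration of the two cuts above (not the actual code of plotTimeSeries.py), a scan could be tested as follows. The function name and the per-channel list inputs are assumptions made for this sketch; only the default thresholds come from the description.

```python
def is_bad_scan(noise, masked, min_avg_noise=0.1, max_masked_frac=0.07):
    """Return True if a scan should be removed.

    noise  -- per-channel noise in fC (hypothetical input layout)
    masked -- per-channel booleans, True if the channel is masked
    The defaults mirror --minScanAvgNoise and --maxScanMaskedFrac.
    """
    avg_noise = float(sum(noise)) / len(noise)
    masked_frac = float(sum(masked)) / len(masked)
    # A scan is removed as soon as it passes either cut
    return avg_noise < min_avg_noise or masked_frac > max_masked_frac
```

A scan is dropped as soon as one of the two conditions holds, which matches "pass any of the following cuts".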

Range detection

The time evolution of each channel is searched for successive scans with consistent behavior. A set of such "bad" scans for a given channel is called a (time) range; the definition of "bad" is user-defined (see below).

Range finding starts with a list of scans, where each scan is marked as "good" or "bad". The definition of "bad" depends on what's being searched for (and "good" is always defined as "not bad"). The start of a range is determined by:

The channel is "bad" in the current scan (see below)
The channel wasn't "bad" in the previous scan (i.e. it goes from good to bad)

The range then continues until 5 consecutive good scans appear (option: --numEndScans). To prevent the printing of spurious ranges due to transient effects, ranges with fewer than 4 "bad" scans in total are suppressed (option: --minBadScans). A "range" found by this algorithm can include some "good" scans.
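
The range-finding rules above can be sketched as follows. This is a minimal illustration, not the code in the macro: the function name, the boolean-list input, and the choice to report the last "bad" scan as the end of a range are assumptions. The defaults mirror --numEndScans and --minBadScans.

```python
def find_ranges(bad, num_end_scans=5, min_bad_scans=4):
    """Find ranges in a per-channel list of booleans (True = "bad" scan).

    Returns (start, end, closed) tuples, where start and end are the
    indices of the first and last "bad" scan of the range, and closed is
    False when the range was still open at the latest scan.
    """
    ranges = []
    start = None     # index of the first "bad" scan of the open range
    last_bad = None  # index of the most recent "bad" scan
    n_bad = 0        # number of "bad" scans in the open range
    for i, is_bad in enumerate(bad):
        if is_bad:
            if start is None:
                # good -> bad transition (or "bad" first scan): open a range
                start, n_bad = i, 0
            last_bad = i
            n_bad += 1
        elif start is not None and i - last_bad >= num_end_scans:
            # num_end_scans consecutive good scans: close the range,
            # keeping it only if it has enough "bad" scans
            if n_bad >= min_bad_scans:
                ranges.append((start, last_bad, True))
            start = None
    if start is not None and n_bad >= min_bad_scans:
        ranges.append((start, last_bad, False))
    return ranges
```

With the defaults, a channel that is "bad" in scans 1, 2, 4 and 5 and good in the five scans after that yields a single closed range spanning scans 1 to 5, including the good scan 3.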

As a side-effect, channels with sparse "bad" behavior, such as 88 in GEMINIm30L2, are also extracted. This can be controlled by tightening the cuts in the algorithm above.

Three definitions of "bad" are currently available:

mask: the channel under consideration is masked
maskReason: the channel under consideration has a non-zero maskReason
zeroInputCap: the channel under consideration has an S-curve width that is consistent with zero input capacitance (4.14E-02 < scurveWidth < 1.09E-01 fC). The precise values can be controlled using the --minNoise and --maxNoise options.
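
For illustration only, the three definitions could be written as one predicate. The function signature and argument names are hypothetical; the definition names and the default noise window are the ones listed above.

```python
MIN_NOISE = 4.14e-02  # fC, default lower bound (--minNoise)
MAX_NOISE = 1.09e-01  # fC, default upper bound (--maxNoise)


def is_bad(definition, mask, mask_reason, noise,
           min_noise=MIN_NOISE, max_noise=MAX_NOISE):
    """Decide whether one channel is "bad" in one scan.

    definition  -- "mask", "maskReason" or "zeroInputCap"
    mask        -- True if the channel is masked
    mask_reason -- integer maskReason, 0 meaning no reason set
    noise       -- S-curve width in fC
    """
    if definition == "mask":
        return bool(mask)
    if definition == "maskReason":
        return mask_reason != 0
    if definition == "zeroInputCap":
        return min_noise < noise < max_noise
    raise ValueError("unknown definition: %s" % definition)
```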

This is somewhat rough and not exactly what was asked for in #88, but it finds problematic ranges correctly, including some with sparse bad scans; see e.g. channels 88-89 below.

Analysis

For every "range" found in each of the VFATs, the following properties are computed and printed:

Column header	Meaning
`ROBstr` or `vfatCH`	Strip number and VFAT channel, respectively
Last known good	Date and time of the last good scan before the range ("never" if the range starts at the first scan)
Range begins	Start date and time
Range ends	End date and time ("never" if the range includes the latest scan)
#scans	Total number of scans (good and bad)
masked%	Percentage of `#scans` where the channel is "masked", not to be confused with "bad" (useful to investigate channels that behave badly once in a while)
Initial `maskReason`	`maskReason` for the first scan in the range
Other subsequent `maskReason`s	`maskReason` not present for the first scan but found in a later scan in the same range

A summary table of initial maskReason vs VFAT is also printed at the end.
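
A minimal sketch of how the per-range columns could be derived, assuming that maskReason is stored as an integer bit mask (suggested by the "non-zero maskReason" wording, but an assumption here). The function name and inputs are hypothetical.

```python
def summarize_range(masks, mask_reasons):
    """Compute the per-range columns for one channel.

    masks        -- one boolean per scan in the range, True if masked
    mask_reasons -- one integer bit mask per scan in the range
    Returns (#scans, masked%, initial maskReason, other subsequent maskReasons).
    """
    n_scans = len(masks)
    masked_pct = 100.0 * sum(masks) / n_scans
    initial = mask_reasons[0]
    later = 0
    for reason in mask_reasons[1:]:
        later |= reason
    # Keep only the reasons that were absent in the first scan
    other = later & ~initial
    return n_scans, masked_pct, initial, other
```

With illustrative bit values 1 and 2 for two reasons, `summarize_range([True, False, True, True], [1, 1, 3, 0])` gives 4 scans, 75% masked, initial reason 1 and subsequent reason 2.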

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)

Motivation and Context

See issue #88

How Has This Been Tested?

Initially tested on GEMINIm30L2, with all S-curves taken before 1st of June. Ranges found matched existing features in data:

When using the mask definition: they correspond to consecutive scans with the corresponding channel masked
When using the maskReason definition: they correspond to consecutive scans where the corresponding channel has non-zero maskReason
When using the noise definition: they correspond to consecutive scans with the corresponding channel blank in the plot (i.e. with noise below 0.05 fC)

Now testing on GEMINIm29L1 data provided by @bdorney and compared to slides 22-31 here. The following configuration was used:

maskReason ranges
--onlyCurrent to select ranges that span until the latest scan
Looking at entries with DeadChannel in either initial or later maskReasons

Full output. Conclusions:

The number of dead channels found by the new tool is close to what's reported in the slides, though not exactly the same.
The start dates often (not always) match what's reported in the slides.
Most dead channels start with DeadChannel straight away
Some ranges starting with HotChannel have DeadChannel in the additional maskReasons. This is also observed in GEMINIm30L2. Reporting because it's not observed for other initial maskReasons.

Excerpt from the output:

VFAT 23

`ROBstr`	Last known good	Range begins	Range ends	#scans	Masked%	Initial `maskReason`	Other subsequent `maskReason`s
18	2017.10.11.11.24	2017.10.13.12.53	never	127	100	HotChannel,FitFailed
19	2017.10.11.11.24	2017.10.13.12.53	never	127	0	DeadChannel
20	2017.10.11.11.24	2017.10.13.12.53	never	127	0	DeadChannel
31	2017.10.11.11.24	2017.10.13.12.53	never	127	0	DeadChannel
96	2017.06.15.15.10	2017.06.16.14.35	never	156	58	HotChannel,HighNoise	DeadChannel

Initial maskReason summary

The table below shows the distribution of the initial maskReason for ranges found in each VFAT. Note that a single range is counted as many times as it has maskReasons.

	HotChannel	FitFailed	DeadChannel	HighNoise	HighEffPed
0	0	0	2	0	0
1	0	0	1	0	0
2	3	3	0	0	0
3	2	1	0	1	0
4	0	0	0	0	0
5	3	5	13	0	0
6	2	34	0	3	0
7	25	4	26	14	1
8	1	1	9	0	0
9	2	2	1	0	0
10	5	5	2	0	0
11	4	4	0	0	0
12	0	0	0	0	0
13	2	2	0	0	0
14	0	0	0	0	0
15	9	4	15	0	0
16	2	3	2	0	0
17	4	2	0	0	0
18	2	2	0	0	0
19	6	4	2	0	0
20	0	0	0	0	0
21	3	3	0	0	0
22	6	3	0	0	0
23	2	1	3	1	0
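
The counting rule for this table (a range contributes once to each of its initial maskReasons) can be sketched as below. The input layout, a list of (VFAT, reason names) pairs, is an assumption for illustration and not the macro's internal representation.

```python
from collections import Counter


def summarize_initial_reasons(ranges):
    """Count initial maskReasons per VFAT.

    ranges -- iterable of (vfat, [reason names]) pairs, one per range
    A range with several initial maskReasons is counted once in each.
    """
    counts = {}
    for vfat, reasons in ranges:
        vfat_counts = counts.setdefault(vfat, Counter())
        for reason in reasons:
            vfat_counts[reason] += 1
    return counts


# The five VFAT 23 ranges from the excerpt above
excerpt = [
    (23, ["HotChannel", "FitFailed"]),
    (23, ["DeadChannel"]),
    (23, ["DeadChannel"]),
    (23, ["DeadChannel"]),
    (23, ["HotChannel", "HighNoise"]),
]
summary = summarize_initial_reasons(excerpt)
```

Applied to the excerpt, this gives 2 HotChannel, 1 FitFailed, 3 DeadChannel and 1 HighNoise for VFAT 23, which agrees with the last row of the table.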

Checklist:

[x] My code follows the code style of this project.
[x] My change requires a change to the documentation.
[ ] I have updated the documentation accordingly.
- Code is well documented
- Call interface and concepts are not yet
[ ] I have added tests to cover my changes.
[x] All new and existing tests passed.

lmoureaux commented 6 years ago

The code is now ready for initial review, but please check the TODO items in the description.

bdorney commented 6 years ago

Bad scan removal

Scans that pass any of the following cuts are removed:

No channel is masked

The fraction of masked channels is above 7% [TODO: add option]

Please add clarification here. It is not clear what you are removing or why. e.g. why would you remove a scan that has no masked channels? This would ideally represent a perfect detector. The fraction of masked channels here is supposed to serve as what? Why is it not being considered?

Detection of masked ranges

The code finds ranges of masked channels in time. A range meets the following criteria:

Starts with a "bad" scan

The channel wasn't "bad" in the previous scan

Has at least 4 "bad" scans [TODO: add option]

At most 5 not-"bad" scans between two "bad" scans [TODO: add option]

What is meant by a range?

Three definitions of "bad" are currently available:

mask: the channel under consideration is masked

maskReason: the channel under consideration has a non-zero maskReason

noise: the channel under consideration has a low noise (< 0.05 fC)

What is this noise criterion supposed to be identifying? Why introduce a new term that is not present in maskReason? Additionally it is partially in the charge range of a channel with zero input capacitance: [4.14E-02, 1.09E-01] fC which depending on how it is used could cause issues in identifying them. Please provide additional details above.

This is somewhat rough and not exactly what asked for in #88, but it finds problematic ranges correctly, including some with sparse bad scans, see eg channels 88-89 below.

If you are trying to address #88, it would be expected that what is asked for in #88 be addressed.

Tested on GEMINIm30L2, with all S-curves taken before 1st of June. Ranges found match existing features in data.

What do you mean Ranges found match existing features in data? What are ranges here? And what is being searched for?

Channel	Known good	Range begins	Range ends	#scans	Masked%	Initial `maskReason`	Other subsequent `maskReason`s
88	2017.04.07.15.46	2017.04.09.15.16	2017.05.31.15.09	26	57	HotChannel,HighNoise

The column headers here are not described, what are you trying to illustrate here?

Initial maskReason summary

The table below shows the distribution of the initial maskReason for ranges found in each VFAT. Note that a single range is counted as many times as it has maskReasons.

This table is what is being requested in #88, e.g. initial maskReason was the first instance that was assigned? You mention you have not done what was requested in #88; it is hard to follow because so little description is included with this new tool.

Additionally in the table for this spot can channels have multiple initial maskReasons? If not why not? For example I see VFAT3 has 7 channels marked as HotChannel, FitFailed, and DeadChannel. Are these the same 7 dead channels in each case?

Add useless 'pass' everywhere … They're useless but match the style of the existing codebase

Be professional in your commit messages; you may find them "useless" but they improve readability.

bdorney commented 6 years ago

There should also be a corresponding update to README.md that gives a description of input arguments, outputs, and usage case examples. And that seems to be what's missing here in this description. If an expert cannot understand this easily, a USER will not be able to.

lmoureaux commented 6 years ago

Quick answers to select initial comments on the PR description

What is meant by a range?

A set of successive scans for a given VFAT and channel. I'm not happy with the wording either; please provide a better, preferably short, name if you have one.

What is this noise criterion supposed to be identifying? Why introduce a new term that is not present in maskReason?

That's supposed to be what was requested in #88 -- or am I mistaken on the purpose of the noise cuts? If there's not enough detail for me to understand what's asked for, please update the request. It's in particular relevant because of this sentence in #88:

Caveats is that this needs to be modified to not be sensitive to transient effects

So the algo needs to be modified, which is what I'm trying to achieve here. But it's hard not to break it if I don't understand what it's meant to do.

This table is what is being requested in #88, e.g. initial maskReason was the first instance that was assigned?

What actually happens is more complicated than channels dying over time (even though the end result is the same). They can appear as dead for some time, be recovered, and then die for good. Such cases exist. Every detected "good->bad" (1) transition is recorded in the table.

(1) What exactly "good" and "bad" mean here depends on the value passed to the --ranges option.

Additionally in the table for this spot can channels have multiple initial maskReasons? If not why not?

When a transition happens because of several maskReasons, it gets an entry in each. It's not clear from the context info in #88 what information you want to extract from this table.

Channels with multiple maskReasons appear in multiple bins.

For example I see VFAT3 has 7 channels marked as HotChannel, FitFailed, and DeadChannel. Are these the same 7 dead channels in each case?

To answer this question one should look at the per-VFAT lists. In this case there are 7 DeadChannel and 7 HotChannel,FitFailed. For info, all of them were already dead at the time of the first scan and none was recovered afterwards.

you may find them "useless" but they improve readability.

It's a matter of taste. IMHO they add unnecessary visual clutter and in most cases they actually reduce readability (the only exception being deeply nested loops, which are to be avoided anyway).

lmoureaux commented 6 years ago

Expanded the documentation in the first post; it will eventually be included in the repo

lmoureaux commented 6 years ago

Link to the documentation

cms-gem-daq-project / gem-plotting-tools