[Feature request] Create a tool to produce RX GBT phase scan aggregation results

lpetre-ulb commented 5 years ago

Brief summary of issue

Aggregating GBT RX phase scan results is useful to deduce good default values from a set of phase scan results. Default values then feed a LUT used for the "all good phases" case.

This issue proposes adding such a tool to the analysis tools.

Types of issue

[ ] Bug report (report an issue with the code)
[x] Feature request (request for change which adds functionality)

Expected Behavior

The new tool is expected to take a list of GBT phase scans as input and to output a single plot on which the results from the listed phase scans are superimposed. For example, here is the kind of plot expected, produced with matplotlib for 20 short GEBs & 20 long GEBs:

[Figure: ShortAndLongGEB]

The tool can be provided as a new program/script or can be integrated as a sub-parser into the makePhaseScanPlots.py script.

Current Behavior

No such tool currently exists.

bdorney commented 5 years ago

The tool can be provided as a new program/script or can be integrated as a sub-parser into the makePhaseScanPlots.py script.

Probably a separate tool is needed, e.g. aggregatePhaseScanPlots.py or something similar. I was trying to think how it could be put in the current routine that makePhaseScanPlots.py is using but I think the use case is a bit different and it would not be very smooth if it was hacked together.

I think the best makePhaseScanPlots.py could do right now would be to make a final TH2I/TH2F object that would add all the histograms created together to make a "summary" histogram. But this only handles the case where you have phase scans from a different link number; of course you could concatenate phase scan input files together at the expense of losing the uniqueness of the link (e.g. now multiple detectors represent OH X in the case where you used OH X link multiple times) but this I think is probably not the way to go.
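For illustration only, the "summary" histogram idea amounts to a bin-by-bin sum. Below is a minimal sketch in plain Python, with nested lists standing in for TH2I/TH2F objects; the function name and the grid layout are made up for the example and are not taken from makePhaseScanPlots.py.

```python
def sum_histograms(histograms):
    """Add equally shaped 2D grids bin by bin (stand-in for summing TH2 objects)."""
    if not histograms:
        raise ValueError("need at least one histogram")
    n_rows = len(histograms[0])
    n_cols = len(histograms[0][0])
    # Start from an empty grid with the same binning as the inputs.
    summary = [[0] * n_cols for _ in range(n_rows)]
    for hist in histograms:
        if len(hist) != n_rows or any(len(row) != n_cols for row in hist):
            raise ValueError("all histograms must have the same binning")
        for i, row in enumerate(hist):
            for j, value in enumerate(row):
                summary[i][j] += value
    return summary
```

Note that the sum keeps no record of which link each entry came from, which is the loss of uniqueness mentioned above.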

I'm not sure; what do you think @lpetre-ulb ?

lpetre-ulb commented 5 years ago

The tool can be provided as a new program/script or can be integrated as a sub-parser into the makePhaseScanPlots.py script.

Probably a separate tool is needed, e.g. aggregatePhaseScanPlots.py or something similar. I was trying to think how it could be put in the current routine that makePhaseScanPlots.py is using but I think the use case is a bit different and it would not be very smooth if it was hacked together.

I think the best makePhaseScanPlots.py could do right now would be to make a final TH2I/TH2F object that would add all the histograms created together to make a "summary" histogram. But this only handles the case where you have phase scans from a different link number; of course you could concatenate phase scan input files together at the expense of losing the uniqueness of the link (e.g. now multiple detectors represent OH X in the case where you used OH X link multiple times) but this I think is probably not the way to go.

For the same reasons as you, I also think a separate tool would be better. The sub-parser was an attempt to avoid increasing the number of tools, but the two sub-commands would be completely separate (except maybe for the input file list).

bdorney commented 5 years ago

Separate tool then; I think we are in agreement.

So now the question is usage; I guess you have all this phase scan data in your local DB (perhaps now in the GEM DB?)?

We have all our phase scan data in $DATA_PATH/detectorName although I regret not having a subfolder $DATA_PATH/detectorName/gbtScans (now might be a time to create one...).

So what would usage be? Query from DB? Provide an input file with detectorName and scandate that can be parsed by parseListOfScandatesFile() and search $DATA_PATH? Take an input file that specifies one file per line? Take a file that is a concatenation of all phase scan results?

I'm not sure what the best approach would be. If we only considered data collected by QC7 then I would say parseListOfScandatesFile approach is the way to go and implement a gbtScans subfolder. But I don't know how you've done things at ULB where a huge amount of data was already collected.

lpetre-ulb commented 5 years ago

So now the question is usage; I guess you have all this phase scan data in your local DB (perhaps now in the GEM DB?)?

Indeed, we store the phase scan data in our local DB so that we can download all of them in a single ZIP file. I think these data will never be stored in the GEM DB. Do you have a strong reason to want that? Requesting new fields is not especially easy and I'm not convinced there is any advantage in storing these data in the GEM DB.

We have all our phase scan data in $DATA_PATH/detectorName although I regret not having a subfolder $DATA_PATH/detectorName/gbtScans (now might be a time to create one...).

If I'm not mistaken the phase scan data are stored in $DATA_PATH/detectorName only when doQC7.sh is used. And doQC7.sh is only used with one chamber. Right?

Since the xHAL function can write multiple links to the same file and since that functionality is used when calling testConnectivity.py for multi-link operation with the --writePhases2File option, how would the <detectorName> folder be used?

So what would usage be? Query from DB? Provide an input file with detectorName and scandate that can be parsed by parseListOfScandatesFile() and search $DATA_PATH? Take an input file that specifies one file per line? Take a file that is a concatenation of all phase scan results?

I'm not sure what you mean by query from DB since there is no structured DB involved (yet?) Regarding the other options I think it is possible to provide support for all of them in a transparent way for the user. One could allow as parameter a list of files which are easy to differentiate:

A phase scan result file, either with one scan or a concatenation of scans (5 columns)
And/or a file that specifies one file per line (1 column)
And/or a "scandate" file (2 columns)
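A rough sketch of how that differentiation could work, assuming the columns are whitespace-separated. The function name and the returned labels are hypothetical. The most common column count is used rather than the first line, so that a possible header line with a different number of fields does not mislead the guess.

```python
from collections import Counter

# Hypothetical labels for the three kinds of input listed above.
KIND_BY_COLUMNS = {5: "phase_scan", 2: "scandate_list", 1: "file_list"}


def classify_input_file(path):
    """Guess the kind of input file from its most common column count."""
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:  # ignore blank lines
                counts[len(fields)] += 1
    if not counts:
        raise ValueError("%s contains no data" % path)
    n_columns = counts.most_common(1)[0][0]
    if n_columns not in KIND_BY_COLUMNS:
        raise ValueError("%s: unexpected number of columns (%d)" % (path, n_columns))
    return KIND_BY_COLUMNS[n_columns]
```

With something like this, the user only passes file names and no extra command line option is needed to say what each file contains.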

I'm not sure what the best approach would be. If we only considered data collected by QC7 then I would say parseListOfScandatesFile approach is the way to go and implement a gbtScans subfolder. But I don't know how you've done things at ULB where a huge amount of data was already collected.

For ULB data, the best approach would be for the program to take a list of phase scan files. Since all the phase scan data from the ZIP file are extracted into a single folder, we can use bash expansion. Also, at QC7/QC8 and with the current directory structure, you can for example use:

aggregatePhaseScanPlots.py /data/bigdisk/GEM-Data-Taking/GE11_QC8/GE11-X-S-**/gbtPhaseScan_*_current.log

to aggregate the current phase scan for all the short GEBs.
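Since bash expands the pattern before the program starts, the tool itself only needs to accept one or more positional file names. A minimal sketch with argparse follows; the argument name inputFiles and the help texts are placeholders, not an existing interface.

```python
import argparse


def build_parser():
    """Build a parser that accepts one or more input files (sketch only)."""
    parser = argparse.ArgumentParser(
        description="Aggregate GBT RX phase scan results")
    # nargs="+" collects every path the shell produced from the glob pattern.
    parser.add_argument(
        "inputFiles", nargs="+",
        help="phase scan files, file lists or scandate files")
    return parser
```

Calling build_parser().parse_args() then gives the expanded paths as a plain list in args.inputFiles.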

bdorney commented 5 years ago

Do you have a strong reason to want that?

No I don't.

If I'm not mistaken the phase scan data are stored in $DATA_PATH/detectorName only when doQC7.sh is used. And doQC7.sh is only used with one chamber. Right?

That's correct.

Since the xHAL function can write multiple links to the same file and since that functionality is used when calling testConnectivity.py for multi-link operation with the --writePhases2File option, how would the <detectorName> folder be used?

Good point; I guess I was only thinking about the one link case which is not the right approach.

I'm not sure what you mean by query from DB since there is no structured DB involved (yet?)

I was musing about possible DB records but I think you cleared that up above.

Regarding the other options I think it is possible to provide support for all of them in a transparent way for the user. One could allow as parameter a list of files which are easy to differentiate:

Your proposal here sounds the best; I would vote for an algorithm that differentiates the file types in a manner transparent to the user, so that no additional command line arguments/options are needed to make it work. :+1:

cms-gem-daq-project / gem-plotting-tools