Closed lmoureaux closed 4 years ago
Also, in keeping with the way the v2 maps are done (and thinking towards the future), prefer not adding the text files
The v2 maps are generated from a functional description of their contents. We don't have this for v3. An alternative would be to include the input csv files, but they contain the same information (in a less verbose format).
thinking towards the future
What do you have in mind?
Rebased off develop
OK, admittedly I didn't have time to fully comment on the initial request for this task, but I will raise my primary objections here, as I am left to wonder why we go from one set of text-based maps to a python module that creates a map object by parsing this input file, and then dumps this map object to a text file that is then read by some other module.
Why not just skip the intermediate step, parse the input file, and produce the dict object and return it to the calling code? This will effectively be how a DB lookup will happen too; we won't be creating an intermediate text file from the DB call.
The interface could then be the same too: one treatment for an xls/xlsx file, one treatment for an xml file, one treatment for a sqlite file (or DB object).
The input text file is difficult to parse (I needed regex) and error-prone (since it's produced by humans). The tool parses it, cross-checks what can be, and dumps it in a format that's much easier to parse. This also foresees the requirement of C++ compatibility: since the .txt files have already been checked, the C++ code won't have to do it -- less duplication.
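To illustrate the "parse, cross-check, dump" idea, here is a minimal sketch. The line format and identifiers below (`vfatCh N -> strip M`, `parse_mapping`, `dump_mapping`) are hypothetical stand-ins, not the actual format of the EDMS-derived text file:

```python
import re

# Hypothetical input format: one "vfatCh N -> strip M" line per channel.
LINE_RE = re.compile(r"^\s*vfatCh\s+(\d+)\s*->\s*strip\s+(\d+)\s*$")

def parse_mapping(lines):
    """Parse the human-produced file, cross-checking as we go."""
    mapping = {}
    for lineno, line in enumerate(lines, start=1):
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError("line %d: cannot parse %r" % (lineno, line))
        channel, strip = int(m.group(1)), int(m.group(2))
        if not 0 <= channel < 128 or not 0 <= strip < 128:
            raise ValueError("line %d: value out of range" % lineno)
        if channel in mapping:
            raise ValueError("line %d: duplicate channel %d" % (lineno, channel))
        mapping[channel] = strip
    return mapping

def dump_mapping(mapping):
    # Dump in a trivial tab-separated format that C++ can parse
    # without re-doing any of the checks above.
    return "\n".join("%d\t%d" % (ch, st) for ch, st in sorted(mapping.items()))
```

The point is that all validation happens once, at import time; downstream readers only see the already-checked dump.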
DB lookup (or xml-based maps) will indeed directly result in a Python object; but the DB will also have to be filled with the relevant info. It would be natural to do that in importChannelMap.py. The database work flow could look like this:
1. .xlsx file
2. .csv file
3. importChannelMap.py
I think that having step 2 be done manually is good, because the format of the .xlsx could change. Also, reading spreadsheet files is always a difficult task, even though libraries exist: one has to deal with hidden sheets, merged cells and many other fancy features. And we really don't want to add them to git.
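With the manual export in place, step 3 reduces to reading a plain csv. A minimal sketch, assuming the exported file has `vfatCh` and `strip` columns (the real export from the EDMS document may use different headers):

```python
import csv
import io

def read_channel_map(csv_text):
    """Read a manually exported channel-map csv into a dict.

    Assumes (hypothetically) columns named "vfatCh" and "strip".
    """
    mapping = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        mapping[int(row["vfatCh"])] = int(row["strip"])
    return mapping
```

This is the kind of code that stays simple precisely because the spreadsheet quirks (hidden sheets, merged cells) were dealt with by a human in step 2.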
OK, I see... I was thinking that this tool was directly parsing the input xls/xlsx file...
So what I would see for the future of this tool, if it is to become the de facto mapping parser and DB loader, would be that it is the sole interface to the xls/xlsx files, and produces the file for DB upload, probably an xml file, templated as per the requirements of the DB group and interfacing with the DB (but if the DB group are happy with a csv file, then I don't have a strong objection). It can also return the dict object to any python code that calls it.
Then I would say that the c++ requirement should be simply that it can handle xml/sqlite files or a DB object only; the only time one needs these janky hacks is for a teststand that is on the bleeding edge, where python has a clear advantage.
So what I would see for the future of this tool, if it is to become the de facto mapping parser and DB loader, would be that it is the sole interface to the xls/xlsx
For this exact goal I think it would have been easier, and also more intuitive, to use a tool specifically made for parsing xls and xlsx files, as I mentioned in my email to the online software mailing list on this specific subject.
Is there a reason that we have not used xlrd to attempt to parse this excel file? Examples (maybe not the best ones) exist and were circulated, e.g.:
This might be more in line with what @jsturdy is looking for, a direct interface to the xls/xlsx file.
Architectural changes needed in importChannelMap.py
Quoting https://github.com/cms-gem-daq-project/gem-plotting-tools/pull/134#issuecomment-405578510
[...] I think that having step 2 be done manually is good, because the format of the .xlsx could change. Also, reading spreadsheet files is always a difficult task, even though libraries exist: one has to deal with hidden sheets, merged cells and many other fancy features. [...]
And the EDMS document has among others hidden sheets and merged cells.
setup post rpm install to call this script (might require a wget [...])
This will be difficult at P5 (very limited network access). Also, downloading and processing files at install time kind of defeats the purpose of packages. (Note that the files are currently not added to the package, because there's no code that could read them anyway.)
What's the problem with keeping the mappings in the repo? Since they describe the hardware, they shouldn't change too often. The v2b ones are already there, even if in a different form.
move importChannelMap.py to mapping subdirectory and module
If I understand @jsturdy correctly, this means being out of $PATH. How should the man page be called in this case? Where do I document the calling procedure? Having a documented tool in a "module" subdir is something new.
Please tell me if you think that my concerns are invalid, so that I know what to aim for.
Quoting #134 (comment)
My previous comments stand, please incorporate the xlsx package. There should not be a two-step process but a single step that parses the input spreadsheet and (in future) uploads to a db (now prints the text file).
What's the problem with keeping the mappings in the repo? Since they describe the hardware, they shouldn't change too often. The v2b ones are already there, even if in a different form.
Given the login requirement of EDMS we can, once generated, include the txt files into the repo. However, they should be specifically referenced as pertaining to hybrid designs VFAT3 HV3b V2 and VFAT3 HV3b V1, e.g.:
longChannelMap_VFAT3-HV3b-V1_VFAT3-HV3b-V2.txt
shortChannelMap_VFAT3-HV3b-V1_VFAT3-HV3b-V2.txt
Additionally, keep with previous convention and keep importChannelMap.py in the mapping subdir.
This will be difficult at P5 (very limited network access). Also, downloading and processing files at install time kind of defeats the purpose of packages. (Note that the files are currently not added to the package, because there's no code that could read them anyway.)
I'm not ever considering using this script as the source of the mappings at P5; by the time these mappings are used there, they should be correctly stored in the DB, and accessed using the reading functionality of some tool.
Given the login requirement of EDMS we can, once generated, include the txt files into the repo. However they should be specifically referenced as pertaining to hybrid designs VFAT3 HV3b V2 and VFAT3 HV3b V1, e.g.:
The extraction tool should also be able to select which mapping it wants to pull from the DB; is this being considered?
and produces the file for DB upload, probably an xml file, templated as per the requirements of the DB group and interfacing with the DB (but if the DB group are happy with a csv file, then I don't have a strong objection). It can also return the dict object to any python code that calls it.
Has any further followup been done with the DB group regarding the format they prefer for loading these maps into the DB? I'd strongly suggest holding this PR until that question is settled, then a template/sample of those maps generated from some fixed version of the mapping could be added to the repo.
Additionally to all that, any tool that would be using the maps would be able to make use of this interface, i.e., reading the DB-compatible text (xml) file (or sqlite dump), parsing it into a mapping object, and returning that object to the using code.
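As a sketch of that "read the DB-compatible file, return a mapping object" interface, assuming a hypothetical xml schema (the real element and attribute names would be set by the DB group):

```python
import xml.etree.ElementTree as ET

def load_mapping(xml_text):
    """Parse a DB-style xml channel map into a dict.

    The <channelMap>/<entry vfatCh strip> schema used here is an
    assumption for illustration only.
    """
    root = ET.fromstring(xml_text)
    return {int(e.get("vfatCh")): int(e.get("strip"))
            for e in root.iter("entry")}
```

Any calling code would then work with the returned dict, regardless of whether it originally came from a file, a sqlite dump, or the DB itself.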
At the end of the day, I strongly prefer a centrally maintained location for official read-only maps, to prevent the proliferation of mappings (a concern of @bdorney): in the future, the DB; until that time, a static location, generated by a specific person using this code, rather than in the repository itself.
The tool now parses the xlsx directly. The xlsx reader is fragile but, since the format wasn't designed to be used this way in the first place, any reader will be.

Let me stress that the way maps are used is out of the scope of this PR. importChannelMaps supports the current way; updating it later to output the data in any other format is easy.
I put this on hold until there's a clear agreement between @bdorney and @jsturdy on what should be done.
I'm still waiting for input from the maintainers.
For the time being I think the text files provided here are suitable for the short term but we should consider the long term DB interface/loader, and DB query to get the mapping. And that may not be appropriate for this repo. I’d like @jsturdy to set the direction here as it’s clear he has more experience on how to implement/use this in CMS than I do presently.
We’ve been using the text files you created from the EDMS document to try to investigate whether the mapping is correct with radioactive sources. So this has been a useful exercise. But before we rush into further development, we should ensure we are on the right path (@jsturdy).
For lack of additional feedback, and given the need for a "stop-gap" until the DB is ready, could you rebase this off of the HEAD of develop? Then we will merge it.
Then we will merge it.
I would disagree with this, as this package adds external dependencies that are only necessary for the map creation. If we're going to merge a "stop-gap" into a development tree, only the minimum necessary should be merged. In this case, that is just the mapping text files themselves, and any code changes that allow their use. This comes with the caveats that: 1) the text file usage code should not be mixed in with the text file creation code; 2) new files can still be created by a single entity when the text files need to change, and be merged into the tree as needed (using the unmerged function in this PR).
Let me stress that the way maps are used is out of the scope of this PR
leads me to believe that caveat 1 is already satisfied
@lmoureaux, is this PR still relevant for any legacy operation? It seems that the new tools in development are a better solution to this problem.
I can rebase this PR if it needs to be merged, but I have no idea whether that is the case.
I think this feature will never be used. If it is ever needed, please feel free to re-open the issue.
Description
This PR adds to the repo:
Types of changes
Motivation and Context
This feature was first requested in issue #90. Further discussion happened on the mailing list.
How Has This Been Tested?
The import tool includes the following cross-checks:
- Channel and strip numbers are within [0,128).
- Both mappings are permutations (even though the first is the identity mapping). That is, every channel is mapped to a different strip and every strip has a channel mapped to it.

Checklist:
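The permutation cross-check described above can be sketched as follows (the function name `check_permutation` is illustrative, not necessarily what the import tool calls it):

```python
def check_permutation(mapping, n_channels=128):
    """Verify that a channel->strip dict is a permutation of [0, n_channels).

    Equivalently: every channel is mapped to a different strip, and
    every strip has exactly one channel mapped to it.
    """
    if set(mapping.keys()) != set(range(n_channels)):
        raise ValueError("channel numbers are not exactly [0,%d)" % n_channels)
    if set(mapping.values()) != set(range(n_channels)):
        raise ValueError("strips are not a permutation of [0,%d)" % n_channels)
```

Note that the identity mapping trivially passes this check, which is why it is still worth running on the first (identity) map.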