Closed lmoureaux closed 4 years ago
Also, in keeping with the way the v2 maps are done (and thinking towards the future), prefer not adding the text files
The v2 maps are generated from a functional description of their contents. We don't have this for v3. An alternative would be to include the input csv files, but they contain the same information (in a less verbose format).
thinking towards the future
What do you have in mind?
Rebased off develop
OK, admittedly I didn't have time to fully comment on the initial request for this task, but I will raise my primary objections here, as I am left to wonder why we go from one set of text-based maps to a python module that creates a map object by parsing this input file, and then dumps this map object to a text file that is then read by some other module.
Why not just skip the intermediate step, parse the input file, and produce the dict object and return it to the calling code? This will effectively be how a DB lookup will happen too; we won't be creating an intermediate text file from the DB call.
The interface could then be the same too: one treatment for an xls/xlsx file, one treatment for an xml file, one treatment for a sqlite file (or DB object).
The input text file is difficult to parse (I needed regex) and error-prone (since it's produced by humans). The tool parses it, cross-checks what can be, and dumps it in a format that's much easier to parse. This also foresees the requirement of C++ compatibility: since the .txt files have already been checked, the C++ code won't have to do it -- less duplication.
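To illustrate the "parse, cross-check, dump" idea, here is a minimal sketch. The line format and identifiers below (`vfatCh N -> strip M`, `parse_mapping`, `dump_mapping`) are hypothetical stand-ins, not the actual format of the EDMS-derived text file:

```python
import re

# Hypothetical input format: one "vfatCh N -> strip M" line per channel.
LINE_RE = re.compile(r"^\s*vfatCh\s+(\d+)\s*->\s*strip\s+(\d+)\s*$")

def parse_mapping(lines):
    """Parse the human-produced file, cross-checking as we go."""
    mapping = {}
    for lineno, line in enumerate(lines, start=1):
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError("line %d: cannot parse %r" % (lineno, line))
        channel, strip = int(m.group(1)), int(m.group(2))
        if not 0 <= channel < 128 or not 0 <= strip < 128:
            raise ValueError("line %d: value out of range" % lineno)
        if channel in mapping:
            raise ValueError("line %d: duplicate channel %d" % (lineno, channel))
        mapping[channel] = strip
    return mapping

def dump_mapping(mapping):
    # Dump in a trivial tab-separated format that C++ can parse
    # without re-doing any of the checks above.
    return "\n".join("%d\t%d" % (ch, st) for ch, st in sorted(mapping.items()))
```

The point is that all validation happens once, at import time; downstream readers only see the already-checked dump.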
DB lookup (or xml-based maps) will indeed directly result in a Python object; but the DB will also have to be filled with the relevant info. It would be natural to do that in importChannelMap.py. The database work flow could look like this:
1. .xlsx file
2. .csv file
3. importChannelMap.py
I think that having step 2 be done manually is good, because the format of the .xlsx could change. Also, reading spreadsheet files is always a difficult task, even though libraries exist: one has to deal with hidden sheets, merged cells and many other fancy features. And we really don't want to add them to git.
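With the manual export in place, step 3 reduces to reading a plain csv. A minimal sketch, assuming the exported file has `vfatCh` and `strip` columns (the real export from the EDMS document may use different headers):

```python
import csv
import io

def read_channel_map(csv_text):
    """Read a manually exported channel-map csv into a dict.

    Assumes (hypothetically) columns named "vfatCh" and "strip".
    """
    mapping = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        mapping[int(row["vfatCh"])] = int(row["strip"])
    return mapping
```

This is the kind of code that stays simple precisely because the spreadsheet quirks (hidden sheets, merged cells) were dealt with by a human in step 2.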
OK, I see... I was thinking that this tool was directly parsing the input xls/xlsx file...
So what I would see for the future of this tool, if it is to become the de facto mapping parser and DB loader, would be that it is the sole interface to the xls/xlsx files, and produces the file for DB upload, probably an xml file, templated as per the requirements of the DB group and interfacing with the DB (but if the DB group are happy with a csv file, then I don't have a strong objection). It can also return the dict object to any python code that calls it.
Then I would say that the c++ requirement should be simply that it can handle xml/sqlite files or a DB object only; the only time one needs these janky hacks is for a teststand that is on the bleeding edge, where python has a clear advantage.
So what I would see for the future of this tool, if it is to become the de facto mapping parser and DB loader, would be that it is the sole interface to the xls/xlsx
For this exact goal I think it would have been easier, and also more intuitive, to use a tool specifically made for parsing xls and xlsx files, as I mentioned in my email to the online software mailing list on this specific subject.
Is there a reason that we have not used xlrd to attempt to parse this excel file? Examples (maybe not the best ones) exist and were circulated, e.g.:
This might be more in line with what @jsturdy is looking for, a direct interface to the xls/xlsx file.
Architectural changes needed in importChannelMap.py
Quoting https://github.com/cms-gem-daq-project/gem-plotting-tools/pull/134#issuecomment-405578510
[...] I think that having step 2 be done manually is good, because the format of the .xlsx could change. Also, reading spreadsheet files is always a difficult task, even though libraries exist: one has to deal with hidden sheets, merged cells and many other fancy features. [...]
And the EDMS document has among others hidden sheets and merged cells.
setup post rpm install to call this script (might require a wget [...])
This will be difficult at P5 (very limited network access). Also, downloading and processing files at install time kind of defeats the purpose of packages. (Note that the files are currently not added to the package, because there's no code that could read them anyway.)
What's the problem with keeping the mappings in the repo? Since they describe the hardware, they shouldn't change too often. The v2b ones are already there, even if in a different form.
move importChannelMap.py to mapping subdirectory and module
If I understand @jsturdy correctly, this means being out of $PATH. How should the man page be called in this case? Where do I document the calling procedure? Having a documented tool in a "module" subdir is something new.
Please tell me if you think that my concerns are invalid, so that I know what to aim for.
Quoting #134 (comment)
My previous comments stand, please incorporate the xlsx package. There should not be a two-step process but a single step that parses the input spreadsheet and (in future) uploads to a db (now prints the text file).
What's the problem with keeping the mappings in the repo? Since they describe the hardware, they shouldn't change too often. The v2b ones are already there, even if in a different form.
Given the login requirement of EDMS we can, once generated, include the txt files into the repo. However, they should be specifically referenced as pertaining to hybrid designs VFAT3 HV3b V2 and VFAT3 HV3b V1, e.g.:
longChannelMap_VFAT3-HV3b-V1_VFAT3-HV3b-V2.txt
shortChannelMap_VFAT3-HV3b-V1_VFAT3-HV3b-V2.txt
Additionally, keep with previous convention and keep importChannelMap.py in the mapping subdir.
This will be difficult at P5 (very limited network access). Also, downloading and processing files at install time kind of defeats the purpose of packages. (Note that the files are currently not added to the package, because there's no code that could read them anyway.)
I'm not ever considering using this script as the source of the mappings at P5; by the time these mappings are used there, they should be correctly stored in the DB, and accessed using the reading functionality of some tool.
Given the login requirement of EDMS we can, once generated, include the txt files into the repo. However they should be specifically referenced as pertaining to hybrid designs VFAT3 HV3b V2 and VFAT3 HV3b V1, e.g.:
The extraction tool should also be able to select which mapping it wants to pull from the DB; is this being considered?
and produces the file for DB upload, probably an xml file, templated as per the requirements of the DB group and interfacing with the DB (but if the DB group are happy with a csv file, then I don't have a strong objection). It can also return the dict object to any python code that calls it.
Has any further followup been done with the DB group regarding the format they prefer for loading these maps into the DB? I'd strongly suggest holding this PR until that question is settled, then a template/sample of those maps generated from some fixed version of the mapping could be added to the repo.
Additionally to all that, any tool that would be using the maps would be able to make use of this interface, i.e., reading the DB-compatible text (xml) file (or sqlite dump), parsing it into a mapping object, and returning that object to the using code.
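As a sketch of that "read the DB-compatible file, return a mapping object" interface, assuming a hypothetical xml schema (the real element and attribute names would be set by the DB group):

```python
import xml.etree.ElementTree as ET

def load_mapping(xml_text):
    """Parse a DB-style xml channel map into a dict.

    The <channelMap>/<entry vfatCh strip> schema used here is an
    assumption for illustration only.
    """
    root = ET.fromstring(xml_text)
    return {int(e.get("vfatCh")): int(e.get("strip"))
            for e in root.iter("entry")}
```

Any calling code would then work with the returned dict, regardless of whether it originally came from a file, a sqlite dump, or the DB itself.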
At the end of the day, I strongly prefer a centrally maintained location for official read-only maps, to prevent the proliferation of mappings (a concern of @bdorney): in the future, the DB; until that time, a static location, generated by a specific person using this code, rather than in the repository itself.
The tool now parses the xlsx directly. The xlsx reader is fragile but, since the format wasn't designed to be used this way in the first place, any reader will be.

Let me stress that the way maps are used is out of the scope of this PR. importChannelMaps supports the current way; updating it later to output the data in any other format is easy.
I put this on hold until there's a clear agreement between @bdorney and @jsturdy on what should be done.
I'm still waiting for input from the maintainers.
For the time being I think the text files provided here are suitable for the short term but we should consider the long term DB interface/loader, and DB query to get the mapping. And that may not be appropriate for this repo. I’d like @jsturdy to set the direction here as it’s clear he has more experience on how to implement/use this in CMS than I do presently.
We’ve been using the text files you created from the EDMS document to try to investigate whether the mapping is correct with radioactive sources. So this has been a useful exercise. But before we rush into further development, we should ensure we are on the right path (@jsturdy).
For lack of additional feedback, and given the need for a "stop-gap" until the DB is ready, could you rebase this off of the HEAD of develop? Then we will merge it.
Then we will merge it.
I would disagree with this, as this package adds external dependencies that are only necessary for the map creation. If we're going to merge a "stop-gap" into a development tree, only the minimum necessary should be merged. In this case, that is just the mapping text files themselves, and any code changes that allow their use. This comes with the caveats that: 1) the text file usage code should not be mixed in with the text file creation code; 2) new files can still be created by a single entity when the text files need to change, and be merged into the tree as needed (using the unmerged function in this PR).
Let me stress that the way maps are used is out of the scope of this PR
leads me to believe that caveat 1 is already satisfied
@lmoureaux, is this PR still relevant for any legacy operation? It seems that the new tools in development are a better solution to this problem.
I can rebase this PR if it needs to be merged, but I have no idea whether that is the case.
I think this feature will never be used. If it is ever needed, please feel free to re-open the issue.
Description
This PR adds to the repo:
Types of changes
Motivation and Context
This feature was first requested in issue #90. Further discussion happened on the mailing list.
How Has This Been Tested?
The import tool includes the following cross-checks:
- Channel and strip numbers are within [0,128).
- Both mappings are permutations (even though the first is the identity mapping). That is, every channel is mapped to a different strip and every strip has a channel mapped to it.

Checklist:
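The permutation cross-check described above can be sketched as follows (the function name `check_permutation` is illustrative, not necessarily what the import tool calls it):

```python
def check_permutation(mapping, n_channels=128):
    """Verify that a channel->strip dict is a permutation of [0, n_channels).

    Equivalently: every channel is mapped to a different strip, and
    every strip has exactly one channel mapped to it.
    """
    if set(mapping.keys()) != set(range(n_channels)):
        raise ValueError("channel numbers are not exactly [0,%d)" % n_channels)
    if set(mapping.values()) != set(range(n_channels)):
        raise ValueError("strips are not a permutation of [0,%d)" % n_channels)
```

Note that the identity mapping trivially passes this check, which is why it is still worth running on the first (identity) map.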