Opened by sroychow 3 years ago.
A new Issue was created by @sroychow Suvankar Roy Chowdhury.
@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
assign dqm, heterogeneous
New categories assigned: heterogeneous,dqm
@jfernan2,@ahmad3213,@rvenditti,@fwyzard,@emanueleusai,@makortel,@pbo0,@pmandrik you have been requested to review this Pull request/Issue and eventually sign? Thanks
@mmusich currently I don't think that is possible.

The reason is that the `SwitchProducer` does not run anything by itself: it simply "aliases" one of its branches to its name; the "alias" is then run when its products are requested by some other module. Since an `EDAnalyzer` does not produce anything, there would be nothing triggering its execution.

You should be able to achieve the same effect simply with

```python
monitorpixelTrackSoA = DQMEDAnalyzer('SiPixelPhase1MonitorTrackSoA',
    pixelTrackSrc = cms.InputTag("pixelTracksSoA"),
    TopFolderName = cms.string("SiPixelHeterogeneous/PixelTrackSoA"),
)
```

The `pixelTracksSoA` `SwitchProducer` will pick `pixelTracksSoA@cpu` or `pixelTracksSoA@cuda` automatically.
However, with this approach how do you disentangle what has been running on the CPU and on the GPU?

@fwyzard aren't all DQM modules `EDProducer`s: https://github.com/cms-sw/cmssw/blob/master/DQMServices/Core/README.md ?

Ah, good point, I'm stuck in the pre-transition approach based on `EDAnalyzer`s.

> However, with this approach how do you disentangle what has been running on the CPU and on the GPU?

It shouldn't, but perhaps we still haven't got the gist of what is requested. I thought we would be submitting relvals with and without the `gpu` modifier and comparing the products, to validate that the CPU SoA-based reco gives the same results as the GPU one.
Then this

```python
monitorpixelTrackSoA = SwitchProducerCUDA(
    cpu = DQMEDAnalyzer('SiPixelPhase1MonitorTrackSoA',
        pixelTrackSrc = cms.InputTag("pixelTracksSoA@cpu"),
        TopFolderName = cms.string("SiPixelHeterogeneous/PixelTrackSoA")
    ),
    cuda = DQMEDAnalyzer('SiPixelPhase1MonitorTrackSoA',
        pixelTrackSrc = cms.InputTag("pixelTracksSoA@cuda"),
        TopFolderName = cms.string("SiPixelHeterogeneous/PixelTrackSoA")
    )
)
```

should work in principle.
Still, why would you prefer this to just reading the `pixelTracksSoA` collection?
Sorry, you wrote

> Given that we want to run this at HLT

so I assumed you meant inside the HLT running online.

> Sorry, you wrote

First, I didn't write it, I am not the author of the issue :)

> so I assumed you meant inside the HLT running online.

No, we're trying to address requests from PPD/TSG about CPU/GPU validation.

> First, I didn't write it, I am not the author of the issue :)

Whoops, sorry...

> No, we're trying to address requests from PPD/TSG about CPU/GPU validation.

Ah, OK. I'm afraid I don't know what the request is, I'll have to understand that first.
I think too that the details of what exactly is being requested would be crucial to figure out the best course of action. To leading order,

> why would you prefer this to just reading the `pixelTracksSoA` collection?

just reading the `pixelTracksSoA` should be the way to go. That is an EDAlias to either `pixelTracksSoA@cpu` or `pixelTracksSoA@cuda`, depending on which one the `SwitchProducerCUDA` triggered to run.

`SiPixelPhase1MonitorTrackSoA` will anyhow run on the host, so it should just consume `pixelTracksSoA` (or `hltPixelTracksSoA`) and will get whatever was produced for that event. (And given that this is done in the same process, and the decision to run on GPU or CPU is at the moment taken at the process level, it should not be difficult to flag it as cpu or gpu.)
I suspect that this should work

```python
monitorpixelTrackSoA = SwitchProducerCUDA(
    cpu = DQMEDAnalyzer('SiPixelPhase1MonitorTrackSoA',
        pixelTrackSrc = cms.InputTag("pixelTracksSoA"),
        TopFolderName = cms.string("SiPixelHeterogeneousOnCPU/PixelTrackSoA")
    ),
    cuda = DQMEDAnalyzer('SiPixelPhase1MonitorTrackSoA',
        pixelTrackSrc = cms.InputTag("pixelTracksSoA"),
        TopFolderName = cms.string("SiPixelHeterogeneousOnGPU/PixelTrackSoA")
    )
)
```

`cuda` does not mean "run on cuda".
In ECAL we have been looking at similar things recently and tried the `SwitchProducerCUDA` as well to change DQM configurations. What we had issues with was when a downstream config file wanted to change a parameter regardless of whether the `cpu` or `cuda` branch is used, for example if some config file modified `pixelTrackSrc` to a new input tag. In that case `monitorpixelTrackSoA.pixelTrackSrc = cms.InputTag("otherPixelTracksSoA")` does not work, since `monitorpixelTrackSoA` is a `SwitchProducerCUDA` and not the `DQMEDAnalyzer('SiPixelPhase1MonitorTrackSoA')` that the downstream configuration might expect it to be. Is there a way to achieve this without explicitly pointing to the `cpu` or `cuda` cases?
@thomreis I am not sure I understand your issue very well. Commenting from my experience: for modules aimed at monitoring collections, you should use the same tag for a GPU and a CPU workflow (e.g. `pixelTracksSoA`). Please have a look at the comment above from Matti.

Coming to the modules which will do comparisons between collections produced on the GPU or CPU, you should ask for two collections (e.g. `pixelTracksSoA@cpu` and `pixelTracksSoA@cuda`). The workflow should be set up in such a way that both collections are available.
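For concreteness, the two patterns could be sketched like this; the comparison module's plugin name, label, and parameter names below are made up for illustration (the thread only establishes `SiPixelPhase1MonitorTrackSoA` and the `pixelTracksSoA@cpu`/`@cuda` branches):

```python
import FWCore.ParameterSet.Config as cms
from DQMServices.Core.DQMEDAnalyzer import DQMEDAnalyzer

# Monitoring module: consumes the SwitchProducer alias, which resolves to
# pixelTracksSoA@cpu or pixelTracksSoA@cuda depending on the workflow.
monitorpixelTrackSoA = DQMEDAnalyzer('SiPixelPhase1MonitorTrackSoA',
    pixelTrackSrc = cms.InputTag("pixelTracksSoA"),
    TopFolderName = cms.string("SiPixelHeterogeneous/PixelTrackSoA"),
)

# Comparison module (hypothetical plugin and parameter names): consumes both
# branches explicitly; the workflow must make both collections available.
comparePixelTrackSoA = DQMEDAnalyzer('SiPixelCompareTrackSoA',
    pixelTrackSrcCPU = cms.InputTag("pixelTracksSoA@cpu"),
    pixelTrackSrcGPU = cms.InputTag("pixelTracksSoA@cuda"),
    TopFolderName = cms.string("SiPixelHeterogeneous/PixelTrackCompareGPUvsCPU"),
)
```

Only the comparison module requires a workflow in which both reconstruction branches actually run.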
Hi @sroychow, that was in fact what I meant when I said I want to change a parameter regardless of whether the CPU or GPU case of the SwitchProducer is used. In my example the SwitchProducer would be defined in some cff file, with the InputTag for both the `cpu` and `cuda` cases being `pixelTracksSoA`. That cff is then loaded in a cfg file, where the InputTag for `monitorpixelTrackSoA` is changed to a different one (e.g. `otherPixelTracksSoA`). The change in the cfg file should happen for whichever of the two cases in the SwitchProducer is used, without explicitly naming one or both.
> Is there a way to achieve this without explicitly pointing to the `cpu` or `cuda` cases?

Any later customizations of `SwitchProducer`s must use the cases explicitly. But you can make a loop along

```python
s = SwitchProducerCUDA(cpu=..., cuda=...)
for case in s.parameterNames_():
    getattr(s, case).parameter = "new value"
```
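To illustrate why that loop updates every branch, here is a plain-Python sketch with minimal stand-in classes (these are mock-ups, not the real FWCore types; the real `SwitchProducerCUDA` likewise exposes its cases through `parameterNames_()`):

```python
# Minimal mock-ups of the config objects; NOT the real FWCore classes.
class MockCase:
    """Stand-in for a DQMEDAnalyzer configuration with parameters."""
    def __init__(self, **params):
        self.__dict__.update(params)

class MockSwitchProducer:
    """Stand-in exposing its cases via parameterNames_(), like SwitchProducerCUDA."""
    def __init__(self, **cases):
        self._cases = cases

    def parameterNames_(self):
        return list(self._cases)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so it resolves cases.
        try:
            return self._cases[name]
        except KeyError:
            raise AttributeError(name)

s = MockSwitchProducer(
    cpu=MockCase(pixelTrackSrc="pixelTracksSoA"),
    cuda=MockCase(pixelTrackSrc="pixelTracksSoA"),
)

# Customize the parameter in every case, as in the loop above.
for case in s.parameterNames_():
    getattr(s, case).pixelTrackSrc = "otherPixelTracksSoA"

print(s.cpu.pixelTrackSrc, s.cuda.pixelTrackSrc)
# prints: otherPixelTracksSoA otherPixelTracksSoA
```

The point of the pattern is that the downstream cfg does not need to know which cases exist; it just iterates over whatever `parameterNames_()` reports.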
I'd actually like to understand better the use cases for different code/configuration for CPU and CUDA in DQM. The SwitchProducer-based infrastructure assumes that the downstream consumers do not care where exactly their input data products were produced. For example, if the producer and consumer were in different jobs, this SwitchProducer-based approach would not work in general.
Could you tell more about what exactly you intend to be different for the CPU- vs GPU-produced input data? (e.g. https://github.com/cms-sw/cmssw/issues/35879#issuecomment-954847378 suggests different folders for histograms)
@makortel if my understanding is correct, the different folders are for the output histograms in the DQM root file coming from the DQM modules, in order to distinguish the monitoring of the CPU vs GPU collections. Nothing different is expected from the input collections.
For ECAL we want to make event-by-event CPU vs. GPU comparison plots. That requires both input collections, but that part of the DQM module should only run on GPU machines (and only on a subset of events, obviously, because otherwise there would be no point in reconstructing on GPUs in the first place).
So the current idea is that on a CPU-only machine the default ECAL DQM would run and on a GPU machine the default ECAL DQM plus the CPU vs. GPU comparison task.
The ECAL DQM uses workers to do the different tasks, and for the GPU comparison an additional worker would be added to the list of workers to be run (https://github.com/cms-sw/cmssw/pull/35946/files#diff-a56670e09d76281c92bd7bd09a0316c5db75f7e23ea20c7a67b1ddabb2bd4dd8R33). So in the end, on a GPU machine we would need to modify the configuration of the `ecalMonitorTask` module.
One thing we tried was a cff file with the following, but with that we had the issue I described earlier:

```python
import FWCore.ParameterSet.Config as cms

from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA
from DQM.EcalMonitorTasks.EcalMonitorTask_cfi import ecalMonitorTask as _ecalMonitorTask

ecalMonitorTask = SwitchProducerCUDA(
    cpu = _ecalMonitorTask.clone()
)

# Customization to run the CPU vs GPU comparison task if the job runs on a GPU enabled machine
from Configuration.ProcessModifiers.gpu_cff import gpu
from DQM.EcalMonitorTasks.GpuTask_cfi import ecalGpuTask

gpu.toModify(ecalMonitorTask,
    cuda = _ecalMonitorTask.clone(workerParameters = dict(GpuTask = ecalGpuTask)))
gpu.toModify(ecalMonitorTask.cuda.workers, func = lambda workers: workers.append("GpuTask"))
```
I would also like to point out this nice talk by A. Bocci about GPU in DQM listing all the possibilities: https://indico.cern.ch/event/975162/contributions/4106441/
disclaimer n. 1: I've typed all this directly on GitHub without testing any of it - hopefully I didn't make many mistakes, but don't expect this to be 100% error-free
disclaimer n. 2: names are not my forte; if you find better names for what I suggest, please, go ahead with them !
I think that the complexity here comes from the fact that we want to have a single workflow configuration that does different things (two different sets of validation plots) depending on whether a GPU is available or not.

IMHO this is not something that should be handled "automatically" by the presence or absence of a GPU, but at the level of the definition of the workflow. So, we should have two workflows.

Then we could (try to) run each workflow on a different machine:

Then

(*) depending on what we think should be the behaviour of the workflow

The bottom line is, I would not try to find a technical solution for this problem, because it should have a different definition altogether. So I would suggest disentangling the two things: running on GPU, and doing the GPU-vs-CPU validation.
The current behaviour in the `cmsDriver` workflows is:

- if the `gpu` modifier is not given, nothing runs on a GPU, even if one is available;
- if the `gpu` modifier is given, make use of a GPU if one is available, and fall back to the CPU otherwise.

So, running without the `gpu` modifier and running with the `gpu` modifier on a machine without GPUs should run the exact same modules and configuration. I would suggest keeping things like that.
Then we can add a second modifier (e.g. `gpu_validation`) to ask the DQM modules to read both CPU-built and GPU-built collections explicitly, and make any relevant comparisons.
Let me try to give some made-up examples...

Let's say the original configuration was

```python
monitorStuff = DQMEDAnalyzer('MonitorStuff',
    src = cms.InputTag('someStuff'),
    folder = cms.string('DQM/Folder/Stuff')
)

stuffValidationTask = cms.Task(monitorStuff)
```
Once GPUs are involved, we have three options:

1. use the same folder for CPU and GPU quantities;
2. use different folders for CPU and GPU quantities, and fill only one of them in a job;
3. use different folders for CPU and GPU quantities, and fill both in a job.
Assuming that `someStuff` is the result of a `SwitchProducer`, 1. is easy: we don't have to do anything, just use the `monitorStuff` module as is.
To achieve 2. we have two options.

If `someStuff` is the result of a `SwitchProducer`, this should already do the right thing:

```python
monitorStuff = SwitchProducerCUDA(
    cpu = DQMEDAnalyzer('MonitorStuff',
        src = cms.InputTag('someStuff'),
        folder = cms.string('DQM/Folder/Stuff')
    ),
    cuda = DQMEDAnalyzer('MonitorStuff',
        src = cms.InputTag('someStuff'),
        folder = cms.string('DQM/Folder/StuffOnGPU')
    )
)
```
It would be great if somebody could actually test it and let us know if it works :-)
If the collections being monitored are not from a `SwitchProducer`, or if we just want to make things more explicit, different InputTags can be used:

```python
monitorStuff = SwitchProducerCUDA(
    cpu = DQMEDAnalyzer('MonitorStuff',
        src = cms.InputTag('someStuffOnCPU'),  # or 'someStuff@cpu'
        folder = cms.string('DQM/Folder/Stuff')
    ),
    cuda = DQMEDAnalyzer('MonitorStuff',
        src = cms.InputTag('someStuffOnGPU'),  # or 'someStuff@cuda'
        folder = cms.string('DQM/Folder/StuffOnGPU')
    )
)
```
Finally, 3. is just a variation of the last option:

```python
monitorStuff = DQMEDAnalyzer('MonitorStuff',
    src = cms.InputTag('someStuff@cpu'),
    folder = cms.string('DQM/Folder/Stuff')
)

monitorStuffOnGPU = DQMEDAnalyzer('MonitorStuff',
    src = cms.InputTag('someStuff@cuda'),
    folder = cms.string('DQM/Folder/StuffOnGPU')
)
```
The configuration for 2. (either option, though the first one is simpler) or 3. can be generated from the configuration of 1. with an appropriate modifier.

For 2. (first option):

```python
_monitorStuff = monitorStuff.clone()
monitorStuff = SwitchProducerCUDA(
    cpu = _monitorStuff.clone()
)
gpu_validation.toModify(monitorStuff,
    cuda = _monitorStuff.clone(
        src = 'someStuff',
        folder = 'DQM/Folder/StuffOnGPU'
    )
)
```
While for 3. a new module needs to be added to a Task or Sequence:

```python
gpu_validation.toModify(monitorStuff,
    src = 'someStuff@cpu'
)

monitorStuffOnGPU = monitorStuff.clone(
    src = 'someStuff@cuda',
    folder = 'DQM/Folder/StuffOnGPU'
)

_stuffValidationTask_gpu = stuffValidationTask.copy()
_stuffValidationTask_gpu.add(monitorStuffOnGPU)
gpu_validation.toReplaceWith(stuffValidationTask, _stuffValidationTask_gpu)
```
If we have a single DQM module that can do both the traditional validation and the GPU-vs-CPU comparison, we have a few options.

The configuration for performing only the traditional validation could be:

```python
monitorAndCompareStuff = DQMEDAnalyzer("MonitorAndCompareStuff",
    reference = cms.InputTag('someStuff'),
    target = cms.InputTag('')  # leave empty not to do any comparison
)
```
As in the previous example, if `someStuff` is the result of a `SwitchProducer`, this will validate the CPU or the GPU version of the reconstruction and put the results in a single folder. To use different folders we can adapt the previous solutions.
The configuration for performing the traditional validation and the GPU-vs-CPU comparison could be

```python
monitorAndCompareStuff = DQMEDAnalyzer("MonitorAndCompareStuff",
    reference = cms.InputTag('someStuff@cpu'),
    target = cms.InputTag('someStuff@cuda')
)
```
Whether the `target` collection is used only for the comparison, or also for (a subset of) the traditional validation, is up to the DQM module itself. The same applies to the folder being used for the plots; for example, additional folders could be configured via python, or they could be hardcoded in C++, etc.
Also in this case, the second configuration could be generated starting from the first by an appropriate modifier:

```python
gpu_validation.toModify(monitorAndCompareStuff,
    reference = 'someStuff@cpu',
    target = 'someStuff@cuda'
)
```
I have tried to implement something along Andrea's comments (https://github.com/cms-sw/cmssw/compare/master...thomreis:ecal-dqm-addGpuTask?expand=1), based on GPU vs. CPU comparison code from @alejands in PR #35946, but the matrix tests mostly fail with an exception (running on lxplus without a GPU):

```
----- Begin Fatal Exception 25-Nov-2021 20:39:43 CET-----------------------
An exception of category 'UnimplementedFeature' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing module: class=EcalDQMonitorTask label='ecalMonitorTask@cpu'
Exception Message:
SwitchProducer does not support non-event branches. Got Run for SwitchProducer with label ecalMonitorTask whose chosen case is ecalMonitorTask@cpu.
----- End Fatal Exception -------------------------------------------------
```

I am not quite sure what that means, and if there would be a way to change EcalDQMonitorTask to be compliant (if it is actually EcalDQMonitorTask that causes this).
Note that if the `gpu_validation` modifier will be given only for GPU machines, what we want could probably be done without using a `SwitchProducerCUDA`.
Hi Thomas, thanks for the test. Can I ask what behaviour you are trying to achieve?

Anyway, it looks like we (currently) cannot use the SwitchProducer for a DQMEDAnalyzer. Matti, is this something you think should be added? Or do we look for a different solution?
If the `gpu-validation` modifier is not given, only the normal ECAL DQM tasks should run.

If `gpu-validation` is given and there is no GPU, then also run the normal ECAL DQM tasks.

If `gpu-validation` is given and there is a GPU, then run the normal ECAL DQM tasks and also the ECAL GPU vs. CPU comparison. Since the comparison consumes the `@cpu` and the `@cuda` InputTags, it should force the framework to execute both the CPU and the GPU algorithms.

Of course there is in principle no need to give the `gpu-validation` modifier if there is no GPU for which the results need to be validated, so as mentioned before I think the SwitchProducer is not really needed in this case, if `gpu-validation` is only given on GPU machines (manually or by some other mechanism of the DQM deployment).
Could you elaborate a bit more what the DQMEDAnalyzer does that prevents its use within a SwitchProducer?
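For illustration, a modifier-only version of the earlier cff fragment could look like the sketch below. It reuses the module and task names from that fragment, uses the existing `gpu` modifier as a stand-in for a dedicated `gpu-validation` one, and has not been tested:

```python
import FWCore.ParameterSet.Config as cms

from DQM.EcalMonitorTasks.EcalMonitorTask_cfi import ecalMonitorTask
from DQM.EcalMonitorTasks.GpuTask_cfi import ecalGpuTask

# Stand-in for a dedicated gpu-validation modifier.
from Configuration.ProcessModifiers.gpu_cff import gpu

# No SwitchProducerCUDA wrapper: ecalMonitorTask stays a plain DQMEDAnalyzer,
# so downstream customizations of its parameters keep working unchanged.
gpu.toModify(ecalMonitorTask, workerParameters = dict(GpuTask = ecalGpuTask))
gpu.toModify(ecalMonitorTask.workers, func = lambda workers: workers.append("GpuTask"))
```

With this wiring, a job that gets the modifier on a machine without a GPU would simply fail when the `@cuda` products are requested, which arguably is the desired behaviour when explicitly asking for GPU validation.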
> If `gpu-validation` is given and there is no GPU, then also run the normal ECAL DQM tasks.

IMHO this should actually crash: you are explicitly asking to run something on a GPU when one is not present.
> Could you elaborate a bit more what the DQMEDAnalyzer does that prevents its use within a SwitchProducer?

It looks like the `SwitchProducer` works for event data products, and not for lumisection or run data products. It looks like the `DQMEDAnalyzer` is an `EDProducer` of the latter type, producing only lumisection or run data products.

In principle, with the current approach where the "branch" chosen by the `SwitchProducer` is selected once and for all at the beginning of the job, it should be possible to make the `SwitchProducer` work also for lumisection and run data products. However, if in the future we plan to make it possible to switch event-by-event, this would probably break.
I see. So the `edm::Transition::EndRun` for the `produce()` is a problem in this case: https://github.com/cms-sw/cmssw/blob/master/DQMServices/Core/interface/DQMEDAnalyzer.h#L57
> IMHO this should actually crash: you are explicitly asking to run something on GPU when one is not present.

If `gpu-validation` is only given on GPU machines, then I would actually drop the SwitchProducer and just use `toModify` to add the additional task to the module configuration. So I guess in that case it would crash if the modifier is given when there is no GPU.
> I would actually drop the SwitchProducer and just use `toModify` to add the additional task to the module configuration. So I guess in that case it would crash if the modifier is given when there is no GPU.

👍
> So I would suggest to disentangle the two things: running on GPU, and doing the GPU-vs-CPU validation.

I fully agree.
> Matti, is this something you think should be added? Or do we look for a different solution?

It seems to me that all the use cases presented so far are really about knowledge of whether the data product was produced on the CPU or the GPU. The `SwitchProducer` feels like a suboptimal solution for that (e.g. it won't work in general across jobs). So I would think of a different solution (possibly based on provenance).
Hi @makortel, I'm not sure what the action plan is to get this fixed; should we have a discussion, or is it not necessary?
I understood @thomreis found a different solution for his use case ("Make DQM plots of GPU-vs-CPU reconstructed quantities" in Andrea's https://github.com/cms-sw/cmssw/issues/35879#issuecomment-979200772).
The exact use case of the issue description (Andrea's option 1, "same folder for CPU and GPU quantities", in https://github.com/cms-sw/cmssw/issues/35879#issuecomment-979200772) works out of the box.
For the use case of Vincenzo in https://github.com/cms-sw/cmssw/issues/35879#issuecomment-954847378 (Andrea's option 2, "different folder for CPU and GPU quantities, fill only one of those in a job", in https://github.com/cms-sw/cmssw/issues/35879#issuecomment-979200772), we are going to implement something like Provenance telling if a data product was produced on a CPU or a GPU (actual implementation will likely be different, but I hope this gives the idea).
Andrea's option 3, "different folder for CPU and GPU quantities, fill both in a job", in https://github.com/cms-sw/cmssw/issues/35879#issuecomment-979200772 would be best implemented with a specific Modifier (as Andrea wrote).
Just to add that this use case can technically be implemented already today by using the information stored in the event provenance. For example, for an event product `pixelTracksSoA` "produced" by SwitchProducerCUDA, the provenance shows that the parent of the product has a label of either `pixelTracksSoA@cpu` or `pixelTracksSoA@cuda`, which could be used to distinguish where it was produced. Although this works only if the cpu/cuda case in the SwitchProducer is an EDProducer: if it is an EDAlias, the parent of the `pixelTracksSoA` points to the actual EDProducer that produced the aliased-for product, which, in general, can have any module label. The same is true if one wants to inspect any further parent of the `pixelTracksSoA` product, and one basically has to know the EDProducer C++ types to know what happened.

While it can be done, this model doesn't scale well for many uses or an evolving configuration. Therefore we're planning to introduce a simpler record at the process level along the lines of "whether GPU offloading was enabled or not". Some more details are in #30044 (where any feedback on that approach would be welcome).
From the Tracker DQM side, we are developing DQM modules to monitor HLT products (e.g. pixel tracks and vertices in SoA) which can either be produced on a GPU or a CPU. Right now in our tests, the same module is modified with the `gpu` modifier to use the correct product in a GPU workflow. An example is:

Given that we want to run this at HLT, I wanted to understand if we can have a `SwitchProducer` mechanism for `DQMEDAnalyzer`, so that we can do something like this:

Can framework experts give some guidance on this? @arossi83 @mmusich @tsusa @connorpa