Closed YifanC closed 4 months ago
For a MiniRun5 file, you would still get a lot more segments than 50 contributing to a pixel
Do you have a contributions/pixel distribution we can take a look at to understand where we should set this? I was under the impression we already tuned this to a reasonable value. I sit the NuMI beam simulation where the >50 is coming from, or from particle bombs for mlreco training?
Unfortunately, I have seen many many many pixels have more than 50 segments contributing to the signal using MR5 0666 edep input. (The default is set to 20 for MR3-MR5 beta1, as far as I know.)
Tuning MAX_TRACKS_PER_PIXEL
and MAX_ADC_VALUES
is not good long term solution. Kazu proposed to rank the contribution in the waveform level before we decide which to keep, and it seems like he is willing to implement it. This could be a solution.
Shall we set MAX_TRACKS_PER_PIXEL
and MAX_ADC_VALUES
to 20 and 20, as before?
I'm seeing normal (~20-minute) run times with MAX_TRACKS_PER_PIXEL = 50
and MAX_ADC_VALUES = 30
, although I've disabled the warning printout, as it was really spammy and may have contributed to the slowdown you reported. I also can't see how any of this is related to the light simulation.
Yea.. I guess the print out was not a good idea = = In fact, with the print out, with MAX_TRACKS_PER_PIXEL = 20
and MAX_ADC_VALUES = 20
it would jam the run. Shall we do a silent kill (remove the print out completely) and maybe even set it to MAX_TRACKS_PER_PIXEL = 80
and MAX_ADC_VALUES = 30
if the memory withhold
It's still good to have a warning. I've moved the printing outside the kernel. Will see what happens with MAX_TRACKS_PER_PIXEL = 80
.
Is there any evidence that MAX_TRACKS_PER_PIXEL
of 80 preserves significantly more information than 50? In both cases I just see one spill (per module) producing a warning. However, going from 50 to 80 increases the run time from 23 to 28 minutes. I didn't see a substantial difference in peak GPU memory usage (~77GB by eye in both cases... scary).
With MR5 0666, I have seen multiple incidences that it's above 50 contributors despite my initial warning wasn't ideal. I think 77GB memory usage is on the edge... and has huge limitation on which GPU you can run on. (Although when I tested I was using a 40GB memory A100... but I didn't actively monitor the memory use) How much is the memory usage for MAX_TRACKS_PER_PIXEL = 20
and MAX_ADC_VALUES = 20
? Probably bump it from MAX_TRACKS_PER_PIXEL
from 50 to 80 is unnecessary. The question is whether we keep it at a lower number.
@mjkramer Are you okay to merge this in develop
?
I see no difference in run time (24 minutes) or GPU memory (51 GB) between (20, 20) and (50, 30) for MAX_TRACKS_PER_PIXEL
and MAX_ADC_VALUES
.
MAX_TRACKS_PER_PIXEL
from 20 to 50, andMAX_ADC_VALUES
from 20 to 30. For a MiniRun5 file, you would still get a lot more segments than 50 contributing to a pixel.The run time is ~50min (I'm not sure if it's due to
MAX_TRACKS_PER_PIXEL
andMAX_ADC_VALUES
). Without light simulation it's ~30min.This PR needs the change in PR #229. (I could merge then PR... if it's easier...) Bug identified by Kazu