DUNE / larnd-sim

Simulation framework for a pixelated Liquid Argon TPC
Apache License 2.0
10 stars 28 forks source link

Feature signal indexing #230

Closed YifanC closed 4 months ago

YifanC commented 4 months ago
  1. Corrected an indexing error in sum_pixel_signals
  2. Increased MAX_TRACKS_PER_PIXEL from 20 to 50, and MAX_ADC_VALUES from 20 to 30. For a MiniRun5 file, you would still get a lot more segments than 50 contributing to a pixel.

The run time is ~50min (I'm not sure if it's due to MAX_TRACKS_PER_PIXEL and MAX_ADC_VALUES). Without light simulation it's ~30min.

This PR needs the change in PR #229. (I could merge then PR... if it's easier...) Bug identified by Kazu

krwood commented 4 months ago

For a MiniRun5 file, you would still get a lot more segments than 50 contributing to a pixel

Do you have a contributions/pixel distribution we can take a look at to understand where we should set this? I was under the impression we already tuned this to a reasonable value. I sit the NuMI beam simulation where the >50 is coming from, or from particle bombs for mlreco training?

YifanC commented 4 months ago

Unfortunately, I have seen many many many pixels have more than 50 segments contributing to the signal using MR5 0666 edep input. (The default is set to 20 for MR3-MR5 beta1, as far as I know.) Tuning MAX_TRACKS_PER_PIXEL and MAX_ADC_VALUES is not good long term solution. Kazu proposed to rank the contribution in the waveform level before we decide which to keep, and it seems like he is willing to implement it. This could be a solution.

YifanC commented 4 months ago

Shall we set MAX_TRACKS_PER_PIXEL and MAX_ADC_VALUES to 20 and 20, as before?

mjkramer commented 4 months ago

I'm seeing normal (~20-minute) run times with MAX_TRACKS_PER_PIXEL = 50 and MAX_ADC_VALUES = 30, although I've disabled the warning printout, as it was really spammy and may have contributed to the slowdown you reported. I also can't see how any of this is related to the light simulation.

YifanC commented 4 months ago

Yea.. I guess the print out was not a good idea = = In fact, with the print out, with MAX_TRACKS_PER_PIXEL = 20 and MAX_ADC_VALUES = 20 it would jam the run. Shall we do a silent kill (remove the print out completely) and maybe even set it to MAX_TRACKS_PER_PIXEL = 80 and MAX_ADC_VALUES = 30 if the memory withhold

mjkramer commented 4 months ago

It's still good to have a warning. I've moved the printing outside the kernel. Will see what happens with MAX_TRACKS_PER_PIXEL = 80.

mjkramer commented 4 months ago

Is there any evidence that MAX_TRACKS_PER_PIXEL of 80 preserves significantly more information than 50? In both cases I just see one spill (per module) producing a warning. However, going from 50 to 80 increases the run time from 23 to 28 minutes. I didn't see a substantial difference in peak GPU memory usage (~77GB by eye in both cases... scary).

YifanC commented 4 months ago

With MR5 0666, I have seen multiple incidences that it's above 50 contributors despite my initial warning wasn't ideal. I think 77GB memory usage is on the edge... and has huge limitation on which GPU you can run on. (Although when I tested I was using a 40GB memory A100... but I didn't actively monitor the memory use) How much is the memory usage for MAX_TRACKS_PER_PIXEL = 20 and MAX_ADC_VALUES = 20? Probably bump it from MAX_TRACKS_PER_PIXEL from 50 to 80 is unnecessary. The question is whether we keep it at a lower number.

YifanC commented 4 months ago

@mjkramer Are you okay to merge this in develop?

mjkramer commented 4 months ago

I see no difference in run time (24 minutes) or GPU memory (51 GB) between (20, 20) and (50, 30) for MAX_TRACKS_PER_PIXEL and MAX_ADC_VALUES.