Closed: donish-getstand closed this issue 2 months ago
You need to provide a simple test case that demonstrates this. I cannot think of a reason why this would be so.
Sure! I have uploaded two relatively simple files (they are zipped in the attachment) that should demonstrate this effect. Please let me know if you have any more questions. Archive.zip
I am running the cases now. They are not "identical" because the number of particles in each grid cell is not exactly the same. Similar, but not the same. Post a plot of the heat release rate curve for each case, preferably on the same graph.
I am seeing this after running both cases. Can you make a similar plot comparing the first rows of the _cpu.csv files?
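For anyone repeating this comparison, a minimal sketch of reading the first data row of two _cpu.csv files side by side is below. The file paths, case labels, and exact column set are assumptions to adapt to your own output files; the returned rows can then be fed to a plotting library (e.g., matplotlib's `plt.bar`) to make the requested plot.

```python
# Sketch: compare the first data row of two FDS *_cpu.csv files.
# File names and column names are placeholders -- adapt to your cases.
import csv

def first_cpu_row(path):
    """Return (column names, first data row as floats) from a *_cpu.csv file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = [h.strip() for h in next(reader)]
        values = [float(x) for x in next(reader)]
    return header, values

def compare_cpu_files(path_a, path_b, label_a="1x1x1", label_b="3x3x3"):
    """Print the first-row timings of the two files side by side."""
    cols_a, row_a = first_cpu_row(path_a)
    cols_b, row_b = first_cpu_row(path_b)
    assert cols_a == cols_b, "the two files must have the same columns"
    print(f"{'column':>12s} {label_a:>12s} {label_b:>12s}")
    for name, a, b in zip(cols_a, row_a, row_b):
        print(f"{name:>12s} {a:12.1f} {b:12.1f}")
    return cols_a, row_a, row_b
```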
The number of time steps and the HRR of the two cases are comparable. There is just more CPU usage across the board for the 1x1x1 case. I don't have an explanation at the moment; I have to look more carefully.
@donish-getstand, if your timings look much different from Kevin's it might be worth considering what version of the code you are using.
The changes around issues #12673 and #13224 were aimed at preventing unnecessary additional particles (and the associated computational cost) with many INIT regions like you have. That might not be the case here, but this issue came to mind.
I am running the latest source. There is still something puzzling about how the time spent in DIVG, MASS, VELO, etc., is higher for the case with more INIT lines. Also, we spend more time in part.f90 than I would expect, even for meshes with no particles. I suspect we still process the INIT lines to some extent.
When I rerun these jobs exclusively, that is, each job gets an entire 64-core node to itself, here are the timings:
My conclusion is that there is a 30 s penalty in PART for the 1x1x1 case because you have approximately 11000 INIT lines. In my initial run of these cases, both jobs shared a node and competed for access to memory. In these latter runs, each job had its own exclusive access to memory.
Repeat this experiment with a fairly recent version of FDS and let me know if you see the same thing.
Thanks for the feedback!
@ericvmueller: I am using fairly recent releases of FDS (6.9.0 and 6.9.1).
@mcgratta: Here are the cpu-time plots. The first one shows results for the exact files I posted here (parsed down for simplicity from their original full simulation files), run with FDS 6.9.1 locally on my laptop (macOS 14.6.1 (23G93)). I would say the results look different from yours in the WALL, DUMP, and COMM columns.
The second one shows our full-scale simulation files (with many more particle definitions and thus a larger time-related impact), run on a high-performance computing cluster with FDS 6.9.0.
The last column (Total Time Used (s)) for the files I posted here is pretty similar:
1x1x1: 18720 s
3x3x3: 18690 s
But for our full simulations the totals differ much more:
1x1x1: 6992 s
3x3x3: 5193 s
Also, @mcgratta, could you clarify what you meant by the particle distributions being similar but not identical? As far as I could tell, the differences between our 1x1x1 and 3x3x3 definitions should be nil. Thanks in advance!
Close, but not exactly the same in either the number of particles per cell or particle position. This should not have a huge effect, but it will have some. However, I think the difference in run time, at least for my computer and MPI configuration, comes from the number of INIT lines.
@donish-getstand if your case is affected by the issue I mentioned, that would cause a difference between 6.9.0 and 6.9.1 as it was fixed between these releases. Is it possible to test your full simulations with 6.9.1 as well?
Hi all @ericvmueller @mcgratta, I know this issue is closed, but I wanted to update the thread with more information. Per @ericvmueller's suggestion, I ran the ('full') simulations in both FDS 6.9.1 and FDS 6.9.0. The 1x1x1 and 3x3x3 cases have finished under FDS 6.9.1, with somewhat improved results. First, notice that FDS 6.9.1 actually increases the total computation time of the 3x3x3 case (orange) and decreases that of the 1x1x1 case (blue):
1x1x1 "Total Time Used" For FDS 6.9.1: 115 minutes
1x1x1 "Total Time Used" For FDS 6.9.0: 117 minutes
3x3x3 "Total Time Used" For FDS 6.9.1: 98 minutes
3x3x3 "Total Time Used" For FDS 6.9.0: 87 minutes
There is an advantage in computation time/efficiency in defining the environment with fewer INIT lines, but the difference in computation times between the two FDS versions is surprising. The absolute difference between the two cases (blue vs. orange) is smaller for numerous columns (DIVG, RADI, COMM, and TOTAL_T). The absolute difference in the TOTAL_T column between the two cases for each FDS version is:
Absolute Difference Of "Total Time Used" For FDS 6.9.1: 16.5 minutes
Absolute Difference Of "Total Time Used" For FDS 6.9.0: 30.0 minutes
That is roughly a factor-of-2 reduction in the total-time difference between the two cases (i.e., better agreement between the blue and orange simulation setups) under FDS 6.9.1.
Thanks for the update.
When defining particles in simulations, the method of specifying their arrangement can impact runtime, even if the resulting particle setup is the same. For example, consider two scenarios in which the particles fill the same volume, yet their spatial distribution is specified differently:
Scenario 1: Define N_PARTICLES_PER_CELL=1 and create 27 particles using 27 separate INIT lines, each with an XB spanning a single grid cell (e.g., XB=48.75,49.0,-25.75,-25.5,4.25,4.5). This configuration builds the region out of 1x1x1-cell blocks.
Scenario 2: Define N_PARTICLES_PER_CELL=1 and create the same 27 particles with a single INIT line whose XB spans a 3x3x3 block of cells (XB=48.25,49.0,-26.0,-25.5,3.75,4.5).
Even though both scenarios produce the same total number of Lagrangian particles (one per cell), they lead to different simulation runtimes. This has a large impact on simulations that include large amounts of vegetation.
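For reference, the two setups can be sketched as FDS input lines roughly like the following. Only the XB bounds and N_PARTICLES_PER_CELL come from the description above; the PART_ID value is a placeholder, and other parameters the real input files carry are omitted:

```
! Scenario 1: ~27 separate INIT lines, each covering a single 0.25 m grid cell.
&INIT PART_ID='veg', N_PARTICLES_PER_CELL=1, XB=48.75,49.0,-25.75,-25.5,4.25,4.5 /
! ... 26 more single-cell INIT lines like the one above ...

! Scenario 2: one INIT line covering a 3x3x3 block of cells (27 particles total).
&INIT PART_ID='veg', N_PARTICLES_PER_CELL=1, XB=48.25,49.0,-26.0,-25.5,3.75,4.5 /
```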
Can anyone provide insight as to why this is happening? Thanks in advance!