QF-Error-Tracking / QFVD5


Troubleshooting Help: QF-Running Slower on Server Using Linux Kernel 4.18.0/gfortran: 8.5.0 #20

Closed zacharycope0 closed 1 year ago

zacharycope0 commented 1 year ago

Describe the Issue

I'm using a Linux server to run 1x2 km QF sims. On the server, my simulations run at roughly 1/10 the speed of the same simulation on Windows Subsystem for Linux (WSL). The server and WSL have different OSs, kernels, and versions of gfortran (see below). The QF version doesn't appear to be the problem: I've run tests with both 511 and 523.

Please let me know if you have any troubleshooting advice or if you have seen similar problems before.

Server System Info (Running 1/10 the speed)

Windows Subsystem Linux System Info (Benchmark for speed)

sbrambilla commented 1 year ago

@zacharycope0 I don't know enough about how WSL works to answer that. Try looking at the timelog.log file and check whether any of the tracked functions takes significantly longer on one system than on the other, or whether the code is just slower in general.

Unrelated: I would update gfortran to a newer version; I'm using 11.3.0.

zacharycope0 commented 1 year ago

@sbrambilla OK, will do. To answer your question, WSL is similar to a Linux virtual machine, but the Windows OS and WSL share the same file system and hardware.

JayCh1 commented 1 year ago

I just did a WSL vs native Linux test on my laptop and a remote server. I'm seeing similar performance on both systems.

I can think of two possible explanations for the differences Zach is seeing:

1) differences in processor speeds on the laptop vs native Linux server (I'd be surprised if this is a factor of 10, but it could explain some of the difference).

2) the type of hard drive to which the output is being saved. My WSL is running off an SSD on my laptop. On the Michigan State HPCC Linux system I have access to two different storage locations: (a) a scratch drive that's designed for quicker read/write (not certain if this is an SSD, but it behaves like one), and (b) a longer-term storage drive. I didn't realize the difference between these locations at first, so when I started running QF on the MSU HPCC I used the long-term storage drive. When I moved to the scratch drive I saw a factor of 3-4 increase in speed. On one of the other native Linux servers in my office, which uses an external HD connected through a USB port, I also see a significant difference in run speeds (factor of 2-3).
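One rough way to check the storage explanation is to time a synced write in each candidate output directory. The Python sketch below is only an illustration: the function name `measure_write_throughput` and its default sizes are assumptions, not part of QF, and a single large sequential write will not reproduce QF's actual output pattern.

```python
import os
import tempfile
import time


def measure_write_throughput(directory, size_mb=64, chunk_mb=4):
    """Write about size_mb of data into `directory`; return MB/s.

    Hypothetical helper for comparing storage locations; not part of QF.
    """
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    n_chunks = max(1, size_mb // chunk_mb)
    # Create the temporary file inside the directory being tested.
    fd, path = tempfile.mkstemp(dir=directory, suffix=".bin")
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_chunks):
                f.write(chunk)
            f.flush()
            # Push the data to the device instead of leaving it in the page cache.
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return (n_chunks * chunk_mb) / elapsed
```

Calling the function once with the scratch directory and once with the long-term storage directory (paths depend on the system) gives two numbers to compare. A large gap would support the storage explanation, while similar numbers would point elsewhere.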

zacharycope0 commented 1 year ago

@sbrambilla. Here are the two time logs. The QF simulation that was run on the server made a lot more function calls.

Lucas thought there could be issues with how the server is using OpenMP. I'm currently rerunning with 1 thread to see how that changes things.
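A thread-count test like the one described above can be scripted so the same run is timed at several settings. This is a minimal sketch, assuming the QF binary was built with OpenMP by gfortran and therefore reads the standard `OMP_NUM_THREADS` environment variable. The helper names and the command passed to `time_command` are hypothetical.

```python
import os
import subprocess
import time


def openmp_env(n_threads, base_env=None):
    """Return a copy of the environment with OMP_NUM_THREADS set."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(n_threads)
    return env


def time_command(cmd, n_threads, cwd=None):
    """Run cmd (a list of arguments) and return wall-clock seconds."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits with an error.
    subprocess.run(cmd, env=openmp_env(n_threads), cwd=cwd, check=True)
    return time.perf_counter() - start
```

For example, looping over `n_threads` in `(1, 2, 4, 8)` with the command that launches QF in the project folder shows whether run time drops as threads are added. If the server run gets slower with more threads, thread oversubscription or pinning on the server would be worth a closer look.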

Server

Percentage of cells with fuel (fire domain): 99.0776978 %
Percentage of cells with fuel (QU domain): 3.96310806 %
Max number of Fires in a timestep: 0
Max number of Fires in a timestep (% of fuel cells): 0.00000000 %
Simulation time = 4363 s
Sub time fire: 682
Sub time plumes: 1873
Sub time mass-consistency: 1664
!==========================================
InitNewPlumes : 36
MovePlumes : 1331
[CompPlumePull] : 5
[PullPlumeTog] : 1322
[TerrainPull] : 0
MergePlumes : 278
WPlume2QUGrid : 15
UpdateAlphaDenom: 186
Divergence : 55
SOR_Startup : 2
SOR_InSimu : 1433
EulerUpdate : 179
ConvElemToCanopy: 1
InjPlumes2Wind : 0
InterpQU2FG : 33
UpCanopy : 8
UpFire : 0
InitWorkArray : 13
CompNewFire : 641
UpBurnCenter : 19
UpMoisture : 3
DetActFireCells : 2
ChkProx2Bord : 0
ChkFrontDist : 3
!==========================================
Maximum plume height [m]: 32.5953178
Maximum plume w [m/s]: 8.53116512
Maximum domain height [m]: 249.999985
!===== Max Numbers Per Output Period =====
Time: 0-> 60s, Max # Fires: 27216 Max # Plumes Pre-merge: 17061 Max # Plumes Post-merge: 4438

WSL

Percentage of cells with fuel (fire domain): 99.0776978 %
Percentage of cells with fuel (QU domain): 3.96310806 %
Max number of Fires in a timestep: 0
Max number of Fires in a timestep (% of fuel cells): 0.00000000 %
Simulation time = 761 s
Sub time fire: 169
Sub time plumes: 386
Sub time mass-consistency: 176
!==========================================
InitNewPlumes : 3
MovePlumes : 292
[CompPlumePull] : 0
[PullPlumeTog] : 292
[TerrainPull] : 0
MergePlumes : 59
WPlume2QUGrid : 2
UpdateAlphaDenom: 29
Divergence : 9
SOR_Startup : 0
SOR_InSimu : 149
EulerUpdate : 19
ConvElemToCanopy: 0
InjPlumes2Wind : 0
InterpQU2FG : 0
UpCanopy : 0
UpFire : 0
InitWorkArray : 3
CompNewFire : 166
UpBurnCenter : 0
UpMoisture : 0
DetActFireCells : 0
ChkProx2Bord : 0
ChkFrontDist : 0
!==========================================
Maximum plume height [m]: 28.6837788
Maximum plume w [m/s]: 8.89114094
Maximum domain height [m]: 249.999985
!===== Max Numbers Per Output Period =====
Time: 0-> 60s, Max # Fires: 27333 Max # Plumes Pre-merge: 18135 Max # Plumes Post-merge: 4609
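The two logs above are easier to compare once the per-function timings are lined up as ratios. The Python sketch below does that, with sample values copied from the two logs. The regular expression is a guess at the `name : integer` layout shown in this thread, and on a full timelog.log it may also pick up other `name: integer` fields such as the fire counts. The function names `parse_timings` and `slowdown` are hypothetical.

```python
import re

# Matches entries such as "MovePlumes : 1331" or "[PullPlumeTog] : 1322".
# The lookahead skips decimal values such as "[m]: 32.5953178".
ENTRY = re.compile(r"(\[?[A-Za-z_][A-Za-z0-9_]*\]?)\s*:\s*(\d+)(?![\d.])")


def parse_timings(text):
    """Return {function name: timing} for every integer entry in text."""
    return {name: int(value) for name, value in ENTRY.findall(text)}


def slowdown(server, wsl):
    """Return (name, server, wsl, ratio) rows, largest absolute gap first."""
    rows = []
    for name, t_server in server.items():
        t_wsl = wsl.get(name, 0)
        # A ratio is undefined when the WSL timing is zero or missing.
        ratio = t_server / t_wsl if t_wsl > 0 else None
        rows.append((name, t_server, t_wsl, ratio))
    rows.sort(key=lambda r: r[1] - r[2], reverse=True)
    return rows


# Sample values copied from the two logs in this thread.
server_log = "MovePlumes : 1331 MergePlumes : 278 SOR_InSimu : 1433 CompNewFire : 641"
wsl_log = "MovePlumes : 292 MergePlumes : 59 SOR_InSimu : 149 CompNewFire : 166"

rows = slowdown(parse_timings(server_log), parse_timings(wsl_log))
for name, t_server, t_wsl, ratio in rows:
    shown = f"{ratio:.1f}x" if ratio is not None else "n/a"
    print(f"{name:<16} server={t_server:>5} wsl={t_wsl:>5} ratio={shown}")
```

With these four entries, SOR_InSimu has the largest gap (about 9.6 times slower on the server), while the plume and fire routines are roughly 4 to 5 times slower. That suggests the slowdown is not uniform across the code.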