anuga-community / anuga_core

ANUGA for the simulation of the shallow water equation
https://anuga.anu.edu.au

Operational efficiency problem #56

Open yilou1 opened 2 months ago

yilou1 commented 2 months ago

Hello!

I haven't saved much time by running ANUGA in parallel on 32 cores. I have tried several mesh sizes and kept the CFL condition satisfied, but the run time of my custom dam-break simulation is still quite long.

I would like to ask whether GPU acceleration is possible, and whether the mesh can be created in a more flexible way, for example using SMS to build unstructured meshes with several different element sizes.

samcom12 commented 1 month ago

Hi @yilou1 ,

A partial GPU implementation is available in the develop_hackathon branch.

How did you set up the code to run on more nodes for scaling?

You can try the parallel examples here.

Hope this will help you.
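
For reference, the usual pattern in those examples looks roughly like the sketch below (the domain setup and numbers are only placeholders for your dam-break case; domain.sww_merge may not exist in older versions, in which case the per-process sww files can be merged separately):

import anuga
from anuga import distribute, myid, finalize

# Run with: mpiexec -np 32 python run_dam_break.py
if myid == 0:
    # Build the full domain on one process only
    domain = anuga.rectangular_cross_domain(100, 50, len1=1000.0, len2=500.0)
    domain.set_name('dam_break')
    domain.set_quantity('elevation', 0.0)
    domain.set_quantity('stage', 1.0)
else:
    domain = None

# Partition the mesh and send a sub-domain to every process
domain = distribute(domain)

# Boundaries are set on the distributed domain
Br = anuga.Reflective_boundary(domain)
domain.set_boundary({'left': Br, 'right': Br, 'top': Br, 'bottom': Br})

for t in domain.evolve(yieldstep=10.0, finaltime=1000.0):
    if myid == 0:
        domain.print_timestepping_statistics()

# Merge the per-process sww files into a single result file
domain.sww_merge(delete_old=True)
finalize()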

yilou1 commented 1 month ago

@samcom12 Hello, I set up the parallel run following the user manual; it produced 32 "sww" result files, which I merged at the end. However, during the run the computation time of each step exceeded the set time step, and the whole program eventually crashed due to non-convergence. I tried adjusting the mesh size and the time step while keeping the CFL condition satisfied, but I still ran into the problems above.

stoiver commented 1 month ago

@yilou1 could you provide a simple example script that demonstrates your problem?

yilou1 commented 1 month ago

flood.zip @stoiver

Hello, I am using this code to test a dam break. I plan to connect it to my dam-breach program and feed the breach outflow into it to simulate the downstream flooding; the current test inflow is very large. During the run the program crashes due to non-convergence at around 40000 s.

In addition, is there any example or test case that could help me move the computation onto the GPU? To represent the terrain better I use a high-resolution DEM, and in a small-catchment simulation the mesh reaches tens of millions of triangles, so the time spent generating the mesh and running the parallel computation is many times longer than the simulation itself.

For mesh generation, I am currently trying to build the mesh with SMS.
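
Roughly what I am after is something like the sketch below (not my real script; the polygon files, boundary tags and triangle areas are placeholders), a mesh that is fine only where it matters:

import anuga

bounding_polygon = anuga.read_polygon('extent.csv')   # outer catchment boundary
river_polygon = anuga.read_polygon('river.csv')       # area that needs fine triangles

domain = anuga.create_domain_from_regions(
    bounding_polygon,
    boundary_tags={'west': [0], 'south': [1], 'east': [2], 'north': [3]},
    maximum_triangle_area=10000.0,               # coarse background resolution
    interior_regions=[(river_polygon, 100.0)],   # refine only inside this polygon
    mesh_filename='flood.msh')

domain.set_quantity('elevation', filename='topography.pts')  # elevation from the DEM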

stoiver commented 1 month ago

@yilou1 your script looks fine. Unfortunately some data files were not provided in the zip file (though this makes sense if the data files are very large).

But it is interesting that the problem arises around t = 40000, which roughly coincides with the creation of the 4th Inlet_operator. Is it possible that the 4th Inlet_operator region is close to the transmissive boundary? The error messages from the non-convergence should at least identify the location of the problem.

Non-convergence nearly always occurs with transmissive boundaries. What happens is that, through some coincidence, a region develops on the boundary where there is an inflow, which can then amplify. Often this occurs where the slope of the land creates the inflow.

It might help to restrict the transmissive condition on the west boundary to the very specific part where you expect outflow, and make the rest of the boundary reflective.
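
Something along these lines is what I mean (a sketch only; the polygon file, segment indices and tag names are placeholders). Tag just the short stretch of the west boundary where you expect outflow, make that transmissive, and keep everything else reflective:

import anuga

bounding_polygon = anuga.read_polygon('extent.csv')

domain = anuga.create_domain_from_regions(
    bounding_polygon,
    boundary_tags={'outflow': [0],     # short west segment where outflow is expected
                   'west': [1],        # remainder of the west side
                   'south': [2], 'east': [3], 'north': [4]},
    maximum_triangle_area=1000.0)

Br = anuga.Reflective_boundary(domain)
Bt = anuga.Transmissive_boundary(domain)
domain.set_boundary({'outflow': Bt,
                     'west': Br, 'south': Br, 'east': Br, 'north': Br})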

stoiver commented 1 month ago

@yilou1 Also just noticed that you actually create new Inlet_operators in the evolve loop. It would make more sense to just change the rate of the original Inlet_operators via a command like

if i == 42000:
    fixed_inflow4.setQ(1500)
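
A fuller sketch of that pattern is below (the inflow line, rates and times are placeholders; depending on the installed ANUGA version the rate-setting method may be spelled set_Q rather than setQ):

# 'domain' is your existing domain; create the operator once, before the
# evolve loop, and only change its rate inside the loop.
line4 = [[100.0, 0.0], [100.0, 50.0]]              # hypothetical inflow line
fixed_inflow4 = anuga.Inlet_operator(domain, line4, Q=0.0)

for t in domain.evolve(yieldstep=10.0, finaltime=86400.0):
    if t >= 42000:
        fixed_inflow4.set_Q(1500)    # switch on the fourth inflow (setQ in some versions)
    domain.print_timestepping_statistics()
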
yilou1 commented 1 month ago

@stoiver Thanks for your help! It now runs smoothly after I made the changes you suggested.

stoiver commented 1 month ago

@yilou1 good to hear. Do you have an idea of what in particular was causing the problem?

yilou1 commented 1 month ago

@stoiver Hello. In my result file I found that the flow was leaving the domain at around 40000 seconds, so I guessed that the non-convergence was caused by a problem at the downstream boundary; changing the inflow did not affect the calculation. In addition, the refined mesh near my downstream boundary has very small element spacing and a large size difference between adjacent elements, so without a transition zone the triangulation at the boundary can produce triangles with very large internal angles, i.e. a poor-quality mesh, and combined with the transmissive boundary this could cause the program to crash. These are preliminary judgements that I have not tested much yet.