MFlowCode / MFC

Exascale simulation of multiphase/physics fluid dynamics
https://mflowcode.github.io
MIT License

Sanity check on Device <--> Host memory transfers on GPU simulations #329

Closed: sbryngelson closed this issue 6 months ago

sbryngelson commented 6 months ago

A recent profile of @anshgupta1234's branch revealed several memory transfers between host and device within a single time step. They take little time here because the case is small, but I could imagine them becoming a problem for larger cases. I am posting some Nsight Systems screenshots here that should be investigated. Some of the transfers appear to move data but are seemingly not associated with Ansh's IBM implementation.

[Four Nsight Systems screenshots, taken 2024-02-05 between 10:38:58 and 10:43:47]

sbryngelson commented 6 months ago

@anshgupta1234, can you clarify whether this profile was run with or without the !$acc parallel loop in the IBM region of the code that we discussed?

anshgupta1234 commented 6 months ago

@sbryngelson, it was run without the !$acc parallel loop.

sbryngelson commented 6 months ago

Can you add the acc loop back in, run it with 1M grid cells in 2D for 10 time steps, then send me the nsys profile again? @anshgupta1234

sbryngelson commented 6 months ago

Closing this. The compute sections look fine in the new profile. Any stray uploads are just local variables; they only appear significant because the IBM update takes so little time, but in practice they do not matter.
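
For readers landing on this thread later, here is a minimal sketch of the pattern discussed above. It is not MFC code: MFC is Fortran and uses `!$acc` directives, whereas this illustration uses the equivalent C `#pragma acc` form, and the names `advance` and `run_steps` are hypothetical. The field array is moved once by a data region outside the time loop. The scalars `n` and `dt` have no data clause, so OpenACC treats them as firstprivate on the parallel construct and their values are sent to the device at each launch. How a profiler displays this depends on the compiler and runtime, but it is one plausible source of the tiny per-step uploads.

```c
#include <stdlib.h>

/* Hypothetical stand-in for one solver update; not MFC code. */
static void advance(double *q, int n, double dt)
{
    /* q is already on the device (see the data region in run_steps).
       n and dt are scalars with no data clause: on a parallel construct
       OpenACC treats them as firstprivate, so their values are passed
       to the device at every launch. */
    #pragma acc parallel loop present(q[0:n])
    for (int i = 0; i < n; ++i) {
        q[i] += dt;
    }
}

/* Runs n_steps updates on an n-element field and returns the field sum.
   Returns -1.0 if the allocation fails. */
double run_steps(int n, int n_steps, double dt)
{
    double *q = malloc((size_t)n * sizeof *q);
    if (q == NULL) {
        return -1.0;
    }
    for (int i = 0; i < n; ++i) {
        q[i] = 0.0;
    }

    /* One bulk upload before the time loop and one download after it.
       No array traffic should occur inside the loop. */
    #pragma acc data copy(q[0:n])
    {
        for (int step = 0; step < n_steps; ++step) {
            advance(q, n, dt);
        }
    }

    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += q[i];
    }
    free(q);
    return sum;
}
```

A compiler without OpenACC support ignores the pragmas and runs the same loops on the host, so the numerical result is identical either way. With OpenACC enabled, a profile of this sketch would be expected to show the large array copies only at the start and end of the run, with at most small scalar traffic per step.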