GPUSPH / gpusph

The world's first CUDA implementation of Weakly-Compressible Smoothed Particle Hydrodynamics

WaveTank fails on next and spheric2022 branches #81

Open · Candles opened this issue 5 months ago

Candles commented 5 months ago

I've been testing the spheric2022 branch, since a lot of comments seem to recommend it. Most of the example problems run fine; the one exception is WaveTank (which, ironically, is the one the researcher I'm working with is interested in). It works properly on the master branch:

GPUSPH version v5.0+1-08ffd94d+custom
Release version without fastmath for compute capability 9.0
Chrono : disabled
HDF5   : disabled
MPI    : enabled
Catalyst : disabled
Compiled for problem "WaveTank"
[Network] rank 0 (1/1), host youmu
 tot devs = 1 (1 * 1)

paddle_amplitude (radians): 0.218669
Info stream: GPUSPH-1097307
Initializing...
Water level not set, autocomputed: 0.465
Max particle speed not set, autocomputed from max fall: 2.97136
WARNING: dt 0.0001 will be used only for the first iteration because adaptive dt is enabled
Expected maximum shear rate: 512.821 1/s
dt = 0.0001 (CFL conditions from soundspeed: 0.00039, from gravity 0.0126104, from viscosity 190.125)
Using computed max neib list size 128
Using computed neib bound pos 127
Artificial viscosity epsilon is not set, using default value: 1.521000e-05
Problem calling set grid params
Influence radius / neighbor search radius / expected cell side  : 0.078 / 0.078 / 0.078
Autocomputed SPS Smagorinsky factor 1.296e-05 from C_s = 0.12, ∆p = 0.03
Autocomputed SPS isotropic factor 3.96e-06 from C_i = 0.0066, ∆p = 0.03
 - World origin: 0 , 0 , 0
 - World size:   9 x 0.6 x 1
 - Cell size:    0.0782609 x 0.0857143 x 0.0833333
 - Grid size:    115 x 7 x 12 (9,660 cells)
 - Cell linearization: y,z,x
 - Dp:   0.03
 - R0:   0.03
Generating problem particles...
VTKWriter will write every 0.1 (simulated) seconds
HotStart checkpoints every 0.1 (simulated) seconds
        will keep the last 8 checkpoints
Allocating shared host buffers...
Numbodies : 1
Numforcesbodies : 0
numOpenBoundaries : 0
  allocated 4.21 MiB on host for 55,124 particles (55,124 active)
Copying the particles to shared arrays...
---
Open boundaries: 0
Fluid: 28842 parts, mass 0.0273792
Boundary: 25370 parts, mass 0.027
Testpoint: 0 parts
Tot: 55124 particles
---
RB First/Last Index:
Preparing the problem...
Body: 0
         Cg grid pos: 2 3 4
         Cg pos: -0.0215778 -1.38778e-17 -0.0333345
 - device at index 0 has 55,124 particles assigned and offset 0
Integrator predictor/corrector instantiated.
Starting workers...
number of forces rigid bodies particles = 0
thread 0xffffb0deb840 device idx 0: CUDA device 0/1, PCI device 0009:01:00.0: GH200 480GB
Device idx 0: free memory 96710 MiB, total memory 97280 MiB
Estimated memory consumption: 400B/particle
Device idx 0 (CUDA: 0) allocated 0 B on host, 20.05 MiB on device
  assigned particles: 55,124; allocated: 55,124
GPUSPH: initialized
Performing first write...
Letting threads upload the subdomains...
Thread 0 uploading 55124 Position items (861.31 KiB) on device 0 from position 0
Thread 0 uploading 55124 Velocity items (861.31 KiB) on device 0 from position 0
Thread 0 uploading 55124 Info items (430.66 KiB) on device 0 from position 0
Thread 0 uploading 55124 Hash items (215.33 KiB) on device 0 from position 0
Entering the main simulation cycle
Simulation time t=0.000000e+00s, iteration=0, dt=1.000000e-04s, 55,124 parts (0, cum. 0 MIPPS), maxneibs 87+0
<elided for brevity>
Simulation time t=1.000013e+01s, iteration=28,386, dt=3.545454e-04s, 55,124 parts (49, cum. 51 MIPPS), maxneibs 107+0
Elapsed time of simulation cycle: 31s
Peak particle speed was ~3.46298 m/s at 0.300096 s -> can set maximum vel 3.8 for this problem
Simulation end, cleaning up...
Deallocating...
Do scripts/rmtests to remove all tests

And here is what I get on the other branches (next and spheric2022):

GPUSPH version v5.0+932-dcb9d216+custom
Release version without fastmath, CUDA backend for compute capability 9.0 (cache preference: L1 cache)
        built with nvcc, major version 12
DEM    : symmetrized
Chrono : disabled
HDF5   : disabled
MPI    : enabled
Catalyst : disabled
Compiled for problem "WaveTank"
[Network] rank 0 (1/1), host youmu
 tot devs = 1 (1 * 1)

paddle_amplitude (radians): 0.218669
Deprecated FT_BORDER converted to FT_OUTER_BORDER|FT_INNER_BORDER
Deprecated FT_BORDER converted to FT_OUTER_BORDER|FT_INNER_BORDER
Deprecated FT_BORDER converted to FT_OUTER_BORDER|FT_INNER_BORDER
Deprecated FT_BORDER converted to FT_OUTER_BORDER|FT_INNER_BORDER
Deprecated FT_BORDER converted to FT_OUTER_BORDER|FT_INNER_BORDER
Info stream: GPUSPH-1095392
Initializing...
Water level not set, autocomputed: 0.45
Speed of sound for fluid 0 auto-computed as c0 = 30.4474
Problem name set to 'WaveTank' automatically
Expected maximum shear rate: 1498.95 1/s
setting dt = 0.000133427 from CFL conditions (soundspeed: 0.000133427, gravity: 0.00910075, viscosity: 51.5747)
Using computed max neib list size 128
Using computed neib bound pos 127
Colagrossi diffusion coefficient: 1.000000e-01 (default value)
Artificial viscosity epsilon is not set, using default value: 4.125976e-06
Using problem dir ./tests/WaveTank_2024-01-28T01h50
Problem calling set grid params
Influence radius / neighbor search radius / expected cell side  : 0.040625 / 0.040625 / 0.040625
Expected max travel distance between neighbors list constructions: 0.0040625 (0.26 ∆p or 0.0997569 cells)
Autocomputed SPS Smagorinsky factor 3.51562e-06 from C_s = 0.12, ∆p = 0.015625
Autocomputed SPS isotropic factor 1.07422e-06 from C_i = 0.0066, ∆p = 0.015625
 - World origin: 0 , 0 , 0
 - World size:   9 x 0.6 x 1
 - Cell size:    0.040724 x 0.0428571 x 0.0416667
 - Grid size:    221 x 14 x 24 (74,256 cells)
 - Cell linearization: y,z,x
 - Dp:   0.015625
 - R0:   0.015625
Generating problem particles...
running fill_parts
Update 0
|                                                           Particle count                                                           |
+------------------+------------------+------------------+------------------+------------------+------------------+------------------+
|            fluid |         boundary |       testpoints |           bodies |             HDF5 |              XYZ |              TeT |
|           237984 |           198621 |                0 |             5106 |                0 |                0 |                0 |
VTKWriter will write every 0.1 (simulated) seconds
HotStart checkpoints every 0.1 (simulated) seconds
        will keep the last 8 checkpoints
Allocating shared host buffers...
Numbodies : 1
Numforcesbodies : 0
Numfeabodies : 0
numOpenBoundaries : 0
  allocated 40.44 MiB on host for 441,712 particles (441,711 active)
Copying the particles to shared arrays...
---
Particle 237984 type 1=B position (0.767888, 0.0157895, -0.0300099, 0.00385278) is outside of the domain (0, 0, 0)--(9, 0.6, 1)
Open boundaries: 0
Fluid: 237984 parts, mass 0.00382813
Boundary: 198621 parts, mass 0.00385278
Testpoint: 0 parts
Tot: 441711 particles
96188 particles were placed out of bounds during init
will check for cell overflow
---
RB First/Last Index:
FEA body parts first Index:
FEA body nodes first Index:
Preparing the problem for 441711 particles...
Body: 0
         Cg grid pos: 3 7 8
         Cg pos: 0.0155433 -0.0214286 -0.00911166
 - device at index 0 has 441,711 particles assigned and offset 0
Integrator predictor/corrector instantiated.
Starting workers...
number of forces rigid bodies particles = 0
number of fea nodes particles = 0
thread 281473587988544 device idx 0: CUDA device 0/1, PCI device 0009:01:00.0: GH200 480GB
Device idx 0: free memory 94.44 GiB, total memory 95 GiB
Estimated memory consumption: 8B/cell
Estimated memory consumption: 460B/particle
Device idx 0 (CUDA: 0) allocated 0 B on host, 185.93 MiB on device
  assigned particles: 441,711; allocated: 441,712
GPUSPH: initialized
Performing first write...
Letting threads upload the subdomains...
Thread 0 uploading 441711 Position items (6.74 MiB) on device 0 from position 0
Thread 0 uploading 441711 Velocity items (6.74 MiB) on device 0 from position 0
Thread 0 uploading 441711 Info items (3.37 MiB) on device 0 from position 0
Thread 0 uploading 441711 Hash items (1.68 MiB) on device 0 from position 0
Thread 0 skipping host buffer Interpolated Dummy boundary velocity for device 0 (invalid buffer)
Entering the main simulation cycle
Simulation time t=0.000000e+00s, iteration=0, dt=1.334268e-04s, 441,711 parts (0, cum. 0 MIPPS), maxneibs 80+0
Simulation time t=1.000822e-01s, iteration=825, dt=1.212970e-04s, 441,711 parts (1.1e+02, cum. 1.1e+02 MIPPS), maxneibs 85+0
Simulation time t=2.000310e-01s, iteration=1,649, dt=1.212970e-04s, 441,711 parts (1.1e+02, cum. 1.1e+02 MIPPS), maxneibs 85+0
Simulation time t=3.001010e-01s, iteration=2,474, dt=1.212970e-04s, 441,711 parts (1.1e+02, cum. 1.1e+02 MIPPS), maxneibs 85+0
ERROR: cell 1500 [grid position (4, 2, 11), global position (0.183258, 0.107143, 0.479167)] on device 0 has too many particles (8782 > 2047)
Possible reasons:
        inadequate world size (96188 particles were marked as out-of-bounds during init)
        fluid column collapse
overfull cell
Elapsed time of simulation cycle: 12.47s [sim time: 0.3942, ratio 31.64]
Peak particle speed was ~0.670645 m/s at 0.100689 s -> can set maximum vel 0.74 for this problem
Simulation end, cleaning up...
Deallocating...

I get the same outcome on separate Rocky 9 and Ubuntu 22.04 systems running aarch64 and x86 respectively, using either CUDA 11.8 or 12.3 and the system install of gcc (11.4 for both).

Oblomov commented 4 months ago

Hello and sorry for the late reply. The issue stems from the fact that the WaveTank test case shipped since v5 still sets the world origin and size manually (around line 84 there is an explicit assignment of m_size and m_origin). Removing those assignments should fix it. We're in the process of cleaning up all the test cases for the v6 release, and this (along with the proper use of FT_INNER_BORDER and FT_OUTER_BORDER instead of FT_BORDER) will be part of that cleanup.

Could you please check if removing the explicit setting of the world size and origin fixes the issue for you as well?
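For reference, a minimal sketch of the suggested change, assuming the WaveTank problem constructor (e.g. in src/problems/WaveTank.cu) still assigns m_size and m_origin directly; only the member names come from the comment above, while the file path, make_double3 calls, and surrounding comments are illustrative:

// Hypothetical excerpt from the WaveTank problem constructor, not copied from the repository.
// Before: world geometry set by hand. On the newer branches the domain is derived from the
// geometry built in fill_parts(), so stale hard-coded values leave thousands of particles
// outside the declared world (the "96188 particles were marked as out-of-bounds during init"
// warning above) and eventually overflow a cell.
m_origin = make_double3(0.0, 0.0, 0.0);  // matches "World origin: 0 , 0 , 0" in the log
m_size   = make_double3(9.0, 0.6, 1.0);  // matches "World size:   9 x 0.6 x 1" in the log

// After: delete (or comment out) both assignments and let GPUSPH autocompute the
// world origin and size from the problem geometry.

If that is indeed the cause, rebuilding the WaveTank problem without those two assignments should make both the out-of-bounds warning during init and the subsequent "too many particles" cell overflow go away.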