Closed jeremylt closed 1 month ago
Blocks #1646
Branch passes Ratel CI too
$ make prove -j CEED_BACKENDS=/gpu/cuda/shared
-----------------------------------------
| ____ __ __ |
| / __ \ ____ _ / /_ ___ / / |
| / /_/ / / __ `/ / __/ / _ \ / / |
| / _, _/ / /_/ / / /_ / __/ / / |
| /_/ |_| \__,_/ \__/ \___/ /_/ |
-----------------------------------------
-----------------------------------------
Dependencies:
CEED_DIR = /home/jeremy/Dev/libCEED
PETSC_DIR = /home/jeremy/Dev/petsc
PETSC_ARCH = arch-cuda-mpich
Optional Dependencies:
ENZYME_LIB = (not found)
-----------------------------------------
Running unit tests
- Testing with libCEED backends: /gpu/cuda/shared
- Testing on 1 processes
prove -j 16 --exec 'python3 tests/junit.py --petsc-arch arch-cuda-mpich --ceed-backends /gpu/cuda/shared --mode tap --nproc 1 --pool-size 1' t000-init t001-view t002-view t003-ts-monitor t004-ts-checkpoint t010-eigensolver t050-mpm t100-static-elasticity t101-static-elasticity t102-static-elasticity t103-static-elasticity t110-static-elasticity t111-static-elasticity t120-static-elasticity t121-static-elasticity t122-static-elasticity t123-static-elasticity t211-quasistatic-elasticity t221-quasistatic-elasticity t222-quasistatic-elasticity ex01-static ex02-quasistatic ex03-dynamic
t050-mpm ..................... ok
t010-eigensolver ............. ok
t000-init .................... ok
t001-view .................... ok
t002-view .................... ok
t003-ts-monitor .............. ok
t101-static-elasticity ....... ok
t122-static-elasticity ....... ok
t123-static-elasticity ....... ok
t100-static-elasticity ....... ok
t111-static-elasticity ....... ok
t211-quasistatic-elasticity .. ok
t103-static-elasticity ....... ok
t222-quasistatic-elasticity .. ok
t110-static-elasticity ....... ok
t120-static-elasticity ....... ok
t102-static-elasticity ....... ok
t221-quasistatic-elasticity .. ok
t121-static-elasticity ....... ok
ex03-dynamic ................. ok
t004-ts-checkpoint ........... ok
ex01-static .................. ok
ex02-quasistatic ............. ok
All tests successful.
Files=23, Tests=197, 482 wallclock secs ( 0.12 usr 0.04 sys + 1145.66 cusr 120.38 csys = 1266.20 CPU)
Result: PASS
Note to self for follow-up: make an issue about bringing this code cleanup and simplification to the CPU side of the house.
This takes lessons from the Gen backends: it processes the fields in an order chosen to prevent duplicate work while allowing buffers to be reused, specifically the new cached vector buffer.
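To illustrate the ordering idea, here is a minimal, language-agnostic sketch in Python. It is not the code in this branch: the `Field` tuple, `field_order`, and `count_restrictions` are hypothetical names, and the single-slot cache is an assumed stand-in for the cached vector buffer. The sketch groups fields that read the same vector through the same restriction so they are processed back to back, which lets one buffer serve the whole group.

```python
from collections import OrderedDict
from typing import Hashable, Iterable, List, Sequence, Tuple

# Hypothetical field description: (field name, vector id, restriction id).
Field = Tuple[str, Hashable, Hashable]


def field_order(fields: Sequence[Field]) -> List[int]:
    """Return field indices with fields sharing (vector, restriction) adjacent.

    Groups appear in order of first use, and fields keep their original
    relative order inside each group.
    """
    groups: "OrderedDict[Tuple[Hashable, Hashable], List[int]]" = OrderedDict()
    for index, (_, vector, restriction) in enumerate(fields):
        groups.setdefault((vector, restriction), []).append(index)
    return [index for indices in groups.values() for index in indices]


def count_restrictions(fields: Sequence[Field], order: Iterable[int]) -> int:
    """Count restriction applications with a one-slot cached buffer.

    Assumption: the cache holds only the most recent (vector, restriction)
    result, so a field is free only if it directly follows a matching field.
    """
    cached = None
    applied = 0
    for index in order:
        key = (fields[index][1], fields[index][2])
        if key != cached:
            applied += 1  # cache miss: the restriction must be applied again
            cached = key
    return applied


if __name__ == "__main__":
    # Two inputs read vector "x", two read vector "q", interleaved.
    fields = [
        ("u", "x", "r0"),
        ("mass", "q", "r1"),
        ("du", "x", "r0"),
        ("w", "q", "r1"),
    ]
    print(count_restrictions(fields, range(len(fields))))   # 4 in declared order
    print(count_restrictions(fields, field_order(fields)))  # 2 after reordering
```

With the interleaved fields above, the declared order misses the cache on every field, while the grouped order applies each restriction once. A real backend would also need to handle field evaluation modes and active versus passive vectors, which this sketch leaves out.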