CEED / libCEED

CEED Library: Code for Efficient Extensible Discretizations
https://libceed.org
BSD 2-Clause "Simplified" License

GPU Operators use work vectors #1673

Closed by jeremylt 1 month ago

jeremylt commented 1 month ago

This applies lessons from the Gen backends: it processes the operator fields in an order that avoids duplicate work while still allowing buffers to be reused, specifically the new cached vector buffer.
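The deduplication and buffer-reuse idea above can be illustrated with a minimal Python sketch. Everything in it (`Field`, `WorkVectorPool`, `restrict_inputs`, `release_inputs`) is hypothetical and not part of the libCEED API. It assumes that fields reading the same input vector through the same element restriction can share one E-vector, and that idle work buffers are cached by length. It does not model the actual field-ordering logic of the Gen backends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical operator input field (illustrative only, not libCEED API)."""
    name: str
    vec_id: str   # input L-vector the field reads from
    rstr_id: str  # element restriction mapping that L-vector to an E-vector
    e_size: int   # length of the resulting E-vector


class WorkVectorPool:
    """Hypothetical cache of work buffers, keyed by length."""

    def __init__(self):
        self._free = {}        # length -> list of idle buffers
        self.allocations = 0   # number of buffers actually allocated

    def get(self, length):
        idle = self._free.get(length)
        if idle:
            return idle.pop()  # reuse a cached buffer instead of allocating
        self.allocations += 1
        return [0.0] * length

    def restore(self, buf):
        self._free.setdefault(len(buf), []).append(buf)


def restrict_inputs(fields, pool):
    """Restrict each distinct (vector, restriction) pair only once.

    Fields sharing a pair reuse the E-vector already produced for it.
    Returns (evecs, num_restrictions); evecs maps field name -> buffer.
    """
    produced = {}  # (vec_id, rstr_id) -> E-vector buffer
    evecs = {}
    num_restrictions = 0
    for f in fields:
        key = (f.vec_id, f.rstr_id)
        if key not in produced:
            # A real backend would apply the restriction into this buffer.
            produced[key] = pool.get(f.e_size)
            num_restrictions += 1
        evecs[f.name] = produced[key]
    return evecs, num_restrictions


def release_inputs(evecs, pool):
    # Return each distinct buffer to the pool once, even if fields share it.
    seen = set()
    for buf in evecs.values():
        if id(buf) not in seen:
            seen.add(id(buf))
            pool.restore(buf)


fields = [
    Field("u", "x", "r_u", 8),
    Field("du", "x", "r_u", 8),  # same vector and restriction as "u": shared
    Field("q_data", "qd", "r_qd", 4),
]
pool = WorkVectorPool()
for _ in range(3):  # three operator applications
    evecs, n = restrict_inputs(fields, pool)
    release_inputs(evecs, pool)
print(n, pool.allocations)  # 2 2
```

Only two restrictions run per application, and after the first application every work buffer comes from the cache, so repeated applications allocate nothing new.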

jeremylt commented 1 month ago

Blocks #1646

jeremylt commented 1 month ago

This branch also passes Ratel CI:

$ make prove -j CEED_BACKENDS=/gpu/cuda/shared
-----------------------------------------
|      ____            __           __  |
|     / __ \  ____ _  / /_  ___    / /  |
|    / /_/ / / __ `/ / __/ / _ \  / /   |
|   / _, _/ / /_/ / / /_  /  __/ / /    |
|  /_/ |_|  \__,_/  \__/  \___/ /_/     |
-----------------------------------------

-----------------------------------------

Dependencies:
CEED_DIR      = /home/jeremy/Dev/libCEED
PETSC_DIR     = /home/jeremy/Dev/petsc
PETSC_ARCH    = arch-cuda-mpich

Optional Dependencies:
ENZYME_LIB     = (not found)

-----------------------------------------

Running unit tests
- Testing with libCEED backends: /gpu/cuda/shared
- Testing on 1 processes
prove -j 16 --exec 'python3 tests/junit.py --petsc-arch arch-cuda-mpich --ceed-backends /gpu/cuda/shared --mode tap --nproc 1 --pool-size 1' t000-init t001-view t002-view t003-ts-monitor t004-ts-checkpoint t010-eigensolver t050-mpm t100-static-elasticity t101-static-elasticity t102-static-elasticity t103-static-elasticity t110-static-elasticity t111-static-elasticity t120-static-elasticity t121-static-elasticity t122-static-elasticity t123-static-elasticity t211-quasistatic-elasticity t221-quasistatic-elasticity t222-quasistatic-elasticity ex01-static ex02-quasistatic ex03-dynamic
t050-mpm ..................... ok                                       
t010-eigensolver ............. ok                                       
t000-init .................... ok                                       
t001-view .................... ok                                       
t002-view .................... ok                                       
t003-ts-monitor .............. ok                                       
t101-static-elasticity ....... ok                                       
t122-static-elasticity ....... ok                                       
t123-static-elasticity ....... ok                                       
t100-static-elasticity ....... ok                                       
t111-static-elasticity ....... ok                                       
t211-quasistatic-elasticity .. ok                                       
t103-static-elasticity ....... ok                                       
t222-quasistatic-elasticity .. ok                                       
t110-static-elasticity ....... ok                                       
t120-static-elasticity ....... ok                                       
t102-static-elasticity ....... ok                                       
t221-quasistatic-elasticity .. ok                                       
t121-static-elasticity ....... ok                                       
ex03-dynamic ................. ok                                       
t004-ts-checkpoint ........... ok                                       
ex01-static .................. ok                                       
ex02-quasistatic ............. ok     
All tests successful.
Files=23, Tests=197, 482 wallclock secs ( 0.12 usr  0.04 sys + 1145.66 cusr 120.38 csys = 1266.20 CPU)
Result: PASS
jeremylt commented 1 month ago

Note to self for follow-up: open an issue about bringing this code cleanup and simplification to the CPU side of the house.