Closed jagoosw closed 1 year ago
When I exit the REPL I get a very long error message ending:
Trying to make an MWE, I can't reproduce the error without all of my code running, so perhaps it's not actually in the pressure solver even though that's where the error is being raised.
So in this I've got a load of update_tendencies! being called, and adding synchronize(device(architecture(model))) at the end appears to have fixed this.
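For anyone wanting to try the same workaround, here is a minimal sketch of it. The helper name synchronize_model_device! is hypothetical (it is not part of Oceananigans), and this assumes KernelAbstractions is available in the active environment so that synchronize can be imported directly. It only wraps the call quoted above and does not explain why the synchronization is needed.

```julia
using Oceananigans
using Oceananigans.Architectures: architecture, device
using KernelAbstractions: synchronize

# Hypothetical helper: block until all kernels queued on the model's
# backend have finished. This wraps the exact call from the thread.
function synchronize_model_device!(model)
    backend = device(architecture(model))  # KernelAbstractions backend for the model's architecture
    synchronize(backend)                   # wait for outstanding kernels on that backend
    return nothing
end
```

Calling synchronize_model_device!(model) as the last line of whatever function launches the kernels (in this case, at the end of the update_tendencies! work) reproduces the manual fix.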
To summarise: the error is "CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)", and adding synchronize(device(architecture(model))) appears to fix it.
Do you know why the manual synchronize is needed?
No, I'll try making an MWE.
Are all GPU operations KernelAbstractions? Or do you have other stuff sprinkled in?
All KernelAbstractions
I found a similar problem (see #3320), but I am not sure whether it is related or not.
I do not know whether synchronize(device(architecture(model))) will solve my problem.
Hi all,
I'm stuck trying to debug an error I keep getting when running a non-hydrostatic model on GPU.
It runs for a bit and then throws this error:
I can't get the whole error message because it's longer than the screen length, but this seems to be the relevant bit when using InteractiveErrors.
If I make the grid smaller it gets more iterations done before it errors, but it is nowhere near using all of the GPU's memory (A100 with 80GB, and the model is about 2GB at 256x256x64).
This is with the latest version of Oceananigans (87.4). I'll try to make an MWE.