Open payoubi opened 5 years ago
hi,
could you attach the output_solver.txt file for this issue?
there might be multiple mpi processes running on the same card, or other visualization applications might occupy memory if you also use your quadro card for your display. you can check that with the nvidia-smi command.
best, daniel
On Aug 23, 2019, at 08:41, payoubi notifications@github.com wrote:
Hi,
I've successfully compiled specfem3d with cuda. When I run small models, it works fine, but for bigger models (1 million elements for example), it returns
CUDA error !!!!! <out of memory> !!!!! at CUDA call error code: # 1001
while the gpu information says
rank 0: GPU memory usage: used = 1086.875000 MB, free = 906.875000 MB, total = 1993.750000 MB
The gpu version is "Quadro K620".
Do you have any idea what might cause the problem?
This is the usage of the graphics card. It's being used by other processes, but not by much; their combined usage is well below the capacity of the card:
`Fri Aug 23 09:56:55 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro K620 On | 00000000:03:00.0 On | N/A |
| 37% 51C P0 2W / 30W | 681MiB / 1993MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 16355 G /usr/lib/xorg/Xorg 20MiB |
| 0 16588 G /usr/lib/xorg/Xorg 230MiB |
| 0 17489 G /usr/bin/gnome-shell 299MiB |
| 0 17747 G ...quest-channel-token=8468225398656622040 125MiB |
+-----------------------------------------------------------------------------+ `
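As a quick check of the numbers in the nvidia-smi output above, the per-process usage can be summed and compared with the card total. This is only a small arithmetic sketch; the variable names are made up for illustration and the values are copied from the table.

```python
# Values (in MiB) copied from the nvidia-smi output above.
total_mib = 1993
used_mib = 681

# Per-process GPU memory, keyed by PID as listed in the process table.
per_process_mib = {16355: 20, 16588: 230, 17489: 299, 17747: 125}

# Memory attributed to the listed display processes.
listed_mib = sum(per_process_mib.values())

# Memory left on the card before the solver starts.
free_mib = total_mib - used_mib

print(f"listed processes: {listed_mib} MiB")  # 674 MiB
print(f"free before run:  {free_mib} MiB")    # 1312 MiB
```

So roughly 1.3 GiB was available to the solver at the time of this snapshot; whether that is enough depends on the memory the solver requests for the model.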
I've also attached the solver output. output_solver.txt
And the error message is: error_message_000000.txt
Hi payoubi,
It might be useful to run the following command while you are running specfem3d: $> watch -d -n 0.1 nvidia-smi
This will run nvidia-smi every 0.1 seconds and highlight the changes. This will hopefully provide some insight into how much memory is free and how much is being allocated while specfem3d runs.
I think it is also possible that you may get a memory error if there is enough total free memory but not enough contiguous memory (memory fragmentation).
-Thomas
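If a log is more convenient than watching the screen, the same polling idea can be sketched in Python. This assumes nvidia-smi is on the PATH and uses its `--query-gpu` and `--format=csv,noheader,nounits` options; the function names `parse_memory_line` and `log_gpu_memory` are hypothetical helpers, not part of specfem3d.

```python
import subprocess
import time

# nvidia-smi query that prints one CSV line per GPU, values in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.free,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_line(line):
    """Parse one CSV line such as '681, 1312, 1993' into MiB integers."""
    used, free, total = (int(field.strip()) for field in line.split(","))
    return {"used": used, "free": free, "total": total}


def log_gpu_memory(samples=100, interval_s=0.1):
    """Print memory usage per GPU for `samples` polls; return the lowest free MiB seen."""
    min_free = None
    for _ in range(samples):
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
        for gpu_index, line in enumerate(result.stdout.strip().splitlines()):
            mem = parse_memory_line(line)
            print(f"gpu {gpu_index}: {mem}")
            if min_free is None or mem["free"] < min_free:
                min_free = mem["free"]
        time.sleep(interval_s)
    return min_free
```

Calling `log_gpu_memory()` in a second terminal while the solver starts up would show how far the free memory drops before the error appears.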
On Fri, Aug 23, 2019 at 7:05 PM payoubi notifications@github.com wrote:
And the error message is: error_message_000000.txt https://github.com/geodynamics/specfem3d/files/3535483/error_message_000000.txt
Hi,
I have the same problem. I am using an NVIDIA GTX 1060 with 6 GB.
there is not much one can do other than splitting up a bigger simulation onto multiple GPUs. check your output_solver.txt for the lines (like the one provided above):
preparing fields and constants on GPU devices
minimum memory requested : 11886.109104156494 MB per process
this is an estimate of the memory needed on your GPU (per MPI process). the rest is up to you, i.e., how you set up your simulation: how many MPI processes you want on how many GPUs etc.
thus, the GPU memory pretty much determines the resolution limit of your simulations. but if you want to/must go with a specific resolution beyond that, you need to make the physics cheaper: turn off attenuation, use acoustic rather than elastic and you'll need less GPU memory...
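The memory check described above can be sketched as a small script that reads the "minimum memory requested" line from output_solver.txt and compares it with the memory of the card. The helper names are hypothetical, the 6 GB figure is the GTX 1060 capacity mentioned in the thread (the memory actually free will be lower), and the GPU count is only a rough lower bound that assumes the requirement divides evenly across processes, which the thread does not state.

```python
import math
import re

# Matches the line format quoted from output_solver.txt in this thread.
MEMORY_LINE = re.compile(r"minimum memory requested\s*:\s*([0-9.]+)\s*MB per process")


def requested_mb(solver_output):
    """Return the requested MB per process, or None if the line is absent."""
    match = MEMORY_LINE.search(solver_output)
    return float(match.group(1)) if match else None


def fits_on_gpu(requested, available_mb):
    """True if the per-process request fits into the available GPU memory."""
    return requested <= available_mb


def rough_min_gpus(requested, available_mb):
    """Rough lower bound on GPUs needed, assuming an even split (an assumption)."""
    return math.ceil(requested / available_mb)


sample = """preparing fields and constants on GPU devices
minimum memory requested : 11886.109104156494 MB per process"""

need = requested_mb(sample)
gpu_mb = 6 * 1024  # nominal capacity of a 6 GB card

print(fits_on_gpu(need, gpu_mb))     # False
print(rough_min_gpus(need, gpu_mb))  # 2
```

In practice one would pass the contents of the real output_solver.txt and the free memory reported by nvidia-smi rather than the nominal capacity.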
Yes, I only had to reduce the number of spectral elements (from 176 to 96), and then it was possible to run it on the CPU, because the 6 GB of GPU memory was not enough.
CUDA error !!!!!
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 4060 On | 00000000:01:00.0 On | N/A |
| 0% 39C P5 N/A / 115W | 736MiB / 8188MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
hi, I have the same problem.
you're running out of memory on your GPU card. that is, the simulation is too big to fit onto your GPU.
Hi,
I've successfully compiled specfem3d with cuda. When I run small models, it works fine, but for bigger models (1 million elements for example), it returns
CUDA error !!!!! <out of memory> !!!!! at CUDA call error code: # 1001
while the gpu information says
rank 0: GPU memory usage: used = 1086.875000 MB, free = 906.875000 MB, total = 1993.750000 MB
The gpu version is "Quadro K620".
Do you have any idea what might cause the problem?