Closed mtelka closed 2 years ago
Workaround: build lotus without the new cuda support (without FFI_USE_CUDA=1).
Hey @mtelka! Thanks for the report.
I tried to reproduce this issue on newer versions of Lotus and was not able to. So I'm closing this issue for now. If you still see this on newer versions of Lotus, please reopen or create a new ticket!
Thanks again for the report!
I just tested this again with v1.15.2 (calibnet) and the problem is still there. Unfortunately, I do not see any way to reopen this issue. @rjan90 please reopen it for me. Thank you.
Hi @mtelka!
I'm not able to reproduce this issue locally! Which OS are you running?
@rjan90 CentOS 7
Maybe some of the VRAM usage reported is because of the OS GUI? From a worker (PC1/PC2-worker) that is using Ubuntu Server:
lotus-worker info
Worker version: 1.6.0
CLI version: lotus-worker version 1.15.1-rc4+mainnet+git.6a88a94a8
Session: c27bb52a-0c9b-47aa-bf68-735f45698104
Enabled: true
Hostname: worker
CPUs: 128; GPUs: [NVIDIA RTX A2000]
RAM: 347.4 GiB/995.6 GiB; Swap: 0 B/0 B
Task types: FIN GET FRU UNS C1 PC2 PC1 PR1 AP
cd1b738b-4c74-4b84-ba6a-c2ae8757d891:
Weight: 10; Use: Seal
Local: /mnt/nvmeraid/worker
Nvidia-smi output
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.54       Driver Version: 510.54       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A2000    Off  | 00000000:41:00.0 Off |                  Off |
| 30%   39C    P8     6W /  70W |      2MiB /  6138MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
That said, it seems like you are running quite a lot of workers on the same machine, which seems kind of excessive. I don't know your setup, but you only need to run one worker for all tasks.
There is no GUI running on that machine. From my nvidia-smi output above you can see that nothing else is using VRAM, just the miner and four workers.
If I start only a single worker (and nothing else), just a PC1 worker compiled with CUDA support, it opens the GPU and allocates 100 MB on it. It shouldn't. If I start the same PC1 worker compiled without CUDA support (OpenCL only), it does not permanently allocate VRAM.
The details about my setup are not important here. The example I provided back in December shows that all workers (AP, PC1, PC2, C1+C2), in addition to the miner, are using (leaking?) 100 MB of VRAM all the time, even when idle.
I use this nvcc:
# /usr/local/cuda-11.2/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
#
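The per-process VRAM usage described above can be checked without reading the nvidia-smi table by eye. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH and supports the `--query-compute-apps` CSV query; the helper names (`parse_compute_apps`, `total_mib_by_name`, `query_gpu_processes`) are hypothetical and not part of Lotus.

```python
import csv
import io
import subprocess

# nvidia-smi CSV query for compute processes; with "nounits" the memory
# column is a plain number in MiB.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(text):
    """Parse 'pid, process_name, used_memory' CSV rows into (pid, name, MiB) tuples."""
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        pid, name, mem = (f.strip() for f in fields)
        try:
            rows.append((int(pid), name, int(mem)))
        except ValueError:
            continue  # skip rows where a value is not numeric, e.g. "[N/A]"
    return rows


def total_mib_by_name(rows, needle):
    """Sum the MiB used by every process whose name contains `needle`."""
    return sum(mem for _, name, mem in rows if needle in name)


def query_gpu_processes():
    """Run nvidia-smi on this host and return the parsed rows."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_compute_apps(out)
```

Calling `query_gpu_processes()` on the worker host while everything is idle, once with a CUDA build and once without, would show whether the idle allocation follows the build flag.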
What environment variables are you setting when spinning up the PC1 worker?
> The details about my setup are not important here.
I would say that is highly relevant to find out why there is a discrepancy between what you and I are seeing in our setups though.
From my systemd service file for PC1 worker:
[Service]
Environment=GOLOG_FILE=XXX
Environment=LOTUS_WORKER_PATH=XXX
Environment=FIL_PROOFS_PARENT_CACHE=XXX
Environment=FIL_PROOFS_USE_MULTICORE_SDR=1
EnvironmentFile=PATHTOENVFILE
User=XXX
The PATHTOENVFILE file contains MINER_API_INFO=XXX only.
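To compare the environment of two setups like the ones discussed here, the `Environment=` lines of each unit file can be collected and diffed. This is a minimal sketch under simplifying assumptions: one `KEY=VALUE` assignment per `Environment=` line, no quoting, and `EnvironmentFile=` contents not read. The function names are hypothetical.

```python
def parse_unit_environment(unit_text):
    """Collect KEY=VALUE pairs from Environment= lines of a systemd unit snippet."""
    env = {}
    prefix = "Environment="
    for line in unit_text.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue  # EnvironmentFile= and other directives are ignored here
        key, sep, value = line[len(prefix):].partition("=")
        if sep:
            env[key] = value
    return env


def diff_env(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys that differ or exist on one side only."""
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(set(a) | set(b))
        if a.get(k) != b.get(k)
    }
```

Running `diff_env` on the parsed units from both machines lists only the variables that differ, which narrows down where the two setups diverge.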
> > The details about my setup are not important here.
> I would say that is highly relevant to find out why there is a discrepancy between what you and I are seeing in our setups though.
I said that as a reply to your: "I don't know your setup, but you only need to run one worker for all tasks." Sorry, I should have been more exact.
Checklist
Latest release, or the most recent RC (release candidate) for the upcoming release or the dev branch (master), or have an issue updating to any of these.
Lotus component
Lotus Version
Describe the Bug
Every miner/worker process opens the GPU and consumes 100+ MB of its RAM even when the GPU is not used and never will be (for example, PC1-only workers). This problem was introduced in v1.13.0 (v1.12.0 worked okay).
Logging Information
Repo Steps