filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/

Every miner/worker process uses 100+ MB of GPU RAM when idle #7845

Closed mtelka closed 2 years ago

mtelka commented 2 years ago

Lotus Version

lotus-miner version 1.13.1+calibnet+git.8943de2.dirty
lotus-worker version 1.13.1+calibnet+git.8943de2.dirty

Describe the Bug

Every miner/worker process opens the GPU and consumes 100+ MB of its RAM, even when the GPU is not used and never will be (for example, on PC1-only workers). This problem was introduced in v1.13.0 (v1.12.0 worked okay).

Logging Information

# nvidia-smi
Sat Dec 25 06:13:59 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67       Driver Version: 460.67       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3060    Off  | 00000000:81:00.0 Off |                  N/A |
| 32%   32C    P8    10W / 170W |    538MiB / 12053MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     19421      C   ...bin/lotus-worker-calibnet      101MiB |
|    0   N/A  N/A     20308      C   ...bin/lotus-worker-calibnet      101MiB |
|    0   N/A  N/A     24365      C   ...bin/lotus-worker-calibnet      125MiB |
|    0   N/A  N/A     26725      C   .../bin/lotus-miner-calibnet      107MiB |
|    0   N/A  N/A     32518      C   ...bin/lotus-worker-calibnet      101MiB |
+-----------------------------------------------------------------------------+
#

Repro Steps

  1. Build Lotus with the new CUDA support (FFI_USE_CUDA=1).
  2. Start the miner and workers.
  3. Look at the nvidia-smi output.
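
Step 3 can also be scripted. The following is a minimal Python sketch, not part of Lotus, that assumes nvidia-smi is on the PATH and relies on its --query-compute-apps CSV output. The helper names (parse_compute_apps, lotus_usage, query_live) and the "lotus" substring filter are illustrative assumptions.

```python
import subprocess

# nvidia-smi query that prints one CSV row per compute process:
#   pid, process_name, used_memory (MiB)
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' rows into (pid, name, MiB) tuples."""
    rows = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from both ends so a comma inside the process name is tolerated.
        pid, rest = line.split(",", 1)
        name, mem = rest.rsplit(",", 1)
        # Tolerate a trailing unit in case 'nounits' was not passed.
        mem = mem.replace("MiB", "").strip()
        # Some platforms report a non-numeric value; keep it as None.
        mib = int(mem) if mem.isdigit() else None
        rows.append((int(pid), name.strip(), mib))
    return rows


def lotus_usage(rows, needle="lotus"):
    """Keep only the rows whose process name contains the given substring."""
    return [(pid, name, mib) for pid, name, mib in rows if needle in name]


def query_live():
    """Run nvidia-smi and return the GPU memory held by lotus processes."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return lotus_usage(parse_compute_apps(out))
```

If the build behaves as described in this report, query_live() should list every idle miner and worker process with roughly 100 MiB each; on a build without the problem, one would expect idle PC1-only workers not to appear at all.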

mtelka commented 2 years ago

Workaround: build Lotus without the new CUDA support (that is, without FFI_USE_CUDA=1).

rjan90 commented 2 years ago

Hey @mtelka! Thanks for the report.

I tried to reproduce this issue on newer versions of Lotus and was not able to. I'm closing this issue for now; if you still see it on newer versions of Lotus, please reopen it or create a new ticket!

Thanks again for the report!

mtelka commented 2 years ago

I just tested this again with v1.15.2 (calibnet) and the problem is still there. Unfortunately, I do not see any way to reopen this issue. @rjan90 please reopen it for me. Thank you.

rjan90 commented 2 years ago

Hi @mtelka!

I'm not able to reproduce this issue locally! Which OS are you running?

mtelka commented 2 years ago

@rjan90 CentOS 7

rjan90 commented 2 years ago

Maybe some of the reported VRAM usage is caused by the OS GUI? Here is the output from a worker (PC1/PC2) running Ubuntu Server:

lotus-worker info
Worker version:  1.6.0
CLI version: lotus-worker version 1.15.1-rc4+mainnet+git.6a88a94a8

Session: c27bb52a-0c9b-47aa-bf68-735f45698104
Enabled: true
Hostname: worker
CPUs: 128; GPUs: [NVIDIA RTX A2000]
RAM: 347.4 GiB/995.6 GiB; Swap: 0 B/0 B
Task types: FIN GET FRU UNS C1 PC2 PC1 PR1 AP 

cd1b738b-4c74-4b84-ba6a-c2ae8757d891:
    Weight: 10; Use: Seal 
    Local: /mnt/nvmeraid/worker

nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.54       Driver Version: 510.54       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A2000    Off  | 00000000:41:00.0 Off |                  Off |
| 30%   39C    P8     6W /  70W |      2MiB /  6138MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

That said, you appear to be running quite a lot of workers on the same machine, which seems excessive. I don't know your setup, but you only need to run one worker for all tasks.

mtelka commented 2 years ago

There is no GUI running on that machine. From my nvidia-smi output above you can see that nothing else is using VRAM, just the miner and four workers.
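
The figures in the pasted output can be cross-checked mechanically. Below is a rough Python sketch, assuming a single-GPU report laid out like the nvidia-smi 460.67 table above and process names without spaces; the function name summarize is hypothetical. For the December output, the five per-process figures sum to 535 MiB of the 538 MiB reported as used.

```python
import re

# Matches a row of the nvidia-smi "Processes" table, for example:
# |    0   N/A  N/A     19421      C   ...bin/lotus-worker-calibnet      101MiB |
PROC_ROW = re.compile(
    r"^\|\s+(\d+)\s+\S+\s+\S+\s+(\d+)\s+(\S+)\s+(\S+)\s+(\d+)MiB\s+\|$"
)
# Matches the per-GPU "used / total" memory cell, for example "538MiB / 12053MiB".
MEM_CELL = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")


def summarize(report):
    """Return ({pid: MiB}, used MiB, total MiB) from pasted nvidia-smi text.

    Only the first memory cell is read, so this assumes a single GPU.
    """
    per_process = {}
    used = total = None
    for line in report.splitlines():
        line = line.strip()
        row = PROC_ROW.match(line)
        if row:
            per_process[int(row.group(2))] = int(row.group(5))
            continue
        cell = MEM_CELL.search(line)
        if cell and used is None:
            used, total = int(cell.group(1)), int(cell.group(2))
    return per_process, used, total
```

Comparing sum(per_process.values()) with the used figure shows how much of the GPU memory is attributed to the listed processes.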

If I start only a single worker (and nothing else), just a PC1 worker compiled with CUDA support, it opens the GPU and allocates 100 MB on it. It shouldn't. If I start the same PC1 worker compiled without CUDA support (OpenCL only), it does not permanently allocate VRAM.

The details about my setup are not important here. The example I provided back in December was meant to show that all workers (AP, PC1, PC2, C1+C2), in addition to the miner, each use (leak?) about 100 MB of VRAM all the time, even when idle.

I use this nvcc:

# /usr/local/cuda-11.2/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
#

rjan90 commented 2 years ago

Which environment variables are you setting when starting the PC1 worker?

"The details about my setup are not important here."

I would say that is highly relevant to finding out why there is a discrepancy between what you and I are seeing in our setups, though.

mtelka commented 2 years ago

From my systemd service file for the PC1 worker:

[Service]
Environment=GOLOG_FILE=XXX
Environment=LOTUS_WORKER_PATH=XXX
Environment=FIL_PROOFS_PARENT_CACHE=XXX
Environment=FIL_PROOFS_USE_MULTICORE_SDR=1
EnvironmentFile=PATHTOENVFILE
User=XXX

The PATHTOENVFILE file contains only MINER_API_INFO=XXX.

mtelka commented 2 years ago

"The details about my setup are not important here."

"I would say that is highly relevant to finding out why there is a discrepancy between what you and I are seeing in our setups, though."

I said that in reply to your "I don't know your setup, but you only need to run one worker for all tasks." Sorry, I should have been more precise.