catalyst-team / catalyst

Accelerated deep learning R&D
https://catalyst-team.com
Apache License 2.0

Importing DistributedSamplerWrapper invalidates the CUDA_VISIBLE_DEVICES setting. #1451

Open zezhishao opened 6 months ago

zezhishao commented 6 months ago

🐛 Bug Report

After running from catalyst.data.sampler import DistributedSamplerWrapper, setting CUDA_VISIBLE_DEVICES has no effect. This seems counterintuitive to me. Is this the intended behavior? If so, what is the reason, and how can I fix it?

How To Reproduce

With the import:

import os
import torch
from catalyst.data.sampler import DistributedSamplerWrapper

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
device_num = torch.cuda.device_count()
print(device_num) # Output: 2

Without the import:

import os
import torch
# from catalyst.data.sampler import DistributedSamplerWrapper

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
device_num = torch.cuda.device_count()
print(device_num) # Output: 1
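
A possible workaround, sketched under assumptions: CUDA_VISIBLE_DEVICES is generally honored only if it is set before the process first initializes or queries CUDA. Whether the catalyst import triggers such a query is not confirmed in this report and is an assumption here. One way to sidestep the import order entirely is to put the variable in the environment before the interpreter starts, which is what the hypothetical helper run_with_visible_devices below does (it is not part of Catalyst or PyTorch). The child program only echoes the variable so that the sketch runs without a GPU.

```python
import os
import subprocess
import sys


def run_with_visible_devices(devices: str, code: str) -> str:
    """Run `code` in a fresh Python process with CUDA_VISIBLE_DEVICES preset.

    Hypothetical helper for illustration; not a Catalyst or PyTorch API.
    """
    # Copy the current environment and set the variable before the child starts,
    # so it is in place before any import in the child can touch CUDA.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=devices)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Stand-in child program: it only prints the variable it sees.
# In the real script, the child would import torch and catalyst and then
# call torch.cuda.device_count().
child_code = 'import os; print(os.environ["CUDA_VISIBLE_DEVICES"])'
seen = run_with_visible_devices("0", child_code)
print(seen)
```

The same effect can be had from the shell by prefixing the command with CUDA_VISIBLE_DEVICES=0, or inside a single script by assigning os.environ["CUDA_VISIBLE_DEVICES"] before the first torch or catalyst import. Whether either resolves this particular report has not been verified here.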

Environment

Catalyst version: 22.04
PyTorch version: 2.2.1+cu118
Is debug build: No
CUDA used to build PyTorch: 11.8
TensorFlow version: N/A
TensorBoard version: 2.16.2

OS: Ubuntu 22.04.2 LTS
GCC version: (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
CMake version: Could not collect

Python version: 3.9
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration: 
GPU 0: Tesla V100-PCIE-32GB
GPU 1: Tesla V100-PCIE-32GB

Nvidia driver version: 525.125.06
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] catalyst==22.4
[pip3] easy-torch==1.3.2
[pip3] numpy==1.22.4
[pip3] pytorch-triton==3.0.0+a9bc1a3647
[pip3] tensorboard==2.16.2
[pip3] tensorboard-data-server==0.7.2
[pip3] tensorboardX==2.6.2.2
[pip3] torch==2.2.1+cu118
[pip3] torchaudio==2.2.1+cu118
[pip3] torchvision==0.17.1+cu118
[conda] catalyst                  22.4                     pypi_0    pypi
[conda] easy-torch                1.3.2                    pypi_0    pypi
[conda] numpy                     1.22.4                   pypi_0    pypi
[conda] pytorch-triton            3.0.0+a9bc1a3647          pypi_0    pypi
[conda] tensorboard               2.16.2                   pypi_0    pypi
[conda] tensorboard-data-server   0.7.2                    pypi_0    pypi
[conda] tensorboardx              2.6.2.2                  pypi_0    pypi
[conda] torch                     2.2.1+cu118              pypi_0    pypi
[conda] torchaudio                2.2.1+cu118              pypi_0    pypi
[conda] torchvision               0.17.1+cu118             pypi_0    pypi

github-actions[bot] commented 6 months ago

Hi! Thank you for your contribution! Please re-check all issue template checklists; unfilled issues will be closed automatically. And do not forget to join our Slack for collaboration.