awslabs / fast-differential-privacy

Fast, memory-efficient, scalable optimization of deep learning with differential privacy
Apache License 2.0

Got Error: Multi-gpu and distributed training is currently not supported #32

Closed: giandos200 closed this issue 2 months ago

giandos200 commented 3 months ago


I'm trying to reproduce the first text classification example, but I keep hitting the multi-GPU error below even though I'm running on two A100 80GB GPUs with 10 CPU cores and 50GB of RAM. Could you please help me resolve this?

Traceback (most recent call last):
  File "/user/.pyenv/versions/3.9.18/lib/python3.9/", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/user/.pyenv/versions/3.9.18/lib/python3.9/", line 87, in _run_code
    exec(code, run_globals)
  File "/user/fast-differential-privacy/examples/text_classification/", line 935, in <module>
  File "/user/fast-differential-privacy/examples/text_classification/", line 792, in main
  File "/user/fast-differential-privacy/examples/text_classification/src/", line 253, in train
    raise ValueError("Multi-gpu and distributed training is currently not supported.")
ValueError: Multi-gpu and distributed training is currently not supported.

Both GPUs are visible to torch:

>>> print(torch.cuda.is_available())
>>> torch.cuda.device_count()
>>> torch.cuda.current_device()
>>> torch.cuda.get_device_name(0)
>>> torch.cuda.get_device_name(1)
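The traceback stops at a guard inside the trainer's `train` method, but the exact condition it tests isn't shown. As a hedged sketch, assume the guard follows Hugging Face `TrainingArguments` conventions and rejects any run where more than one GPU is visible (`n_gpu > 1`) or a distributed `local_rank` is set. Under that assumption, two visible A100s would trip it on their own, and masking the process to one GPU with `CUDA_VISIBLE_DEVICES` should get past it. Both `visible_gpu_count` and `check_supported` are hypothetical helpers written for illustration, not functions from this repository, and the mask parsing is a simplified model of CUDA's behavior.

```python
import os
from typing import Mapping, Optional


def visible_gpu_count(physical_gpus: int, env: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical helper: a simplified model of how CUDA_VISIBLE_DEVICES
    limits the GPUs a process can see. Handles integer indices only (no UUIDs)."""
    env = os.environ if env is None else env
    mask = env.get("CUDA_VISIBLE_DEVICES")
    if mask is None:
        # No mask set: every physical GPU is visible
        return physical_gpus
    count = 0
    for token in mask.split(","):
        token = token.strip()
        # Enumeration stops at the first entry that is not a valid device index
        if not token.isdigit() or int(token) >= physical_gpus:
            break
        count += 1
    return count


def check_supported(n_gpu: int, local_rank: int = -1) -> None:
    """Hypothetical guard, ASSUMED to mirror the trainer's check:
    only single-process, single-GPU runs are allowed."""
    if local_rank != -1 or n_gpu > 1:
        raise ValueError("Multi-gpu and distributed training is currently not supported.")


if __name__ == "__main__":
    # Two A100s and no mask: the assumed guard rejects the run
    try:
        check_supported(visible_gpu_count(2, env={}))
    except ValueError as err:
        print("rejected:", err)
    # Masking the process to GPU 0 leaves one visible device, which passes
    check_supported(visible_gpu_count(2, env={"CUDA_VISIBLE_DEVICES": "0"}))
    print("accepted with CUDA_VISIBLE_DEVICES=0")
```

If the error still appears with only one GPU visible, as in the comment below, the real guard is presumably tripping on a different condition (for example a non-default `local_rank`), which this sketch cannot distinguish.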

Python 3.9.18, torch==1.11.0+cu113, transformers==4.20.1, deepspeed==0.8.3

Full list of packages:

ShayanShamsi commented 2 months ago

Hi. Can you please tell me how you resolved this? I'm running the command below but still get the same error.

CUDA_VISIBLE_DEVICES=0 python -m text_classification.run_wrapper --output_dir ToDeleteNLU --task_name sst-2 --model_name_or_path distilbert-base-uncased