awslabs / fast-differential-privacy

Fast, memory-efficient, scalable optimization of deep learning with differential privacy
Apache License 2.0
99 stars, 19 forks

Got Error: Multi-gpu and distributed training is currently not supported #32

Closed giandos200 closed 4 months ago

giandos200 commented 5 months ago

Hi,

I'm trying to reproduce the first text classification example, but I keep hitting the same multi-GPU error, even though I'm using two A100 80 GB GPUs with 10 CPU cores and 50 GB of RAM. Could you please help me resolve this?

{'0':'terrible','1':'great'}
*cls**sent_0*_It_was*mask*.*sep+*
Traceback (most recent call last):
  File "/user/.pyenv/versions/3.9.18/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/user/.pyenv/versions/3.9.18/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/user/fast-differential-privacy/examples/text_classification/run_classification.py", line 935, in <module>
    main()
  File "/user/fast-differential-privacy/examples/text_classification/run_classification.py", line 792, in main
    trainer.train(model_path=None)
  File "/user/fast-differential-privacy/examples/text_classification/src/trainer.py", line 253, in train
    raise ValueError("Multi-gpu and distributed training is currently not supported.")
ValueError: Multi-gpu and distributed training is currently not supported.

Both GPUs are visible to torch:

>>> print(torch.cuda.is_available())
True
>>> torch.cuda.device_count()
2
>>> torch.cuda.current_device()
0
>>> torch.cuda.get_device_name(0)
'NVIDIA A100-SXM4-80GB'
>>> torch.cuda.get_device_name(1)
'NVIDIA A100-SXM4-80GB'

Python 3.9.18, torch==1.11.0+cu113, transformers==4.20.1, deepspeed==0.8.3

Full list of installed packages below:

Package                             Version
----------------------------------- --------------
aiohttp                             3.9.5
aiosignal                           1.3.1
anyio                               4.3.0
argcomplete                         1.12.1
argon2-cffi                         23.1.0
argon2-cffi-bindings                21.2.0
arrow                               1.3.0
asttokens                           2.4.1
async-lru                           2.0.4
async-timeout                       4.0.3
attrs                               23.2.0
avro-python3                        1.9.2.1
azure-core                          1.22.1
azure-storage-blob                  12.4.0
Babel                               2.15.0
beautifulsoup4                      4.12.3
bleach                              6.1.0
bottle                              0.12.20
certifi                             2023.7.22
cffi                                1.16.0
chardet                             3.0.4
charset-normalizer                  2.0.4
click                               8.0.1
comm                                0.2.2
contourpy                           1.2.1
crcmod                              1.7
cryptography                        42.0.7
cycler                              0.10.0
datasets                            2.19.1
debugpy                             1.8.1
decorator                           5.1.1
deepspeed                           0.8.3
defusedxml                          0.7.1
diffimg                             0.2.3
dill                                0.3.8
distro                              1.9.0
docker-pycreds                      0.4.0
docopt                              0.6.2
exceptiongroup                      1.2.1
executing                           2.0.1
fairscale                           0.4.0
fastavro                            1.4.1
fastDP                              2.0.0          ..
fastjsonschema                      2.19.1
filelock                            3.0.12
fire                                0.6.0
fonttools                           4.51.0
fqdn                                1.5.1
frozenlist                          1.4.1
fsspec                              2024.3.1
fusepy                              2.0.4
future                              0.18.3
gdown                               5.2.0
gitdb                               4.0.11
GitPython                           3.1.43
GPUtil                              1.4.0
gpytorch                            1.11
greenlet                            3.0.3
h11                                 0.14.0
hjson                               3.1.0
httpcore                            1.0.5
httplib2                            0.19.0
httpx                               0.27.0
huggingface-hub                     0.23.0
idna                                3.2
imageio                             2.9.0
importlib_metadata                  7.1.0
importlib_resources                 6.4.0
indexed-gzip-fileobj-fork-epicfaace 1.5.4
iniconfig                           2.0.0
ipykernel                           6.29.4
ipython                             8.18.1
ipywidgets                          8.1.2
isodate                             0.6.0
isoduration                         20.11.0
jaxtyping                           0.2.28
jedi                                0.19.1
Jinja2                              3.1.4
joblib                              1.2.0
json5                               0.9.25
jsonpointer                         2.4
jsonschema                          4.22.0
jsonschema-specifications           2023.12.1
jupyter                             1.0.0
jupyter_client                      8.6.1
jupyter-console                     6.6.3
jupyter_core                        5.7.2
jupyter-events                      0.10.0
jupyter-lsp                         2.2.5
jupyter_server                      2.14.0
jupyter_server_terminals            0.5.3
jupyterlab                          4.2.0
jupyterlab_pygments                 0.3.0
jupyterlab_server                   2.27.1
jupyterlab_widgets                  3.0.10
kiwisolver                          1.3.1
lazy_loader                         0.3
linear-operator                     0.5.2
markdown2                           2.4.0
MarkupSafe                          2.1.5
marshmallow                         2.15.1
marshmallow-jsonapi                 0.15.1
matplotlib                          3.4.3
matplotlib-inline                   0.1.7
mistune                             3.0.2
ml-swissknife                       0.1.30
mock                                2.0.0
mpmath                              1.3.0
msrest                              0.6.21
multidict                           6.0.5
multiprocess                        0.70.16
nbclient                            0.10.0
nbconvert                           7.16.4
nbformat                            5.10.4
nest-asyncio                        1.6.0
networkx                            2.6.2
ninja                               1.11.1.1
nltk                                3.6.6
notebook                            7.2.0
notebook_shim                       0.2.4
numpy                               1.26.4
nvidia-cublas-cu12                  12.1.3.1
nvidia-cuda-cupti-cu12              12.1.105
nvidia-cuda-nvrtc-cu12              12.1.105
nvidia-cuda-runtime-cu12            12.1.105
nvidia-cudnn-cu12                   8.9.2.26
nvidia-cufft-cu12                   11.0.2.54
nvidia-curand-cu12                  10.3.2.106
nvidia-cusolver-cu12                11.4.5.107
nvidia-cusparse-cu12                12.1.0.106
nvidia-nccl-cu12                    2.20.5
nvidia-nvjitlink-cu12               12.4.127
nvidia-nvtx-cu12                    12.1.105
oauth2client                        4.1.3
oauthlib                            3.2.2
opacus                              1.0.0
openai                              1.30.1
opt-einsum                          3.3.0
overrides                           7.7.0
packaging                           21.0
pandas                              2.0.0
pandocfilters                       1.5.1
parso                               0.8.4
pathtools                           0.1.2
pbr                                 5.6.0
pexpect                             4.9.0
pillow                              10.2.0
pip                                 24.0
platformdirs                        4.2.2
pluggy                              1.5.0
prometheus_client                   0.20.0
prompt-toolkit                      3.0.43
protobuf                            4.25.3
prv-accountant                      0.2.0
psutil                              5.7.2
ptyprocess                          0.7.0
pure-eval                           0.2.2
py                                  1.11.0
py-cpuinfo                          9.0.0
pyarrow                             16.1.0
pyarrow-hotfix                      0.6
pyasn1                              0.4.8
pyasn1-modules                      0.2.8
pycparser                           2.20
pydantic                            1.10.0
pydot                               1.4.2
Pygments                            2.18.0
pymongo                             3.11.4
pyparsing                           2.4.7
PySocks                             1.7.1
pytest                              8.2.0
python-dateutil                     2.8.2
python-json-logger                  2.0.7
pytz                                2021.1
PyWavelets                          1.1.1
PyYAML                              5.4.1
pyzmq                               26.0.3
qtconsole                           5.5.2
QtPy                                2.4.1
referencing                         0.35.1
regex                               2021.8.3
requests                            2.31.0
requests-oauthlib                   2.0.0
retry                               0.9.2
rfc3339-validator                   0.1.4
rfc3986-validator                   0.1.1
rpds-py                             0.18.1
rsa                                 4.9
sacremoses                          0.0.45
safetensors                         0.4.3
scikit-image                        0.18.2
scikit-learn                        1.0.1
scipy                               1.13.0
seaborn                             0.11.2
selenium                            3.141.0
Send2Trash                          1.8.3
sentence-transformers               2.2.2
sentencepiece                       0.1.96
sentry-sdk                          1.14.0
setproctitle                        1.3.3
setuptools                          58.1.0
six                                 1.15.0
smmap                               5.0.1
sniffio                             1.3.1
soupsieve                           2.5
SQLAlchemy                          1.3.19
stack-data                          0.6.3
sympy                               1.12
termcolor                           1.1.0
terminado                           0.18.1
threadpoolctl                       2.2.0
tifffile                            2021.8.8
tinycss2                            1.3.0
tokenizers                          0.12.1
tomli                               2.0.1
torch                               1.11.0+cu113
torchaudio                          0.11.0+cu113
torchvision                         0.12.0+cu113
tornado                             6.4
tqdm                                4.66.4
traitlets                           5.14.3
transformers                        4.20.1
triton                              2.3.0
typeguard                           2.13.3
types-python-dateutil               2.9.0.20240316
typing_extensions                   4.4.0
tzdata                              2024.1
uri-template                        1.3.0
urllib3                             1.26.18
wandb                               0.17.0
watchdog                            0.10.3
wcwidth                             0.2.13
webcolors                           1.13
webencodings                        0.5.1
websocket-client                    1.0.1
widgetsnbextension                  4.0.10
xxhash                              3.4.1
yarl                                1.9.4
zipp                                3.18.2

ShayanShamsi commented 4 months ago

Hi. Could you please tell me how you resolved this? I am running the command below but still get the same error.

CUDA_VISIBLE_DEVICES=0 python -m text_classification.run_wrapper --output_dir ToDeleteNLU --task_name sst-2 --model_name_or_path distilbert-base-uncased
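
The command above relies on `CUDA_VISIBLE_DEVICES` to expose a single GPU. A minimal sketch for checking what the Python process actually inherits is shown below. The helper name `visible_gpu_ids` is hypothetical, and the sketch only parses the environment variable; it does not query the CUDA driver. It also assumes, without confirmation from this thread, that the check in `trainer.py` depends on how many GPUs are visible.

```python
import os


def visible_gpu_ids(env=None):
    """Return the entries of CUDA_VISIBLE_DEVICES as a list, or None if it is unset.

    Hypothetical helper for illustration; it only parses the variable and
    does not query the CUDA driver.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None  # unset: CUDA applies no restriction
    # Split the comma-separated list and drop empty entries.
    return [part.strip() for part in raw.split(",") if part.strip()]


if __name__ == "__main__":
    ids = visible_gpu_ids()
    if ids is None:
        print("CUDA_VISIBLE_DEVICES is unset; all GPUs are visible to this process")
    else:
        print(f"CUDA_VISIBLE_DEVICES lists {len(ids)} device(s): {ids}")
```

If this reports a single device and the error still appears, the condition that raises it may depend on something other than the visible GPU count, such as a distributed-launch setting. The traceback in this thread does not show that condition.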