Abdullah955 opened this issue 2 years ago
Sorry, I don't know much about this. If you haven't tried it yet, would you mind changing distributed_training: ddp_backend: legacy_ddp to something else?
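For example, as a Hydra command-line override (a sketch; c10d is fairseq's name for standard PyTorch DDP, and which backends are available depends on your fairseq version):

fairseq-hydra-train task.data=/path/to/data common.tensorboard_logdir=tslog/ distributed_training.ddp_backend=c10d --config-dir examples/wav2vec/config/pretraining --config-name wav2vec2_base_librispeech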
If it happens to help, please also share the finding here.
Sincere regards.
I have 3x A100 40GB GPUs and I'm trying to pretrain the wav2vec2 base model. The GPU memory is fully consumed, but utilization is really low.
I've tried increasing max_tokens until I hit an out-of-memory error, with no luck, and changing num_workers and normalize has no effect.
The problem is that GPU utilization with 1 GPU is much higher than with 3.
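A few quick sanity checks that may help narrow this down (a sketch, assuming the stall is in inter-GPU communication or data loading rather than compute):

watch -n 1 nvidia-smi    # confirm all three GPUs sit at low utilization, not just one straggler
nvidia-smi topo -m       # check how the A100s are connected (NVLink vs. PCIe)
python -c "import torch; print(torch.cuda.nccl.version())"    # NCCL version PyTorch was built with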
Config file: examples/wav2vec/config/pretraining/wav2vec2_base_librispeech.yaml
Command used:
fairseq-hydra-train task.data=/path/to/data common.tensorboard_logdir=tslog/ --config-dir examples/wav2vec/config/pretraining --config-name wav2vec2_base_librispeech
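For reference, the same command with the multi-GPU settings made explicit. The override names come from fairseq's config groups, and the update_freq value follows the wav2vec examples README (simulate the config's 64-GPU setup on k GPUs by setting update_freq to 64/k, so roughly 21 here); treat the exact numbers as assumptions to experiment with, not recommendations:

fairseq-hydra-train task.data=/path/to/data common.tensorboard_logdir=tslog/ distributed_training.distributed_world_size=3 +optimization.update_freq='[21]' --config-dir examples/wav2vec/config/pretraining --config-name wav2vec2_base_librispeech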
The strange thing is that using the same configuration with data2vec results in 100% GPU utilization.
What's your environment?
fairseq Version (e.g., 1.0 or main):
Hydra version: 1.0.7
PyTorch Version (e.g., 1.0): 1.11.0+cu113
OS (e.g., Linux): Linux
How you installed fairseq (pip, source): source
Python version: 3.8.12
CUDA version (nvcc): release 11.3, V11.3.109