jlest01 opened this issue 1 month ago
I also tried Torch version 2.6.0.dev20240914+cu124, but got the same error.
With the latest stable version of Torch, that error is gone, but the log has been stuck on the lines below for hours:
[2024-09-14 19:21:39] [INFO] epoch 1/16
[2024-09-14 19:21:39] [INFO] huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
[2024-09-14 19:21:39] [INFO] To disable this warning, you can either:
[2024-09-14 19:21:39] [INFO] - Avoid using `tokenizers` before the fork if possible
[2024-09-14 19:21:39] [INFO] - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[2024-09-14 19:21:39] [INFO] huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
[2024-09-14 19:21:39] [INFO] To disable this warning, you can either:
[2024-09-14 19:21:39] [INFO] - Avoid using `tokenizers` before the fork if possible
[2024-09-14 19:21:39] [INFO] - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[2024-09-14 19:21:39] [INFO] INFO epoch is incremented. current_epoch: 0, epoch: 1 train_util.py:672
[2024-09-14 19:21:39] [INFO] INFO epoch is incremented. current_epoch: 0, epoch: 1 train_util.py:672
[2024-09-14 19:21:46] [INFO] /home/user/fluxgym/env/lib/python3.12/site-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
[2024-09-14 19:21:46] [INFO] with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined]
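For what it's worth, the tokenizers warning itself is benign (parallelism is just disabled after the fork); as the log suggests, it can be silenced by setting the environment variable before launching the training script, for example:

export TOKENIZERS_PARALLELISM=false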
I also had this problem, but solved it. I am using CUDA 12.1 and installed all the requirements as listed, but for PyTorch I only ran this:
pip install --pre torch torchvision torchaudio
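One caveat: without an explicit index URL this pulls pre-release wheels from the default PyPI index. If you specifically want the CUDA 12.1 nightly builds, they are normally installed with the nightly index URL, along the lines of:

pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121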
@adeerkhan would you like to share how you fixed it? Or is your second comment the solution you found?
In an admittedly very different use case (I am training DreamBooth models with the diffusers example scripts), I was able to resolve this by updating to CUDA 12.6 and the "devel" build of cuDNN, as described in this Stack Overflow post.
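In case it helps others debug, a quick way to confirm which CUDA and cuDNN versions the installed PyTorch actually sees (independent of either training script):

python -c "import torch; print(torch.version.cuda, torch.backends.cudnn.version(), torch.cuda.is_available())"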
I am getting the error below right after starting the training.
I have an NVIDIA 4070 Ti 12 GB GPU and followed all the manual installation steps. The script is running with the venv environment activated, and both the required dependencies and the nightly PyTorch build are installed.
Other settings:
Torch version: torch==2.6.0.dev20240914+cu121
OS: Ubuntu 24.04