Incompatible CUDA versions in torch and torchvision

thekeeno commented 1 year ago

Describe the bug

Running the notebook with its default settings installs builds of torch and torchvision that were compiled against different CUDA versions, so training cannot start at all.

Reproduction

Run cells of DreamBooth_Stable_Diffusion.ipynb from top down.

Logs

WARNING:accelerate.commands.launch:The following values were not passed to `accelerate launch` and had defaults used instead:
    `--num_processes` was set to a value of `1`
    `--num_machines` was set to a value of `1`
    `--mixed_precision` was set to a value of `'no'`
    `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Traceback (most recent call last):
  File "train_dreambooth.py", line 24, in <module>
    from torchvision import transforms
  File "/usr/local/lib/python3.8/dist-packages/torchvision/__init__.py", line 5, in <module>
    from torchvision import datasets, io, models, ops, transforms, utils
  File "/usr/local/lib/python3.8/dist-packages/torchvision/datasets/__init__.py", line 1, in <module>
    from ._optical_flow import FlyingChairs, FlyingThings3D, HD1K, KittiFlow, Sintel
  File "/usr/local/lib/python3.8/dist-packages/torchvision/datasets/_optical_flow.py", line 11, in <module>
    from ..io.image import _read_png_16
  File "/usr/local/lib/python3.8/dist-packages/torchvision/io/__init__.py", line 8, in <module>
    from ._load_gpu_decoder import _HAS_GPU_VIDEO_DECODER
  File "/usr/local/lib/python3.8/dist-packages/torchvision/io/_load_gpu_decoder.py", line 1, in <module>
    from ..extension import _load_library
  File "/usr/local/lib/python3.8/dist-packages/torchvision/extension.py", line 107, in <module>
    _check_cuda_version()
  File "/usr/local/lib/python3.8/dist-packages/torchvision/extension.py", line 80, in _check_cuda_version
    raise RuntimeError(
RuntimeError: Detected that PyTorch and torchvision were compiled with different CUDA versions. PyTorch has CUDA Version=11.7 and torchvision has CUDA Version=11.6. Please reinstall the torchvision that matches your PyTorch install.
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_dreambooth.py', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--pretrained_vae_name_or_path=stabilityai/sd-vae-ft-mse', '--output_dir=/content/drive/MyDrive/stable_diffusion_weights/zwx', '--revision=fp16', '--with_prior_preservation', '--prior_loss_weight=1.0', '--seed=1337', '--resolution=512', '--train_batch_size=1', '--train_text_encoder', '--mixed_precision=fp16', '--use_8bit_adam', '--gradient_accumulation_steps=1', '--learning_rate=1e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=25', '--sample_batch_size=12', '--max_train_steps=800', '--save_interval=10000', '--save_sample_prompt=photo of zwx woman', '--concepts_list=concepts_list.json']' returned non-zero exit status 1.

System Info

diffusers version: 0.9.0
Platform: Linux-5.10.147+-x86_64-with-glibc2.27
Python version: 3.8.16
PyTorch version (GPU?): 1.13.0+cu117 (True)
Huggingface_hub version: 0.11.1
Transformers version: 4.25.1
Using GPU in script?: yes
Using distributed or parallel set-up in script?: yes

CRCODE22 commented 1 year ago

I have the same problem on Google Colab. It worked fine before, but not anymore.

It goes wrong in this section:

Install xformers from precompiled wheel.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.14.1+cu116 requires torch==1.13.1, but you have torch 1.13.0 which is incompatible.
torchtext 0.14.1 requires torch==1.13.1, but you have torch 1.13.0 which is incompatible.
torchaudio 0.13.1+cu116 requires torch==1.13.1, but you have torch 1.13.0 which is incompatible.
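
The conflict pip reports here can be checked by hand before launching training. This is only a minimal sketch, assuming version strings of the `X.Y.Z+cuNNN` form seen in this thread; `base_version` and `pin_satisfied` are hypothetical helper names, not part of pip or torch:

```python
def base_version(version: str) -> str:
    """Drop the local build suffix, e.g. '1.13.0+cu117' -> '1.13.0'."""
    return version.split("+", 1)[0]


def pin_satisfied(installed: str, pinned: str) -> bool:
    """Return True if the installed version matches an exact '==' pin."""
    return base_version(installed) == base_version(pinned)


# Values reported in this thread: torch 1.13.0+cu117 is installed,
# while torchvision 0.14.1+cu116 pins torch==1.13.1.
installed_torch = "1.13.0+cu117"
required_by_torchvision = "1.13.1"
print(pin_satisfied(installed_torch, required_by_torchvision))  # False
```

It only handles exact `==` pins like the ones in the error message, not version ranges.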

digiderk commented 1 year ago

I'm experiencing the same issue as well.

rmac85 commented 1 year ago

Probably the 3rd or 4th time this has happened. The Transformers wheel updates with little regard for current common settings and messes up everything in its path. I guess it is useful, but poorly implemented. Usually you can just compile the wheel manually, which takes about 40 minutes to an hour, but I have been trying to compile on an A100 for over 60 minutes now. It has cost me at least $2-3 in compute units; I just hope it completes.

ShivamShrirao commented 1 year ago

Fixed in 8b1472ffd0c8e0144f9db797e545eb908a1831b9. This was caused by a PyTorch version update in Colab and the xformers package reinstalling an older version. No need to compile it.
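
For anyone who wants to catch this kind of mismatch before starting a long run, here is a rough sketch that compares the CUDA build tags of two version strings. It assumes the `+cuNNN` suffix convention seen in the logs above; `cuda_tag` and `same_cuda_build` are hypothetical helper names, not part of torch or torchvision:

```python
import re
from typing import Optional


def cuda_tag(version: str) -> Optional[str]:
    """Extract the CUDA build tag, e.g. '1.13.0+cu117' -> 'cu117'."""
    match = re.search(r"\+(cu\d+)", version)
    return match.group(1) if match else None


def same_cuda_build(torch_version: str, torchvision_version: str) -> bool:
    """True only when both version strings carry the same CUDA tag."""
    torch_tag = cuda_tag(torch_version)
    vision_tag = cuda_tag(torchvision_version)
    return torch_tag is not None and torch_tag == vision_tag


# Version strings reported in this thread.
print(same_cuda_build("1.13.0+cu117", "0.14.1+cu116"))  # False
```

In a notebook the strings would normally come from the installed packages. Since importing torchvision is exactly what fails in the broken environment, reading them with `importlib.metadata.version("torch")` and `importlib.metadata.version("torchvision")` is probably the safer route, assuming the installed wheels report the `+cuNNN` suffix in their metadata.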
