ShivamShrirao / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
https://huggingface.co/docs/diffusers
Apache License 2.0

Dreambooth in colab no longer working correctly (CUDA_SETUP error) #180

Closed tchesket closed 1 year ago

tchesket commented 1 year ago

Describe the bug

I used Dreambooth in Colab successfully just ~24 hours ago. Something has changed since then: training still runs to completion, but the end results are terrible, even with the exact same training settings. I'm also getting an error when running the main accelerate block, something relating to bitsandbytes CUDA_SETUP plus a bunch of references to PosixPath (not sure what that means). I'm not positive, but xformers doesn't seem to be installing correctly either: that cell used to take a little while to run, but now it finishes almost instantaneously (after downloading). Not sure if that's related.

Python error provided below.

Reproduction

Run the colab cells with any valid settings and this happens now.

Logs

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/paths.py:105: UserWarning: /usr/lib64-nvidia did not contain libcudart.so as expected! Searching further paths...
  warn(
/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('--listen_host=172.28.0.12 --target_host=172.28.0.12 --tunnel_background_save_url=https'), PosixPath('//colab.research.google.com/tun/m/cc48301118ce562b961b3c22d803539adc1e0c19/gpu-t4-s-1bdr3emxpwnxd --tunnel_background_save_delay=10s --tunnel_periodic_background_save_frequency=30m0s --enable_output_coalescing=true --output_coalescing_required=true')}
  warn(
/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('6000,"kernelManagerProxyHost"'), PosixPath('true}'), PosixPath('"172.28.0.12","jupyterArgs"'), PosixPath('"/usr/local/bin/dap_multiplexer","enableLsp"'), PosixPath('{"kernelManagerProxyPort"'), PosixPath('["--ip=172.28.0.12","--transport=ipc"],"debugAdapterMultiplexerPath"')}
  warn(
/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('module'), PosixPath('//ipykernel.pylab.backend_inline')}
  warn(
/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/env/python')}
  warn(
/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/sys/fs/cgroup/memory.events /var/colab/cgroup/jupyter-children/memory.events')}
  warn(
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 112
CUDA SETUP: Loading binary /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cuda112.so...
/usr/local/lib/python3.8/dist-packages/diffusers/utils/deprecation_utils.py:35: FutureWarning: It is deprecated to pass a pretrained model name or path to `from_config`.If you were trying to load a scheduler, please use <class 'diffusers.schedulers.scheduling_ddpm.DDPMScheduler'>.from_pretrained(...) instead. Otherwise, please make sure to pass a configuration dictionary instead. This functionality will be removed in v1.0.0.
  warnings.warn(warning + message, FutureWarning)
Steps:   0% 0/11400 [00:00<?, ?it/s]
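
A note on the PosixPath warnings in the log above: the reported "directories" look like pieces of environment-variable values that were cut at each ":" character (for example `...save_url=https` followed by `//colab.research.google.com/...`). The sketch below reproduces that effect under the assumption that the values are split on the OS path separator and each piece is treated as a path. It is an illustration only, not the bitsandbytes source; the function name and the example value are hypothetical.

```python
import os
from pathlib import Path


def nonexistent_path_candidates(value: str, sep: str = os.pathsep) -> set:
    """Split a string on the path separator and return the pieces that
    do not exist on the filesystem, as Path objects."""
    pieces = [piece for piece in value.split(sep) if piece]
    return {Path(piece) for piece in pieces if not Path(piece).exists()}


# Hypothetical value shaped like the Colab launch arguments seen in the log.
# The "https://" URL contains ":", so it is cut into two bogus "paths".
example = "--tunnel_background_save_url=https://colab.research.google.com/tun/m/example"
for candidate in sorted(nonexistent_path_candidates(example, sep=":")):
    print(repr(candidate))
```

If that assumption holds, these warnings say only that some environment variables contain text that is not a directory list, which fits the later lines of the log where the CUDA runtime is found and the binary is loaded.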

System Info

OMGhozlan commented 1 year ago

I'm not sure if it's related, but I think some changes have been made to the original diffusers repo. One of them involves the keep_fp32_wrapper argument, which has been removed in the newer version. After training, it failed to save because the argument is no longer used.

tchesket commented 1 year ago

Not sure. I did not get the keep_fp32_wrapper error, as far as I can remember. It did not fail to save the checkpoint for me; the results are just garbage. The output looks distorted and severely overtrained, even with settings that were working perfectly fine days ago.

mroxso commented 1 year ago

But I'm getting the keep_fp32_wrapper error. How do I solve this?

ShivamShrirao commented 1 year ago

For the fp32 wrapper error, you need to update accelerate.
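
Before and after upgrading, it can help to confirm which versions are actually installed in the running environment. A minimal sketch using only the standard library follows; the helper name `installed_version` is made up for this example, and the thread does not state which accelerate version is required, so the code only reports what it finds.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Packages mentioned in this thread.
for name in ("accelerate", "xformers", "bitsandbytes"):
    print(name, installed_version(name) or "not installed")
```

In Colab, this would be run in a cell after the install cells; a "not installed" result for xformers would support the suspicion that its install cell silently did nothing.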

tchesket commented 1 year ago

To be clear, I haven't encountered the fp32 wrapper error. This is something different.

tchesket commented 1 year ago

Turns out most of the PNG files I was using as instance images were corrupted somehow. No idea what caused it or how, but it seems to be working now, despite the CUDA_SETUP warning. Closing.
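
For anyone hitting similar symptoms, a quick structural check of the instance images may be worth running before training. The sketch below uses only the standard library to verify the PNG signature and the CRC of every chunk. The thread does not say what kind of corruption occurred, so this is one possible check under that uncertainty: it catches truncated or bit-damaged files, but not images that are structurally valid with wrong pixel content. The function names are hypothetical.

```python
import struct
import zlib
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_is_intact(data: bytes) -> bool:
    """Return True if `data` has a PNG signature, every chunk's CRC matches,
    and the stream ends with an IEND chunk."""
    if not data.startswith(PNG_SIGNATURE):
        return False
    pos = len(PNG_SIGNATURE)
    # Each chunk: 4-byte length, 4-byte type, `length` data bytes, 4-byte CRC.
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        body_end = pos + 8 + length
        if body_end + 4 > len(data):
            return False  # chunk runs past the end of the file (truncated)
        (stored_crc,) = struct.unpack(">I", data[body_end:body_end + 4])
        # The CRC covers the chunk type and the chunk data, not the length.
        if zlib.crc32(data[pos + 4:body_end]) != stored_crc:
            return False
        if chunk_type == b"IEND":
            return True
        pos = body_end + 4
    return False  # ran out of data before reaching IEND


def find_corrupt_pngs(folder: str) -> list:
    """List the .png files in `folder` that fail the structural check."""
    return [p for p in sorted(Path(folder).glob("*.png"))
            if not png_is_intact(p.read_bytes())]
```

Calling `find_corrupt_pngs` on the instance image folder returns the files to re-export or replace before starting a new run.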

ZeroCool22 commented 1 year ago

ShivamShrirao wrote: "For fp32 wrapper error you need to update accelerate."

Where should I run this command: pip install --upgrade accelerate

In ~/github/diffusers/examples/dreambooth$ or ~/github/diffusers?
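
On the question above: pip installs into whichever Python environment is active, so for a command that names a package from the index (rather than a local path) the working directory should not matter. What does matter is that pip belongs to the same interpreter that runs training. The sketch below builds the upgrade command from the running interpreter to make that explicit; the helper name is hypothetical, and the code prints the command instead of executing it.

```python
import sys


def pip_upgrade_command(package: str) -> list:
    """Build a pip upgrade command bound to the currently running interpreter."""
    return [sys.executable, "-m", "pip", "install", "--upgrade", package]


print("Interpreter:", sys.executable)
print("Command:", " ".join(pip_upgrade_command("accelerate")))
```

Running the printed command from either directory targets the same environment, provided the same virtual environment (if any) is activated in that shell.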