devilismyfriend / StableTuner

Finetuning SD in style.
GNU Affero General Public License v3.0

8-bit Adam not working on Windows without WSL #51

Closed: dill-shower closed this issue 1 year ago

dill-shower commented 1 year ago

I was able to install all required packages and get StableTuner up and running. I ran the installation script again and it completed successfully. But when I try to train in StableTuner with 8-bit Adam enabled, it crashes with the following error.

The cudatoolkit package is installed in the conda env. Is WSL required to run StableTuner with 8-bit Adam? I found a similar issue in another repository: https://github.com/d8ahazard/sd_dreambooth_extension/issues/3

IMPORTANT: when 8-bit Adam is disabled, training starts successfully, but it runs out of VRAM (OOM) after the first few steps (I have 15 GB of VRAM).
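A rough back-of-the-envelope estimate suggests why turning off 8-bit Adam can push this setup into OOM. The sketch below is not taken from StableTuner; it assumes the model has SD v1-sized networks (about 860M parameters for the UNet and about 123M for the CLIP text encoder, both approximate), that weights and gradients stay in fp32 (the launch command in this issue passes `--mixed_precision=no`), and that both networks are trained (`--train_text_encoder`). It counts only memory that scales with the trainable parameters; activations for `--train_batch_size=24`, the frozen VAE, and CUDA overhead are ignored.

```python
# Rough estimate of the GPU memory that scales with trainable parameters.
# Assumptions (not taken from StableTuner's code): SD v1-sized UNet and CLIP
# text encoder, fp32 weights and gradients, and no activations, VAE, or CUDA
# context overhead counted.

UNET_PARAMS = 860_000_000          # approx. SD v1 UNet (assumed)
TEXT_ENCODER_PARAMS = 123_000_000  # approx. CLIP ViT-L/14 text encoder (assumed)


def bytes_per_param(optimizer: str) -> int:
    """Bytes per trainable parameter: weight + gradient + two Adam moments."""
    weight = 4  # fp32 weight, since the launch command uses --mixed_precision=no
    grad = 4    # fp32 gradient
    if optimizer == "adamw":
        state = 2 * 4  # exp_avg and exp_avg_sq kept in fp32
    elif optimizer == "adamw_8bit":
        state = 2 * 1  # both moments quantized to 8 bits (block statistics ignored)
    else:
        raise ValueError(f"unknown optimizer: {optimizer!r}")
    return weight + grad + state


def static_memory_gib(n_params: int, optimizer: str) -> float:
    """Parameter-proportional memory in GiB for the given optimizer."""
    return n_params * bytes_per_param(optimizer) / 2**30


if __name__ == "__main__":
    # --train_text_encoder means both the UNet and the text encoder are trained.
    n_trainable = UNET_PARAMS + TEXT_ENCODER_PARAMS
    for opt in ("adamw", "adamw_8bit"):
        print(f"{opt}: ~{static_memory_gib(n_trainable, opt):.1f} GiB before activations")
```

Under these assumptions, standard AdamW needs roughly 14.6 GiB before a single activation is stored, which leaves essentially no headroom on a ~15 GB card, while 8-bit Adam brings the same total to roughly 9.2 GiB. Actual usage will differ.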


accelerate "launch" "--mixed_precision=no" "scripts/trainer.py" "--model_variant=base" "--disable_cudnn_benchmark" "--sample_step_interval=500" "--pretrained_model_name_or_path=C:/StableTuner/models/wd-1-3-penultimate-ucg-cont" "--pretrained_vae_name_or_path=" "--output_dir=models/new_model" "--seed=3434554" "--resolution=512" "--train_batch_size=24" "--num_train_epochs=100" "--use_bucketing" "--aspect_mode=dynamic" "--aspect_mode_action_preference=add" "--use_8bit_adam" "--gradient_checkpointing" "--gradient_accumulation_steps=1" "--learning_rate=3e-6" "--lr_warmup_steps=0" "--lr_scheduler=constant" "--train_text_encoder" "--concepts_list=stabletune_concept_list.json" "--num_class_images=200" "--save_every_n_epoch=5" "--n_save_sample=1" "--sample_height=512" "--sample_width=512" "--dataset_repeats=1" "--sample_on_training_start" "--clip_penultimate"
The following values were not passed to accelerate launch and had defaults used instead:
  --num_processes was set to a value of 1
  --num_machines was set to a value of 1
  --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
Booting Up StableTuner
Please wait a moment as we load up some stuff...
C:\ProgramData\Anaconda3\lib\site-packages\accelerate\accelerator.py:321: UserWarning: log_with=tensorboard was passed but no supported trackers are currently installed.
  warnings.warn(f"log_with={log_with} was passed but no supported trackers are currently installed.")
C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py:101: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('C')}
  warn(msg)
C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py:101: UserWarning: C:\ProgramData\Anaconda3\envs\ST did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py:101: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
  warn(msg)
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py:101: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
  warn(msg)
C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py:101: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
  warn(msg)
CUDA SETUP: Loading binary C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: Loading binary C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
CUDA SETUP: Problem: The main issue seems to be that the main CUDA library was not detected.
CUDA SETUP: Solution 1): Your paths are probably not up-to-date. You can update them via: sudo ldconfig.
CUDA SETUP: Solution 2): If you do not have sudo rights, you can do the following:
CUDA SETUP: Solution 2a): Find the cuda library via: find / -name libcuda.so 2>/dev/null
CUDA SETUP: Solution 2b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_2a
CUDA SETUP: Solution 2c): For a permanent solution add the export from 2b into your .bashrc file, located at ~/.bashrc
Traceback (most recent call last):
  File "C:\diffusion\StableTuner\scripts\trainer.py", line 2380, in <module>
    main()
  File "C:\diffusion\StableTuner\scripts\trainer.py", line 1530, in main
    import bitsandbytes as bnb
  File "C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
    from .autograd._functions import (
  File "C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\autograd\_functions.py", line 5, in <module>
    import bitsandbytes.functional as F
  File "C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\functional.py", line 13, in <module>
    from .cextension import COMPILED_WITH_CUDA, lib
  File "C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py", line 118, in <module>
    raise RuntimeError('''
RuntimeError: CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs aboveto fix your environment!
If you cannot find any issues and suspect a bug, please open an issue with detals about your environment: https://github.com/TimDettmers/bitsandbytes/issues
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\Scripts\accelerate-script.py", line 9, in <module>
    sys.exit(main())
  File "C:\ProgramData\Anaconda3\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "C:\ProgramData\Anaconda3\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "C:\ProgramData\Anaconda3\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\ProgramData\Anaconda3\python.exe', 'scripts/trainer.py', '--model_variant=base', '--disable_cudnn_benchmark', '--sample_step_interval=500', '--pretrained_model_name_or_path=C:/StableTuner/models/wd-1-3-penultimate-ucg-cont', '--pretrained_vae_name_or_path=', '--output_dir=models/new_model', '--seed=3434554', '--resolution=512', '--train_batch_size=24', '--num_train_epochs=100', '--use_bucketing', '--aspect_mode=dynamic', '--aspect_mode_action_preference=add', '--use_8bit_adam', '--gradient_checkpointing', '--gradient_accumulation_steps=1', '--learning_rate=3e-6', '--lr_warmup_steps=0', '--lr_scheduler=constant', '--train_text_encoder', '--concepts_list=stabletune_concept_list.json', '--num_class_images=200', '--save_every_n_epoch=5', '--n_save_sample=1', '--sample_height=512', '--sample_width=512', '--dataset_repeats=1', '--sample_on_training_start', '--clip_penultimate']' returned non-zero exit status 1.
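One possible reading of the `{WindowsPath('C')}` warning in this log, offered as an assumption rather than a confirmed diagnosis: if a loader splits a path-list environment variable on `:` (the POSIX separator) instead of `;` (the Windows one), a Windows entry such as `C:\ProgramData\...` is cut right after the drive letter, leaving `C` as a bogus "directory". The sketch below reproduces only that string-splitting effect with the standard library; it does not reproduce bitsandbytes' actual code, and the path-list value is made up for illustration.

```python
import ntpath
import posixpath

# Hypothetical Windows path-list value (e.g. what PATH might contain); made up.
WINDOWS_PATH_LIST = r"C:\ProgramData\Anaconda3\envs\ST;C:\Windows\System32"


def split_posix_style(value: str) -> list[str]:
    """Split on ':' (posixpath.pathsep), as a Linux-oriented loader would."""
    return [entry for entry in value.split(posixpath.pathsep) if entry]


def split_windows_style(value: str) -> list[str]:
    """Split on ';' (ntpath.pathsep), the path-list separator on Windows."""
    return [entry for entry in value.split(ntpath.pathsep) if entry]


if __name__ == "__main__":
    print(split_posix_style(WINDOWS_PATH_LIST))    # first entry is just 'C'
    print(split_windows_style(WINDOWS_PATH_LIST))  # two intact directories
```

Splitting POSIX-style turns the drive letter into its own entry, which would then be reported as a non-existent directory.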

I also found a similar issue in the bitsandbytes repository, where the developer said that libcudart.so is not supported on Windows.
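The mismatch comes down to the file name: the loader in this log searches for `libcudart.so`, which is a Linux shared library, whereas on Windows the CUDA runtime ships as a DLL, conventionally named like `cudart64_<version>.dll` (the suffix depends on the toolkit version). Below is a minimal, stdlib-only sketch that looks for the platform-appropriate name in a list of directories. The candidate directories, including `Library\bin` inside the conda env (where conda on Windows typically places DLLs), are assumptions to adjust for your own setup.

```python
import glob
import os
import sys

# CUDA runtime name patterns per platform. The Windows pattern assumes the
# conventional cudart64_<version>.dll naming.
PATTERNS = {
    "linux": "libcudart.so*",
    "win32": "cudart64_*.dll",
}


def find_cudart(directories, platform=sys.platform):
    """Return paths matching the CUDA runtime pattern for `platform`."""
    pattern = PATTERNS.get(platform)
    if pattern is None:
        raise ValueError(f"no CUDA runtime pattern for platform {platform!r}")
    hits = []
    for directory in directories:
        if directory and os.path.isdir(directory):
            # glob.escape guards against '[' or '*' inside directory names.
            hits.extend(sorted(glob.glob(os.path.join(glob.escape(directory), pattern))))
    return hits


if __name__ == "__main__":
    if sys.platform not in PATTERNS:
        print(f"platform {sys.platform!r} is not covered by this sketch")
    else:
        prefix = os.environ.get("CONDA_PREFIX", sys.prefix)
        # Hypothetical candidate locations; adjust for your environment.
        candidates = [os.path.join(prefix, "Library", "bin"), os.path.join(prefix, "lib")]
        candidates += os.environ.get("PATH", "").split(os.pathsep)
        found = find_cudart(candidates)
        print(found or f"no {PATTERNS[sys.platform]} found in the searched directories")
```

On Windows, finding a `cudart64_*.dll` while the loader still reports a missing `libcudart.so` is consistent with what the bitsandbytes developer described: the runtime is present, but under a name a Linux-oriented loader does not look for.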

OS: Windows 10

devilismyfriend commented 1 year ago

Have you installed ST with the installer?

dill-shower commented 1 year ago


Yes.

devilismyfriend commented 1 year ago

Then I'm not sure what you mean; 8-bit Adam works fine with the files from this repo.

dill-shower commented 1 year ago


Without WSL installed?

devilismyfriend commented 1 year ago


I believe so.

dill-shower commented 1 year ago

OK. I will try reinstalling conda, ST, and everything else.
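One detail in the traceback may be worth checking before (or after) reinstalling: `accelerate` launched `C:\ProgramData\Anaconda3\python.exe` and imported bitsandbytes from `C:\ProgramData\Anaconda3\lib\site-packages`, while the CUDA warning refers to the env `C:\ProgramData\Anaconda3\envs\ST`. A small sketch, assuming the intended env is named `ST` as in the log, that reports which interpreter and which bitsandbytes install are actually in use. It only inspects; it does not fix anything, and it locates bitsandbytes with `importlib.util.find_spec` so the failing CUDA setup is not triggered.

```python
import importlib.util
import os
import sys

EXPECTED_ENV_NAME = "ST"  # env name taken from the log; change it if yours differs


def describe_environment(expected_env: str = EXPECTED_ENV_NAME) -> dict:
    """Report the running interpreter and where bitsandbytes would be imported from."""
    # find_spec locates a top-level package without executing it.
    spec = importlib.util.find_spec("bitsandbytes")
    return {
        "python": sys.executable,
        "prefix": sys.prefix,
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
        "in_expected_env": os.path.basename(os.path.normpath(sys.prefix)) == expected_env,
        "bitsandbytes_location": spec.origin if spec else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key:>22}: {value}")
```

Running it with the same `python` that `accelerate launch` uses shows whether training really happens inside the `ST` env or in the base Anaconda install.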