Closed dill-shower closed 1 year ago
Have you installed ST with the installer?
Yes.
Then I'm not sure what you mean; 8-bit Adam works fine with the files from this repo.
Without WSL installed?
I believe so
OK. I will try to reinstall conda, ST and everything else
I was able to install all the required packages and get StableTuner up and running. I ran the installation script again and it completed successfully. But when I try to train with 8-bit Adam enabled, StableTuner crashes with an error.
Cudatoolkit is installed in the conda env. Is WSL required to run StableTuner with 8-bit Adam? I found a similar issue here: https://github.com/d8ahazard/sd_dreambooth_extension/issues/3
IMPORTANT: with 8-bit Adam disabled, training starts successfully, but it runs out of VRAM (OOM) after the first few steps (I have 15 GB of VRAM).
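For the OOM case, one thing that might be worth trying is trading per-step batch size for gradient accumulation. The sketch below only does the arithmetic: it lists pairs for the --train_batch_size and --gradient_accumulation_steps flags from the launch command in this post whose product stays at 24. It assumes the trainer treats that product as the effective batch size, which is the usual convention but is not confirmed anywhere in this thread, and the helper name equivalent_settings is made up for illustration.

```python
from typing import List, Tuple


def equivalent_settings(effective_batch: int) -> List[Tuple[int, int]]:
    """Return (train_batch_size, gradient_accumulation_steps) pairs whose
    product equals effective_batch, largest per-step batch first.

    Hypothetical helper: it only enumerates divisors and does not talk to
    StableTuner or accelerate in any way.
    """
    pairs = []
    for batch in range(effective_batch, 0, -1):
        if effective_batch % batch == 0:
            pairs.append((batch, effective_batch // batch))
    return pairs


if __name__ == "__main__":
    # 24 is the --train_batch_size value used in the launch command.
    for batch, accum in equivalent_settings(24):
        print(f"--train_batch_size={batch} --gradient_accumulation_steps={accum}")
```

A smaller per-step batch generally lowers activation memory, but whether any of these pairs fits into 15 GB depends on the model and the other options, so this is a starting point rather than a fix.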
accelerate "launch" "--mixed_precision=no" "scripts/trainer.py" "--model_variant=base" "--disable_cudnn_benchmark" "--sample_step_interval=500" "--pretrained_model_name_or_path=C:/StableTuner/models/wd-1-3-penultimate-ucg-cont" "--pretrained_vae_name_or_path=" "--output_dir=models/new_model" "--seed=3434554" "--resolution=512" "--train_batch_size=24" "--num_train_epochs=100" "--use_bucketing" "--aspect_mode=dynamic" "--aspect_mode_action_preference=add" "--use_8bit_adam" "--gradient_checkpointing" "--gradient_accumulation_steps=1" "--learning_rate=3e-6" "--lr_warmup_steps=0" "--lr_scheduler=constant" "--train_text_encoder" "--concepts_list=stabletune_concept_list.json" "--num_class_images=200" "--save_every_n_epoch=5" "--n_save_sample=1" "--sample_height=512" "--sample_width=512" "--dataset_repeats=1" "--sample_on_training_start" "--clip_penultimate"
The following values were not passed to `accelerate launch` and had defaults used instead:
    `--num_processes` was set to a value of `1`
    `--num_machines` was set to a value of `1`
    `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Booting Up StableTuner
Please wait a moment as we load up some stuff...
C:\ProgramData\Anaconda3\lib\site-packages\accelerate\accelerator.py:321: UserWarning: `log_with=tensorboard` was passed but no supported trackers are currently installed.
  warnings.warn(f"`log_with={log_with}` was passed but no supported trackers are currently installed.")
C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py:101: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('C')}
  warn(msg)
C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py:101: UserWarning: C:\ProgramData\Anaconda3\envs\ST did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py:101: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
  warn(msg)
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py:101: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
  warn(msg)
C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py:101: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
  warn(msg)
CUDA SETUP: Loading binary C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: Loading binary C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
CUDA SETUP: Problem: The main issue seems to be that the main CUDA library was not detected.
CUDA SETUP: Solution 1): Your paths are probably not up-to-date. You can update them via: sudo ldconfig.
CUDA SETUP: Solution 2): If you do not have sudo rights, you can do the following:
CUDA SETUP: Solution 2a): Find the cuda library via: find / -name libcuda.so 2>/dev/null
CUDA SETUP: Solution 2b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_2a
CUDA SETUP: Solution 2c): For a permanent solution add the export from 2b into your .bashrc file, located at ~/.bashrc
Traceback (most recent call last):
  File "C:\diffusion\StableTuner\scripts\trainer.py", line 2380, in <module>
    main()
  File "C:\diffusion\StableTuner\scripts\trainer.py", line 1530, in main
    import bitsandbytes as bnb
  File "C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
    from .autograd._functions import (
  File "C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\autograd\_functions.py", line 5, in <module>
    import bitsandbytes.functional as F
  File "C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\functional.py", line 13, in <module>
    from .cextension import COMPILED_WITH_CUDA, lib
  File "C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py", line 118, in <module>
    raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment!
If you cannot find any issues and suspect a bug, please open an issue with detals about your environment:
https://github.com/TimDettmers/bitsandbytes/issues
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\Scripts\accelerate-script.py", line 9, in <module>
    sys.exit(main())
  File "C:\ProgramData\Anaconda3\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "C:\ProgramData\Anaconda3\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "C:\ProgramData\Anaconda3\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\ProgramData\Anaconda3\python.exe', 'scripts/trainer.py', '--model_variant=base', '--disable_cudnn_benchmark', '--sample_step_interval=500', '--pretrained_model_name_or_path=C:/StableTuner/models/wd-1-3-penultimate-ucg-cont', '--pretrained_vae_name_or_path=', '--output_dir=models/new_model', '--seed=3434554', '--resolution=512', '--train_batch_size=24', '--num_train_epochs=100', '--use_bucketing', '--aspect_mode=dynamic', '--aspect_mode_action_preference=add', '--use_8bit_adam', '--gradient_checkpointing', '--gradient_accumulation_steps=1', '--learning_rate=3e-6', '--lr_warmup_steps=0', '--lr_scheduler=constant', '--train_text_encoder', '--concepts_list=stabletune_concept_list.json', '--num_class_images=200', '--save_every_n_epoch=5', '--n_save_sample=1', '--sample_height=512', '--sample_width=512', '--dataset_repeats=1', '--sample_on_training_start', '--clip_penultimate']' returned non-zero exit status 1.
Cudatoolkit is installed in conda env. Is wsl required to run StableTuner? I found a similar issue in the bitsandbytes repository and the developer said that this libcudart.so is not supported on Windows.
OS: Windows 10
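The warnings in the log point at two things: the loader is looking for the Linux library name libcudart.so, and one of the "non-existent" directories is just WindowsPath('C'), which looks like a search path that was split on ":". The sketch below is a rough diagnostic under those assumptions, not bitsandbytes code. The helper names split_search_path and find_cudart are made up, and the Windows pattern cudart64_*.dll is the usual CUDA toolkit naming rather than something stated in this thread.

```python
import fnmatch
import os
from pathlib import Path
from typing import Iterable, List

# File name patterns for the CUDA runtime library. The Linux name is the one
# the log searches for; the Windows pattern is an assumption based on the
# usual CUDA toolkit naming (for example cudart64_110.dll).
CUDART_PATTERNS = {
    "linux": "libcudart.so*",
    "windows": "cudart64_*.dll",
}


def split_search_path(value: str, sep: str = os.pathsep) -> List[str]:
    """Split a PATH-like string on sep and drop empty entries."""
    return [part for part in value.split(sep) if part]


def find_cudart(dirs: Iterable[str], platform: str) -> List[Path]:
    """Return files in dirs whose name matches the pattern for platform."""
    pattern = CUDART_PATTERNS[platform]
    hits = []
    for d in dirs:
        folder = Path(d)
        if not folder.is_dir():
            continue  # skip entries that do not exist, as the log warns about
        for entry in sorted(folder.iterdir()):
            if fnmatch.fnmatch(entry.name.lower(), pattern):
                hits.append(entry)
    return hits


if __name__ == "__main__":
    env_dir = r"C:\ProgramData\Anaconda3\envs\ST"
    # Splitting on ":" cuts the drive letter off, which would explain 'C'.
    print(split_search_path(env_dir, sep=":"))
    # Splitting on the Windows separator ";" keeps the directory intact.
    print(split_search_path(env_dir, sep=";"))

    platform = "windows" if os.name == "nt" else "linux"
    for hit in find_cudart(split_search_path(os.environ.get("PATH", "")), platform):
        print(hit)
```

If the last loop prints nothing on Windows, the CUDA runtime DLL is not on PATH for that shell; if it prints a DLL while the library still reports libcudart.so as missing, that would be consistent with the installed bitsandbytes build only searching for the Linux name.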