tenabraex commented 1 year ago

I just did an update and installed dadaptation, and now I'm getting this error. I've included the full log from launch.

FINDSTR: Cannot open .\logs\status\torch_version
13:50:57-088592 INFO nVidia toolkit detected
13:50:57-836011 INFO Torch 1.12.1+cu116
13:50:57-849014 INFO Torch backend: nVidia CUDA 11.6 cuDNN 8302
13:50:57-851014 INFO Torch detected GPU: NVIDIA GeForce RTX 2080 Ti VRAM 11264 Arch (7, 5) Cores 68
13:50:57-853014 INFO Validating that requirements are satisfied.
13:50:59-033410 INFO All requirements satisfied.
13:51:01-095437 INFO headless: False
13:51:01-098438 INFO Load CSS...
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
13:51:10-214619 INFO Start training LoRA LyCORIS/LoCon ...
13:51:10-215619 INFO Folder 4_rabbit: 40 images found
13:51:10-217619 INFO Folder 4_rabbit: 160 steps
13:51:10-217619 INFO Total steps: 160
13:51:10-218621 INFO Train batch size: 4
13:51:10-219620 INFO Gradient accumulation steps: 1.0
13:51:10-220619 INFO Epoch: 20
13:51:10-221620 INFO Regulatization factor: 1
13:51:10-222620 INFO max_train_steps (160 / 4 / 1.0 * 20 * 1) = 800
13:51:10-223619 INFO stop_text_encoder_training = 0
13:51:10-224619 INFO lr_warmup_steps = 0
13:51:11-944621 INFO accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --enable_bucket --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="C:/Users/bunnyviking/Desktop/AIimages/bunelemental/trainingMJ/img" --resolution=512,512 --output_dir="C:/Users/bunnyviking/Desktop/AIimages/bunelemental/trainingMJ" --logging_dir="C:/Users/bunnyviking/Desktop/AIimages/bunelemental/trainingMJ/logs" --network_alpha="16" --save_model_as=safetensors --network_module=lycoris.kohya --network_args "conv_dim=8" "conv_alpha=1" "algo=lora" --text_encoder_lr=5e-05 --unet_lr=0.0001 --network_dim=32 --output_name="dieselairTECHv1" --lr_scheduler_num_cycles="20" --learning_rate="0.0001" --lr_scheduler="cosine" --train_batch_size="4" --max_train_steps="800" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="0" --bucket_reso_steps=64 --xformers --bucket_no_upscale --wandb_api_key="False"
prepare tokenizer
Using DreamBooth method.
prepare images.
found directory C:\Users\bunnyviking\Desktop\AIimages\bunelemental\trainingMJ\img\4_rabbit contains 40 image files
160 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 4
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 1024
  bucket_reso_steps: 64
  bucket_no_upscale: True

  [Subset 0 of Dataset 0]
    image_dir: "C:\Users\bunnyviking\Desktop\AIimages\bunelemental\trainingMJ\img\4_rabbit"
    image_count: 40
    num_repeats: 4
    shuffle_caption: False
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: rabbit
    caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|██████████| 40/40 [00:00<00:00, 4000.48it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数（繰り返し回数を含む）
bucket 0: resolution (512, 512), count: 160
mean ar error (without repeats): 0.0
preparing accelerator
E:\kohya_SS\venv\lib\site-packages\accelerate\accelerator.py:249: FutureWarning: logging_dir is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use project_dir instead.
  warnings.warn(
Using accelerator 0.15.0 or above.
loading model for process 0/1
load Diffusers pretrained models: runwayml/stable-diffusion-v1-5
text_encoder\model.safetensors not found
Fetching 19 files: 100%|██████████| 19/19 [00:00<?, ?it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
CrossAttention.forward has been replaced to enable xformers.
import network module: lycoris.kohya
[Dataset 0]
caching latents.
100%|██████████| 40/40 [00:05<00:00, 7.63it/s]
Using rank adaptation algo: lora
Apply different lora dim for conv layer
Conv Dim: 8, Linear Dim: 32
Apply different alpha value for conv layer
Conv alpha: 1.0, Linear alpha: 16.0
Use Dropout value: 0.0
Create LyCORIS Module
create LyCORIS for Text Encoder: 72 modules.
Create LyCORIS Module
create LyCORIS for U-Net: 278 modules.
enable LyCORIS for text encoder
enable LyCORIS for U-Net
preparing optimizer, data loader etc.
Deprecated: use prepare_optimizer_params(text_encoder_lr, unet_lr, learning_rate) instead of prepare_optimizer_params(text_encoder_lr, unet_lr)


CUDA SETUP: Loading binary E:\kohya_SS\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
use 8-bit AdamW optimizer | {}
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 160
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 40
  num epochs / epoch数: 20
  batch size per device / バッチサイズ: 4
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 800
steps: 0%| | 0/800 [00:00<?, ?it/s]
Traceback (most recent call last)

E:\kohya_SS\train_network.py:814 in <module>
    811     args = parser.parse_args()
    812     args = train_util.read_config_from_file(args, parser)
    813
  ❱ 814     train(args)
    815

E:\kohya_SS\train_network.py:540 in train
    537         beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_tim
    538     )
    539     if accelerator.is_main_process:
  ❱ 540         accelerator.init_trackers("network_train" if args.log_tracker_name is None else
    541
    542     loss_list = []
    543     loss_total = 0.0

E:\kohya_SS\venv\lib\site-packages\accelerate\accelerator.py:548 in _inner
    545                 )
    546
    547         def _inner(*args, **kwargs):
  ❱ 548             return PartialState().on_main_process(function)(*args, **kwargs)
    549
    550         return _inner
    551

E:\kohya_SS\venv\lib\site-packages\accelerate\accelerator.py:2031 in init_trackers
    2028                 if getattr(tracker_init, "requires_logging_directory"):
    2029                     # We can skip this check since it was done in __init__
    2030                     self.trackers.append(
  ❱ 2031                         tracker_init(project_name, self.logging_dir, **init_kwargs.get(s
    2032                     )
    2033                 else:
    2034                     self.trackers.append(tracker_init(project_name, **init_kwargs.get(st

E:\kohya_SS\venv\lib\site-packages\accelerate\tracking.py:83 in execute_on_main_process
    80     @wraps(function)
    81     def execute_on_main_process(self, *args, **kwargs):
    82         if getattr(self, "main_process_only", False):
  ❱ 83             return PartialState().on_main_process(function)(self, *args, **kwargs)
    84         else:
    85             return function(self, *args, **kwargs)
    86

E:\kohya_SS\venv\lib\site-packages\accelerate\tracking.py:190 in __init__
    187         super().__init__()
    188         self.run_name = run_name
    189         self.logging_dir = os.path.join(logging_dir, run_name)
  ❱ 190         self.writer = tensorboard.SummaryWriter(self.logging_dir, **kwargs)
    191         logger.debug(f"Initialized TensorBoard project {self.run_name} logging to {self.
    192         logger.debug(
    193             "Make sure to log any initial configurations with self.store_init_configura

E:\kohya_SS\venv\lib\site-packages\torch\utils\tensorboard\writer.py:246 in __init__
    243         # Initialize the file writers, but they can be cleared out on close
    244         # and recreated later as needed.
    245         self.file_writer = self.all_writers = None
  ❱ 246         self._get_file_writer()
    247
    248         # Create default bins for histograms, see generate_testdata.py in tensorflow/ten
    249         v = 1e-12

E:\kohya_SS\venv\lib\site-packages\torch\utils\tensorboard\writer.py:276 in _get_file_writer
    273     def _get_file_writer(self):
    274         """Returns the default FileWriter instance. Recreates it if closed."""
    275         if self.all_writers is None or self.file_writer is None:
  ❱ 276             self.file_writer = FileWriter(
    277                 self.log_dir, self.max_queue, self.flush_secs, self.filename_suffix
    278             )
    279             self.all_writers = {self.file_writer.get_logdir(): self.file_writer}

E:\kohya_SS\venv\lib\site-packages\torch\utils\tensorboard\writer.py:75 in __init__
    72         # TODO: See if we can remove this in the future if we are
    73         # actually the ones passing in a PosixPath
    74         log_dir = str(log_dir)
  ❱ 75         self.event_writer = EventFileWriter(
    76             log_dir, max_queue, flush_secs, filename_suffix
    77         )
    78

E:\kohya_SS\venv\lib\site-packages\tensorboard\summary\writer\event_file_writer.py:72 in __init__
    69             pending events and summaries to disk.
    70         """
    71         self._logdir = logdir
  ❱ 72         tf.io.gfile.makedirs(logdir)
    73         self._file_name = (
    74             os.path.join(
    75                 logdir,

E:\kohya_SS\venv\lib\site-packages\tensorboard\lazy.py:65 in __getattr__
    62         # over load_once() and avoid polluting the module's attrs with our own state.
    63         class LazyModule(types.ModuleType):
    64             def __getattr__(self, attr_name):
  ❱ 65                 return getattr(load_once(self), attr_name)
    66
    67             def __dir__(self):
    68                 return dir(load_once(self))

AttributeError: module 'tensorflow' has no attribute 'io'
steps: 0%| | 0/800 [00:00<?, ?it/s]
Traceback (most recent call last)

C:\Users\bunnyviking\AppData\Local\Programs\Python\Python310\lib\runpy.py:196 in _run_module_as_main
    193     main_globals = sys.modules["__main__"].__dict__
    194     if alter_argv:
    195         sys.argv[0] = mod_spec.origin
  ❱ 196     return _run_code(code, main_globals, None,
    197                      "__main__", mod_spec)
    198
    199 def run_module(mod_name, init_globals=None,

C:\Users\bunnyviking\AppData\Local\Programs\Python\Python310\lib\runpy.py:86 in _run_code
    83                        __loader__ = loader,
    84                        __package__ = pkg_name,
    85                        __spec__ = mod_spec)
  ❱ 86     exec(code, run_globals)
    87     return run_globals
    88
    89 def _run_module_code(code, init_globals=None,

in <module>:7
    4 from accelerate.commands.accelerate_cli import main
    5 if __name__ == '__main__':
    6     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
  ❱ 7     sys.exit(main())
    8

E:\kohya_SS\venv\lib\site-packages\accelerate\commands\accelerate_cli.py:45 in main
    42         exit(1)
    43
    44     # Run
  ❱ 45     args.func(args)
    46
    47
    48 if __name__ == "__main__":

E:\kohya_SS\venv\lib\site-packages\accelerate\commands\launch.py:923 in launch_command
    920     elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA
    921         sagemaker_launcher(defaults, args)
    922     else:
  ❱ 923         simple_launcher(args)
    924
    925
    926 def main():

E:\kohya_SS\venv\lib\site-packages\accelerate\commands\launch.py:579 in simple_launcher
    576     process.wait()
    577     if process.returncode != 0:
    578         if not args.quiet:
  ❱ 579             raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    580         else:
    581             sys.exit(1)
    582

CalledProcessError: Command '['E:\\kohya_SS\\venv\\Scripts\\python.exe', 'train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=C:/Users/bunnyviking/Desktop/AIimages/bunelemental/trainingMJ/img', '--resolution=512,512', '--output_dir=C:/Users/bunnyviking/Desktop/AIimages/bunelemental/trainingMJ', '--logging_dir=C:/Users/bunnyviking/Desktop/AIimages/bunelemental/trainingMJ/logs', '--network_alpha=16', '--save_model_as=safetensors', '--network_module=lycoris.kohya', '--network_args', 'conv_dim=8', 'conv_alpha=1', 'algo=lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=32', '--output_name=dieselairTECHv1', '--lr_scheduler_num_cycles=20', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--train_batch_size=4', '--max_train_steps=800', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale', '--wandb_api_key=False']' returned non-zero exit status 1.

tenabraex commented 1 year ago

Note: I have manually reinstalled both tensorboard and tensorflow at the versions kohya_ss requires, and this has not helped.
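One way to see what the venv actually contains is to run a small check with the venv's own interpreter (`E:\kohya_SS\venv\Scripts\python.exe` in the log above). The following is only a sketch and not part of kohya_ss; the helper names `installed_version` and `module_report` are made up for illustration.

```python
import importlib
import importlib.util
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_report(module_name, attr):
    """Describe where a top-level module is imported from and whether it exposes `attr`."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return {"found": False, "origin": None, "has_attr": False}
    module = importlib.import_module(module_name)
    return {"found": True, "origin": spec.origin, "has_attr": hasattr(module, attr)}
```

Inside the venv, printing `installed_version("tensorboard")`, `installed_version("tensorflow")` and `module_report("tensorflow", "io")` shows which versions pip has recorded and which files Python really imports. If `found` is true but `origin` is `None` and `has_attr` is false, Python may be importing a leftover `tensorflow` folder rather than a complete package, which is one possible (not confirmed) cause of this error.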

ToneC137 commented 1 year ago

I have exactly the same issue after the update. I'm going to wait a few more hours to see if another update fixes it before attempting a reinstall.

BunnyViking commented 1 year ago

I did a separate install and it works fine (and with adaptations). I'm guessing the requirements file wasn't updated correctly, or something like that.

ToneC137 commented 1 year ago

> I did a separate install and it works fine (and with adaptations). I'm guessing the requirements file wasn't updated correctly, or something like that.

When you say a separate install, can you briefly explain the process? I'm a novice at some of this.

tenabraex commented 1 year ago

> I did a separate install and it works fine (and with adaptations). I'm guessing the requirements file wasn't updated correctly, or something like that.

> When you say a separate install, can you briefly explain the process? I'm a novice at some of this.

It's the same as doing a reinstall, but in a different directory.

Because each install sets up its own virtual environment, they won't affect each other (that's just something the install scripts do for you).
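To confirm which install a given Python process belongs to, the interpreter's own paths can be inspected. This is a minimal sketch using only the standard library; the function names are made up for illustration.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv rather than the base install."""
    return sys.prefix != sys.base_prefix


def environment_summary():
    """Collect the paths that show which install the interpreter is using."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        "in_virtualenv": in_virtualenv(),
    }
```

Run from two separate installs, `environment_summary()` should report a different `prefix` for each, which is why packages installed in one do not show up in the other.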

yanhuifair commented 1 year ago

same

bmaltais commented 1 year ago

So this might be related to some modules not being properly updated... Python module updates can be a bit of a pain. Deleting the venv folder and redoing the setup usually cures most issues.
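Deleting the folder in a file manager works just as well, but for anyone who prefers a script, here is a cautious sketch. It is not part of kohya_ss: `remove_venv` is a hypothetical helper, and the folder name `venv` is taken from the paths in the log above.

```python
import shutil
from pathlib import Path


def remove_venv(install_dir, venv_name="venv"):
    """Delete install_dir/venv_name if it looks like a virtual environment.

    Returns True if the folder was removed, False if nothing was deleted.
    """
    venv_path = Path(install_dir) / venv_name
    # The venv module writes a pyvenv.cfg file into the environment root;
    # refusing to delete without it guards against a mistyped path.
    if not (venv_path / "pyvenv.cfg").is_file():
        return False
    shutil.rmtree(venv_path)
    return True
```

For the install in the log this would be `remove_venv(r"E:\kohya_SS")`, run with a Python that is not the venv's own interpreter, followed by running the setup again.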

sumire608 commented 1 year ago

Edit requirements_windows_torch2.txt so that it pins these versions:
tensorboard==2.13.0
tensorflow==2.13.0rc1

That will solve this issue.
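Editing the file by hand is enough, but the change can also be sketched as a small text rewrite. The helper `repin` below is hypothetical and the sample lines are illustrative, not the real contents of requirements_windows_torch2.txt; only the two version numbers come from the suggestion above.

```python
import re


def repin(requirements_text, pins):
    """Return requirements text with `name==version` replaced for each name in `pins`.

    `pins` maps lowercase package names to the new version. Environment markers
    (the part after `;`) and unrelated lines are kept as they are.
    """
    out = []
    for line in requirements_text.splitlines():
        match = re.match(r"^\s*([A-Za-z0-9_.\-\[\]]+)\s*==\s*([^\s;]+)(.*)$", line)
        if match and match.group(1).lower() in pins:
            name, _, rest = match.groups()
            line = f"{name}=={pins[name.lower()]}{rest}"
        out.append(line)
    return "\n".join(out)


# Illustrative input only.
sample = "accelerate==0.15.0\ntensorboard==2.10.1\ntensorflow==2.10.1"
updated = repin(sample, {"tensorboard": "2.13.0", "tensorflow": "2.13.0rc1"})
```

The rewritten text would then be saved back to the requirements file before reinstalling the packages.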

dtxn commented 1 year ago

> Edit requirement.txt to this: tensorboard==2.13.0 tensorflow==2.13.0rc1

> Will solve this issue.

Hi. I see 3 lines with tensorboard. Which lines should I change?

accelerate==0.15.0
albumentations==1.3.0
altair==4.2.2
bitsandbytes==0.35.0
dadaptation==3.1
diffusers[torch]==0.10.2
easygui==0.98.3
einops==0.6.0
fairscale==0.4.13
ftfy==6.1.1
gradio==3.23.0; sys_platform == 'darwin'
gradio==3.32.0; sys_platform != 'darwin'
huggingface-hub==0.13.0; sys_platform == 'darwin'
huggingface-hub==0.13.3; sys_platform != 'darwin'
lion-pytorch==0.0.6
lycoris_lora==0.1.4
opencv-python==4.7.0.68
pytorch-lightning==1.9.0
rich==13.4.1
safetensors==0.2.6
tensorboard==2.10.1 ; sys_platform != 'darwin'
tensorboard==2.12.1 ; sys_platform == 'darwin'
tensorflow==2.10.1; sys_platform != 'darwin'
timm==0.6.12
tk==0.1.0
toml==0.10.2
transformers==4.26.0
voluptuous==0.13.1
wandb==0.15.0

for kohya_ss library
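On the question of which lines matter: the part after `;` is an environment marker, and pip only installs a line when its marker is true on the current machine. `darwin` is the `sys.platform` value on macOS, so on Windows only the `sys_platform != 'darwin'` lines apply. The sketch below illustrates this with a deliberately simplified, hypothetical helper (`applies_here`) that understands only the two marker forms appearing in this file; real marker syntax is richer.

```python
import sys


def applies_here(requirement_line, platform=None):
    """Return True if the line's `sys_platform` marker matches `platform`.

    Only `sys_platform == '...'` and `sys_platform != '...'` are handled.
    Lines without a marker always apply.
    """
    platform = sys.platform if platform is None else platform
    _, _, marker = requirement_line.partition(";")
    marker = marker.strip()
    if not marker:
        return True
    for op in ("==", "!="):
        left, found, right = marker.partition(op)
        if found and left.strip() == "sys_platform":
            value = right.strip().strip("'\"")
            return (platform == value) if op == "==" else (platform != value)
    raise ValueError(f"unsupported marker: {marker}")


lines = [
    "tensorboard==2.10.1 ; sys_platform != 'darwin'",
    "tensorboard==2.12.1 ; sys_platform == 'darwin'",
    "tensorflow==2.10.1; sys_platform != 'darwin'",
]
# "win32" is the sys.platform value on Windows.
windows_lines = [line for line in lines if applies_here(line, platform="win32")]
```

Here `windows_lines` keeps the first and third entries, so on Windows those are the two pins that take effect; the `== 'darwin'` line only matters on macOS.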


bmaltais commented 1 year ago

Try deleting the full venv folder and reinstalling. I will have a new setup solution that should minimize this risk in the future.

sumire608 commented 1 year ago

@dtxn Sorry, it should be requirements_windows_torch2.txt. 😵‍💫 Then run upgrade.bat.

dtxn commented 1 year ago

> Try deleting the full venv folder and reinstalling. I will have a new setup solution that should minimize this risk in the future.

It seems that helped, thank you. I'm a newbie, so I'm not sure I did exactly what you meant by "reinstall": I deleted the venv folder, ran upgrade.bat, then setup.bat, and after that everything worked OK. I still get one error when starting, but hopefully it's nothing bad:

dtxn commented 1 year ago

> @dtxn Sorry, it should be requirements_windows_torch2.txt. 😵‍💫 Then run upgrade.bat.

Already too late, I reinstalled :D But thank you for the clarification :)

tenabraex commented 1 year ago

> Edit requirements_windows_torch2.txt to this: tensorboard==2.13.0 tensorflow==2.13.0rc1

> Will solve this issue.

Cheers, good to know I was on the right track. I'm obviously learning :D I just didn't know what to change.

tenabraex commented 1 year ago

https://github.com/bmaltais/kohya_ss/issues/904#issuecomment-1575566568

hypervoxel commented 9 months ago

@sumire608 I'm having a similar issue, and I don't seem to have an upgrade.bat in my kohya_ss folder. I upgraded with git pull, but I would like to try upgrade.bat. Does anyone have the file?

> @dtxn Sorry, it should be requirements_windows_torch2.txt. 😵‍💫 Then run upgrade.bat.

bmaltais / kohya_ss

AttributeError: module 'tensorflow' has no attribute 'io' #904
