bmaltais / kohya_ss

Apache License 2.0

Cannot Train After Update Today #191

Closed Rika-Mipa closed 1 year ago

Rika-Mipa commented 1 year ago

My PC: RTX 4090, Windows 10 21H2, Python 3.10.9

Yesterday the software worked very well. However, after I updated to the latest version today, it can no longer train. I deleted the folder and reinstalled the software from the first step, but it still does not work. Please help me.


Load CSS...
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Loading config...
Folder 7_Aharen: 567 steps
max_train_steps = 11340
stop_text_encoder_training = 0
lr_warmup_steps = 567
accelerate launch --num_cpu_threads_per_process=32 "train_network.py" --enable_bucket --pretrained_model_name_or_path="D:/NovelAI/models/Stable-diffusion/latest.ckpt" --train_data_dir="E:/Train/Aharen" --resolution=512,512 --output_dir="E:/Train/Aharen" --logging_dir="" --network_alpha="8" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=3e-5 --unet_lr=3e-4 --network_dim=32 --output_name="test" --lr_scheduler_num_cycles="20" --learning_rate="1e-5" --lr_scheduler="cosine_with_restarts" --lr_warmup_steps="567" --train_batch_size="1" --max_train_steps="11340" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --seed="31337" --caption_extension=".txt" --cache_latents --clip_skip=2 --keep_tokens="3" --bucket_reso_steps=64 --shuffle_caption --xformers --use_8bit_adam --bucket_no_upscale
prepare tokenizer
Use DreamBooth method.
prepare train images.
found directory 7_Aharen contains 81 image files
567 train images with repeating.
loading image sizes.
100%|███████████████████████████████████████████████████████████████████████████████| 81/81 [00:00<00:00, 10125.13it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (512, 384), count: 567
mean ar error (without repeats): 0.0
prepare accelerator
Using accelerator 0.15.0 or above.
load StableDiffusion checkpoint
loading u-net:
loading vae:
Traceback (most recent call last):
  File "D:\LORA\kohya_ss\venv\lib\site-packages\urllib3\connectionpool.py", line 700, in urlopen
    self._prepare_proxy(conn)
  File "D:\LORA\kohya_ss\venv\lib\site-packages\urllib3\connectionpool.py", line 996, in _prepare_proxy
    conn.connect()
  File "D:\LORA\kohya_ss\venv\lib\site-packages\urllib3\connection.py", line 414, in connect
    self.sock = ssl_wrap_socket(
  File "D:\LORA\kohya_ss\venv\lib\site-packages\urllib3\util\ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "D:\LORA\kohya_ss\venv\lib\site-packages\urllib3\util\ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "D:\Python\lib\ssl.py", line 513, in wrap_socket
    return self.sslsocket_class._create(
  File "D:\Python\lib\ssl.py", line 1071, in _create
    self.do_handshake()
  File "D:\Python\lib\ssl.py", line 1342, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:997)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\LORA\kohya_ss\venv\lib\site-packages\requests\adapters.py", line 489, in send
    resp = conn.urlopen(
  File "D:\LORA\kohya_ss\venv\lib\site-packages\urllib3\connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "D:\LORA\kohya_ss\venv\lib\site-packages\urllib3\util\retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /openai/clip-vit-large-patch14/resolve/main/pytorch_model.bin (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:997)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\LORA\kohya_ss\train_network.py", line 573, in <module>
    train(args)
  File "D:\LORA\kohya_ss\train_network.py", line 158, in train
    text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype)
  File "D:\LORA\kohya_ss\library\train_util.py", line 1584, in load_target_model
    text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(args.v2, args.pretrained_model_name_or_path)
  File "D:\LORA\kohya_ss\library\model_util.py", line 919, in load_models_from_stable_diffusion_checkpoint
    text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
  File "D:\LORA\kohya_ss\venv\lib\site-packages\transformers\modeling_utils.py", line 2222, in from_pretrained
    resolved_archive_file = cached_file(
  File "D:\LORA\kohya_ss\venv\lib\site-packages\transformers\utils\hub.py", line 409, in cached_file
    resolved_file = hf_hub_download(
  File "D:\LORA\kohya_ss\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 124, in _inner_fn
    return fn(*args, **kwargs)
  File "D:\LORA\kohya_ss\venv\lib\site-packages\huggingface_hub\file_download.py", line 1105, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "D:\LORA\kohya_ss\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 124, in _inner_fn
    return fn(*args, **kwargs)
  File "D:\LORA\kohya_ss\venv\lib\site-packages\huggingface_hub\file_download.py", line 1431, in get_hf_file_metadata
    r = _request_wrapper(
  File "D:\LORA\kohya_ss\venv\lib\site-packages\huggingface_hub\file_download.py", line 405, in _request_wrapper
    response = _request_wrapper(
  File "D:\LORA\kohya_ss\venv\lib\site-packages\huggingface_hub\file_download.py", line 440, in _request_wrapper
    return http_backoff(
  File "D:\LORA\kohya_ss\venv\lib\site-packages\huggingface_hub\utils\_http.py", line 129, in http_backoff
    response = requests.request(method=method, url=url, **kwargs)
  File "D:\LORA\kohya_ss\venv\lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "D:\LORA\kohya_ss\venv\lib\site-packages\requests\sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "D:\LORA\kohya_ss\venv\lib\site-packages\requests\sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "D:\LORA\kohya_ss\venv\lib\site-packages\requests\adapters.py", line 563, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /openai/clip-vit-large-patch14/resolve/main/pytorch_model.bin (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:997)')))
Traceback (most recent call last):
  File "D:\Python\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\Python\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\LORA\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "D:\LORA\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "D:\LORA\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "D:\LORA\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\LORA\kohya_ss\venv\Scripts\python.exe', 'train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=D:/NovelAI/models/Stable-diffusion/latest.ckpt', '--train_data_dir=E:/Train/Aharen', '--resolution=512,512', '--output_dir=E:/Train/Aharen', '--logging_dir=', '--network_alpha=8', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=3e-5', '--unet_lr=3e-4', '--network_dim=32', '--output_name=test', '--lr_scheduler_num_cycles=20', '--learning_rate=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=567', '--train_batch_size=1', '--max_train_steps=11340', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=31337', '--caption_extension=.txt', '--cache_latents', '--clip_skip=2', '--keep_tokens=3', '--bucket_reso_steps=64', '--shuffle_caption', '--xformers', '--use_8bit_adam', '--bucket_no_upscale']' returned non-zero exit status 1.
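
Read from the bottom up, the chained traceback shows that the run stopped on an HTTPS request to huggingface.co for the CLIP text encoder weights, made from inside `load_models_from_stable_diffusion_checkpoint`, and not inside the training loop itself. The following is a minimal sketch for pulling the host, port, and URL path out of such a message. The helper name `parse_max_retry_error` and the regular expression are illustrative assumptions based only on the message format shown in this log; they are not part of kohya_ss, requests, or urllib3.

```python
import re
from typing import Optional, Tuple

# Assumption: the message follows the "HTTPSConnectionPool(...): Max retries
# exceeded with url: ..." layout seen in the log above.
_POOL_ERROR = re.compile(
    r"HTTPSConnectionPool\(host='(?P<host>[^']+)', port=(?P<port>\d+)\): "
    r"Max retries exceeded with url: (?P<path>\S+)"
)


def parse_max_retry_error(message: str) -> Optional[Tuple[str, int, str]]:
    """Return (host, port, path) from a MaxRetryError message, or None if it does not match."""
    match = _POOL_ERROR.search(message)
    if match is None:
        return None
    return match.group("host"), int(match.group("port")), match.group("path")
```

Applied to the `MaxRetryError` line above, this should identify huggingface.co on port 443 as the endpoint that could not be reached, which suggests checking the proxy or network path before looking at the training settings.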

martianunlimited commented 1 year ago

It looks like library/train_util.py is out of sync with https://github.com/kohya-ss/sd-scripts/blob/main/library/train_util.py. Updating it seems to allow the training to go a bit further. (I will check whether this is the only out-of-sync dependency.)
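
One way to confirm that two copies of a file have drifted apart is to diff them line by line. This is a minimal sketch, assuming the upstream train_util.py has already been saved locally next to the installed one. The function names `changed_lines` and `compare_files` are hypothetical, as are any file paths passed to them; neither is part of kohya_ss or sd-scripts.

```python
import difflib
from pathlib import Path
from typing import List


def changed_lines(local_text: str, upstream_text: str) -> List[str]:
    """Return only the added ('+') and removed ('-') lines between two texts."""
    diff = list(
        difflib.unified_diff(
            local_text.splitlines(),
            upstream_text.splitlines(),
            fromfile="local",
            tofile="upstream",
            lineterm="",
            n=0,  # no context lines, only the changes themselves
        )
    )
    # The first two entries are the '---'/'+++' file headers; skip them,
    # then drop the '@@' hunk markers by keeping only '+' and '-' lines.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


def compare_files(local_path: str, upstream_path: str) -> List[str]:
    """Read two files (hypothetical paths supplied by the caller) and diff them."""
    local_text = Path(local_path).read_text(encoding="utf-8")
    upstream_text = Path(upstream_path).read_text(encoding="utf-8")
    return changed_lines(local_text, upstream_text)
```

An empty result means the two copies match. Any returned lines show where the local file differs from the upstream one.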

Rika-Mipa commented 1 year ago

Hello, I am glad to see your reply. My friend said he could not train after the update either. I hope you can solve the problem. XD