cloneofsimo / lora

Using Low-rank adaptation to quickly fine-tune diffusion models.
https://arxiv.org/abs/2106.09685
Apache License 2.0

urllib3 connection error? #200

Open KyonP opened 1 year ago

KyonP commented 1 year ago

I am trying to fine-tune with this code on my custom dataset (50,000 images with text descriptions).

Training stops partway through the training phase.

It seems to happen whenever the code tries to fetch the pre-trained model from Hugging Face.

How can I avoid this? Any suggestions would be appreciated.

The error messages I got are below. They are a bit cut off because of the limited terminal scrollback :(.

socket.gaierror: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 358, in connect
    self.sock = conn = self._new_conn()
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f7e41ba20a0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/stabilityai/stable-diffusion-2-1-base (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f7e41ba20a0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_lora_dreambooth_mycode.py", line 1644, in <module>
    main(args)
  File "train_lora_dreambooth_mycode.py", line 937, in main
    pipeline = StableDiffusionPipeline.from_pretrained(
  File "/usr/local/lib/python3.8/dist-packages/diffusers/pipelines/pipeline_utils.py", line 530, in from_pretrained
    info = model_info(
  File "/usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_validators.py", line 124, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/huggingface_hub/hf_api.py", line 1228, in model_info
    r = requests.get(
  File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 565, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/stabilityai/stable-diffusion-2-1-base (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f7e41ba20a0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))
Steps:  21%|███████████████████████████████████▊                                                                                                                                          | 105000/509550 [9:50:37<37:55:35,  2.96it/s, loss=0.0773, lr=0.0001]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1097, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 552, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_lora_dreambooth_mycode.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-2-1-base', '--instance_data_dir=../../datasets/my_data', '--output_dir=./output_example_text_v0.9', '--instance_prompt=deprecated, using captions', '--train_text_encoder', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=1e-4', '--learning_rate_text=5e-5', '--color_jitter', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_train_epochs=10', '--save_steps=5000']' returned non-zero exit status 1.
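
One possible workaround, sketched under assumptions: the traceback shows the failure inside `StableDiffusionPipeline.from_pretrained`, triggered by a temporary DNS failure (`socket.gaierror`). If the failure really is transient, retrying the call with a backoff may be enough. `retry_on_network_error` below is a hypothetical helper written for illustration; it is not part of this repo, diffusers, or huggingface_hub. It catches `OSError`, which is the base class of both `socket.gaierror` and `requests.exceptions.ConnectionError`.

```python
import time


def retry_on_network_error(fn, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on OSError, wait with exponential backoff and try again.

    Hypothetical helper for illustration only. The `sleep` parameter is
    injectable so the backoff can be checked without actually waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except OSError:
            # Out of retries: re-raise the last network error unchanged.
            if attempt == attempts - 1:
                raise
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** attempt)


# Assumed usage inside the training script (not run here):
# pipeline = retry_on_network_error(
#     lambda: StableDiffusionPipeline.from_pretrained(
#         args.pretrained_model_name_or_path, ...
#     )
# )
```

A retry only helps if name resolution comes back. Another option that may avoid the network call entirely is to download the model once and pass the local directory as `--pretrained_model_name_or_path`, since `from_pretrained` also accepts a local path.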