bcmi / DCI-VTON-Virtual-Try-On

[ACM Multimedia 2023] Taming the Power of Diffusion Models for High-Quality Virtual Try-On with Appearance Flow.
https://arxiv.org/abs/2308.06101
MIT License

Failed to load warp_viton.pth checkpoint #11

Closed: gurteshwar closed this issue 11 months ago

gurteshwar commented 1 year ago

Hi,

Thanks for your work!

I am trying to test inference. The pre-trained model at https://drive.google.com/drive/folders/11BJo59iXVu2_NknKMbN0jKtFV06HTn5K fails to load because torch.load reports it as a zip archive. The traceback is below:


Traceback (most recent call last):
  File "eval_PBAFN_viton.py", line 37, in <module> 
    load_checkpoint(warp_model, opt.warp_checkpoint)
  File "/home/paperspace/PF-AFN/PF-AFN_test/models/networks.py", line 178, in load_checkpoint
    checkpoint = torch.load(checkpoint_path)
  File "/home/paperspace/anaconda3/envs/tryon/lib/python3.6/site-packages/torch/serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/paperspace/anaconda3/envs/tryon/lib/python3.6/site-packages/torch/serialization.py", line 560, in _load
    raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))

If I use the pre-trained checkpoint from https://drive.google.com/file/d/1_a0AiN8Y_d_9TNDhHIcRlERz3zptyYWV/view which is linked to from https://github.com/geyuying/PF-AFN then I get the following error:

Traceback (most recent call last):
  File "eval_PBAFN_viton.py", line 37, in <module> 
    load_checkpoint(warp_model, opt.warp_checkpoint)
  File "/home/paperspace/PF-AFN/PF-AFN_test/models/networks.py", line 183, in load_checkpoint
    model.load_state_dict(checkpoint_new)
  File "/home/paperspace/anaconda3/envs/tryon/lib/python3.6/site-packages/torch/nn/modules/module.py", line 777, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for AFWM:
        size mismatch for cond_features.encoders.0.0.block.0.weight: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([16]).
        size mismatch for cond_features.encoders.0.0.block.0.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([16]).
        size mismatch for cond_features.encoders.0.0.block.0.running_mean: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([16]).
        size mismatch for cond_features.encoders.0.0.block.0.running_var: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([16]).
        size mismatch for cond_features.encoders.0.0.block.2.weight: copying a param with shape torch.Size([64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 16, 3, 3]).

How do I go about fixing this?
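
One way to narrow down the second error is to list every parameter whose shape differs before calling load_state_dict. The sketch below is only an illustration: shape_mismatches is a hypothetical helper, not part of PF-AFN or DCI-VTON, and it works on plain dictionaries of shapes so it can be tried without loading the model. In the traceback above, the checkpoint's first cond_features layers have 3 input channels while the current model expects 16, which suggests the checkpoint and the code were built with different model configurations.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def shape_mismatches(
    checkpoint_shapes: Dict[str, Shape],
    model_shapes: Dict[str, Shape],
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, checkpoint_shape, model_shape) for shared keys that differ.

    Hypothetical helper for illustration; keys present in only one of the
    two dictionaries are ignored.
    """
    mismatches = []
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches.append((name, tuple(ckpt_shape), tuple(model_shape)))
    return mismatches


# With PyTorch, the inputs could be built like this (assumption: `checkpoint`
# is a plain state dict, as returned by torch.load on a weights file):
#   checkpoint_shapes = {k: tuple(v.shape) for k, v in checkpoint.items()}
#   model_shapes = {k: tuple(v.shape) for k, v in model.state_dict().items()}

if __name__ == "__main__":
    # Shapes copied from the traceback above.
    ckpt = {
        "cond_features.encoders.0.0.block.0.weight": (3,),
        "cond_features.encoders.0.0.block.2.weight": (64, 3, 3, 3),
    }
    model = {
        "cond_features.encoders.0.0.block.0.weight": (16,),
        "cond_features.encoders.0.0.block.2.weight": (64, 16, 3, 3),
    }
    for name, ckpt_shape, model_shape in shape_mismatches(ckpt, model):
        print(name, "checkpoint:", ckpt_shape, "model:", model_shape)
```

If every mismatch traces back to the same input-channel count, as it does here, the likely next step is to check which options the model was constructed with rather than to edit the checkpoint.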

Deepak2405 commented 12 months ago

I faced the same issue. It's a big file, around 5 GB. Your PC has probably not finished downloading it yet.

gurteshwar commented 12 months ago

Did you manage to get it working? The file at https://drive.google.com/drive/folders/11BJo59iXVu2_NknKMbN0jKtFV06HTn5K is a 112 MB archive.
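
A quick way to tell an incomplete download apart from a format problem is to check the file with Python's zipfile module. The sketch below rests on one fact and a few assumptions. The fact: since PyTorch 1.6, torch.save writes a zip-based file by default, and older PyTorch releases reject such files with the "is a zip archive" error. The assumptions: is_zip_checkpoint and resave_legacy are hypothetical helper names, and resave_legacy must be run in a separate environment with PyTorch 1.6 or later.

```python
import os
import zipfile


def is_zip_checkpoint(path: str) -> bool:
    """Return True if the file at `path` is a complete zip archive.

    A truncated download normally fails this check, because the zip
    directory is stored at the end of the file.
    """
    return zipfile.is_zipfile(path)


def describe_checkpoint(path: str) -> str:
    """Summarize file size and container format for a checkpoint file."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    kind = "zip archive" if is_zip_checkpoint(path) else "not a zip archive"
    return f"{path}: {size_mb:.1f} MB, {kind}"


def resave_legacy(src: str, dst: str) -> None:
    """Re-save a checkpoint in the legacy (non-zip) format.

    Assumption: this runs under PyTorch >= 1.6, where torch.save accepts
    the _use_new_zipfile_serialization keyword.
    """
    import torch  # imported lazily so the checks above need no PyTorch

    checkpoint = torch.load(src, map_location="cpu")
    torch.save(checkpoint, dst, _use_new_zipfile_serialization=False)
```

For example, describe_checkpoint("checkpoints/warp_viton.pth") reporting a zip archive of the expected size would point to a PyTorch version mismatch rather than a broken download.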

gurteshwar commented 11 months ago

Hello @Limbor,

Do you have any pointers to resolve this?

Thanks

AntonNikishin commented 11 months ago

Same issue

AntonNikishin commented 11 months ago

I'm trying to run this script in the environment described in https://github.com/geyuying/PF-AFN, since it cannot run in the DCI-VTON environment because the requirements differ.

I get this error:

[admin@jupyter-anton (tryon)] PF-AFN_test $ sh test_VITON.sh
------------ Options -------------
batchSize: 32
data_type: 32
dataroot: ./../../MY-DATA
display_winsize: 512
fineSize: 512
gen_checkpoint: checkpoints/PFAFN/gen_model_final.pth
gpu_ids: [0]
input_nc: 3
isTrain: False
label_nc: 13
loadSize: 512
max_dataset_size: inf
nThreads: 1
name: cloth-warp
no_flip: False
norm: instance
output_nc: 3
phase: test
resize_or_crop: none
serial_batches: False
tf_log: False
unpaired: True
use_dropout: False
verbose: False
warp_checkpoint: checkpoints/warp_viton.pth
-------------- End ----------------
#training images = 1
Traceback (most recent call last):
  File "/User/.conda/envs/tryon/lib/python3.6/tarfile.py", line 189, in nti
    n = int(s.strip() or "0", 8)
ValueError: invalid literal for int() with base 8: 'torch._u'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/User/.conda/envs/tryon/lib/python3.6/tarfile.py", line 2299, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/User/.conda/envs/tryon/lib/python3.6/tarfile.py", line 1093, in fromtarfile
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
  File "/User/.conda/envs/tryon/lib/python3.6/tarfile.py", line 1035, in frombuf
    chksum = nti(buf[148:156])
  File "/User/.conda/envs/tryon/lib/python3.6/tarfile.py", line 191, in nti
    raise InvalidHeaderError("invalid header")
tarfile.InvalidHeaderError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/User/.conda/envs/tryon/lib/python3.6/site-packages/torch/serialization.py", line 556, in _load
    return legacy_load(f)
  File "/User/.conda/envs/tryon/lib/python3.6/site-packages/torch/serialization.py", line 467, in legacy_load
    with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar, \
  File "/User/.conda/envs/tryon/lib/python3.6/tarfile.py", line 1591, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/User/.conda/envs/tryon/lib/python3.6/tarfile.py", line 1621, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/User/.conda/envs/tryon/lib/python3.6/tarfile.py", line 1484, in __init__
    self.firstmember = self.next()
  File "/User/.conda/envs/tryon/lib/python3.6/tarfile.py", line 2311, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "eval_PBAFN_viton.py", line 37, in <module>
    load_checkpoint(warp_model, opt.warp_checkpoint)
  File "/User/PF-AFN/PF-AFN_test/models/networks.py", line 178, in load_checkpoint
    checkpoint = torch.load(checkpoint_path)
  File "/User/.conda/envs/tryon/lib/python3.6/site-packages/torch/serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/User/.conda/envs/tryon/lib/python3.6/site-packages/torch/serialization.py", line 560, in _load
    raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
RuntimeError: checkpoints/warp_viton.pth is a zip archive (did you mean to use torch.jit.load()?)

Limbor commented 11 months ago

Hi @gurteshwar, it still runs smoothly on my machine. Could you provide more details about your environment? You can also try running it in the dci-vton environment, which needs one additional package installed:

pip install cupy-cuda11x

AntonNikishin commented 11 months ago

In that case it throws an error: module 'cupy' has no attribute 'util'

Traceback (most recent call last):
  File "eval_PBAFN_viton.py", line 11, in <module>
    from models.afwm import AFWM
  File "/User/PF-AFN/PF-AFN_test/models/afwm.py", line 4, in <module>
    from .correlation import correlation
  File "/User/PF-AFN/PF-AFN_test/models/correlation/correlation.py", line 274, in <module>
    @cupy.util.memoize(for_each_device=True)
  File "/User/.conda/envs/dci-vton/lib/python3.8/site-packages/cupy/__init__.py", line 921, in __getattr__
    raise AttributeError(f"module 'cupy' has no attribute {name!r}")
AttributeError: module 'cupy' has no attribute 'util'

Limbor commented 11 months ago

@AntonNikishin, just change the line

@cupy.util.memoize(for_each_device=True)

to

@cupy.memoize(for_each_device=True)

AntonNikishin commented 11 months ago

That helps, thank you
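
The decorator change suggested above can also be written so that the same file imports under both older and newer CuPy releases. This is a minimal sketch, assuming the only difference is where memoize lives; resolve_memoize is a hypothetical helper and is not part of CuPy or PF-AFN.

```python
def resolve_memoize(cupy_module):
    """Return the memoize decorator factory from a CuPy-like module.

    Prefers the top-level `memoize` and falls back to `util.memoize`,
    the location the original correlation.py expects.
    """
    memoize = getattr(cupy_module, "memoize", None)
    if memoize is None:
        memoize = cupy_module.util.memoize
    return memoize


# Usage inside correlation.py (assumption: cupy is installed):
#   import cupy
#   memoize = resolve_memoize(cupy)
#
#   @memoize(for_each_device=True)
#   def launch_kernel(...):   # placeholder for the decorated function
#       ...
```

The getattr call with a default is enough here because, as the traceback shows, CuPy raises AttributeError for names it does not define.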