Failed to load warp_viton.pth checkpoint

gurteshwar commented 1 year ago

Hi,

Thanks for your work!

I am trying to test inference. The pre-trained model at https://drive.google.com/drive/folders/11BJo59iXVu2_NknKMbN0jKtFV06HTn5K fails to load because it is a zip archive. The error is below:


Traceback (most recent call last):
  File "eval_PBAFN_viton.py", line 37, in <module> 
    load_checkpoint(warp_model, opt.warp_checkpoint)
  File "/home/paperspace/PF-AFN/PF-AFN_test/models/networks.py", line 178, in load_checkpoint
    checkpoint = torch.load(checkpoint_path)
  File "/home/paperspace/anaconda3/envs/tryon/lib/python3.6/site-packages/torch/serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/paperspace/anaconda3/envs/tryon/lib/python3.6/site-packages/torch/serialization.py", line 560, in _load
    raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))

If I instead use the pre-trained checkpoint from https://drive.google.com/file/d/1_a0AiN8Y_d_9TNDhHIcRlERz3zptyYWV/view, which is linked from https://github.com/geyuying/PF-AFN, I get the following error:

Traceback (most recent call last):
  File "eval_PBAFN_viton.py", line 37, in <module> 
    load_checkpoint(warp_model, opt.warp_checkpoint)
  File "/home/paperspace/PF-AFN/PF-AFN_test/models/networks.py", line 183, in load_checkpoint
    model.load_state_dict(checkpoint_new)
  File "/home/paperspace/anaconda3/envs/tryon/lib/python3.6/site-packages/torch/nn/modules/module.py", line 777, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for AFWM:
        size mismatch for cond_features.encoders.0.0.block.0.weight: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([16]).
        size mismatch for cond_features.encoders.0.0.block.0.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([16]).
        size mismatch for cond_features.encoders.0.0.block.0.running_mean: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([16]).
        size mismatch for cond_features.encoders.0.0.block.0.running_var: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([16]).
        size mismatch for cond_features.encoders.0.0.block.2.weight: copying a param with shape torch.Size([64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 16, 3, 3]).

How do I go about fixing this?
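
For reference, the last mismatch line shows a first convolution weight of shape [64, 3, 3, 3] in the checkpoint against [64, 16, 3, 3] in the current model, so the checkpoint's `cond_features` encoder appears to expect 3 input channels where this AFWM is built for 16. The snippet below is a minimal sketch, not code from either repository, for listing every such mismatch before calling `load_state_dict`. The helper name `find_shape_mismatches` is hypothetical; it only assumes each value exposes a `.shape`, as PyTorch tensors do.

```python
def find_shape_mismatches(checkpoint_state, model_state):
    """Return {name: (checkpoint_shape, model_shape)} for entries that
    exist in both mappings but differ in shape."""
    mismatches = {}
    for name, ckpt_tensor in checkpoint_state.items():
        if name not in model_state:
            # Keys missing from the model are a separate problem; skip them here.
            continue
        ckpt_shape = tuple(ckpt_tensor.shape)
        model_shape = tuple(model_state[name].shape)
        if ckpt_shape != model_shape:
            mismatches[name] = (ckpt_shape, model_shape)
    return mismatches


# Possible usage (assumes torch is installed and warp_model is already built):
#   checkpoint = torch.load(opt.warp_checkpoint, map_location="cpu")
#   for name, shapes in find_shape_mismatches(checkpoint, warp_model.state_dict()).items():
#       print(name, shapes)
```

If many keys differ only in their input-channel dimension, the checkpoint was most likely trained for a different model configuration than the one being constructed.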

Deepak2405 commented 1 year ago

I faced the same issue. It's a big file, around 5 GB. Your PC has probably not finished downloading it yet.

gurteshwar commented 1 year ago

Did you manage to get it working? The file at https://drive.google.com/drive/folders/11BJo59iXVu2_NknKMbN0jKtFV06HTn5K is a 112 MB archive.

gurteshwar commented 1 year ago

Hello @Limbor,

Do you have any pointers to resolve this?

Thanks

AntonNikishin commented 1 year ago

Same issue

AntonNikishin commented 1 year ago

I'm trying to run this script in the environment described in https://github.com/geyuying/PF-AFN, since it cannot run in the DCI-VTON environment due to different requirements.

Getting this error:

[admin@jupyter-anton (tryon)] PF-AFN_test $ sh test_VITON.sh
------------ Options -------------
batchSize: 32
data_type: 32
dataroot: ./../../MY-DATA
display_winsize: 512
fineSize: 512
gen_checkpoint: checkpoints/PFAFN/gen_model_final.pth
gpu_ids: [0]
input_nc: 3
isTrain: False
label_nc: 13
loadSize: 512
max_dataset_size: inf
nThreads: 1
name: cloth-warp
no_flip: False
norm: instance
output_nc: 3
phase: test
resize_or_crop: none
serial_batches: False
tf_log: False
unpaired: True
use_dropout: False
verbose: False
warp_checkpoint: checkpoints/warp_viton.pth
-------------- End ----------------
#training images = 1
Traceback (most recent call last):
  File "/User/.conda/envs/tryon/lib/python3.6/tarfile.py", line 189, in nti
    n = int(s.strip() or "0", 8)
ValueError: invalid literal for int() with base 8: 'torch._u'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/User/.conda/envs/tryon/lib/python3.6/tarfile.py", line 2299, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/User/.conda/envs/tryon/lib/python3.6/tarfile.py", line 1093, in fromtarfile
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
  File "/User/.conda/envs/tryon/lib/python3.6/tarfile.py", line 1035, in frombuf
    chksum = nti(buf[148:156])
  File "/User/.conda/envs/tryon/lib/python3.6/tarfile.py", line 191, in nti
    raise InvalidHeaderError("invalid header")
tarfile.InvalidHeaderError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/User/.conda/envs/tryon/lib/python3.6/site-packages/torch/serialization.py", line 556, in _load
    return legacy_load(f)
  File "/User/.conda/envs/tryon/lib/python3.6/site-packages/torch/serialization.py", line 467, in legacy_load
    with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar, \
  File "/User/.conda/envs/tryon/lib/python3.6/tarfile.py", line 1591, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/User/.conda/envs/tryon/lib/python3.6/tarfile.py", line 1621, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/User/.conda/envs/tryon/lib/python3.6/tarfile.py", line 1484, in __init__
    self.firstmember = self.next()
  File "/User/.conda/envs/tryon/lib/python3.6/tarfile.py", line 2311, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "eval_PBAFN_viton.py", line 37, in <module>
    load_checkpoint(warp_model, opt.warp_checkpoint)
  File "/User/PF-AFN/PF-AFN_test/models/networks.py", line 178, in load_checkpoint
    checkpoint = torch.load(checkpoint_path)
  File "/User/.conda/envs/tryon/lib/python3.6/site-packages/torch/serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/User/.conda/envs/tryon/lib/python3.6/site-packages/torch/serialization.py", line 560, in _load
    raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
RuntimeError: checkpoints/warp_viton.pth is a zip archive (did you mean to use torch.jit.load()?)
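
A note on the message above: older PyTorch releases typically raise "is a zip archive" when handed a checkpoint written in the zip-based format that `torch.save` has used by default since PyTorch 1.6. The snippet below is a minimal sketch, not part of either repository, for checking which format a checkpoint uses and for re-saving it in the legacy format. It assumes a second environment with PyTorch 1.6 or later is available for the re-save step; both function names are hypothetical.

```python
import zipfile


def uses_zip_serialization(checkpoint_path):
    """Return True if the file is a zip archive, i.e. was most likely
    written with the zip-based torch.save format."""
    return zipfile.is_zipfile(checkpoint_path)


def resave_in_legacy_format(src_path, dst_path):
    """Re-save a checkpoint so that older PyTorch versions can read it.

    Run this in an environment with PyTorch >= 1.6, which can read the
    zip-based format and still write the legacy one.
    """
    import torch  # imported lazily so the format check works without torch

    checkpoint = torch.load(src_path, map_location="cpu")
    torch.save(checkpoint, dst_path, _use_new_zipfile_serialization=False)


# Possible usage:
#   if uses_zip_serialization("checkpoints/warp_viton.pth"):
#       resave_in_legacy_format("checkpoints/warp_viton.pth",
#                               "checkpoints/warp_viton_legacy.pth")
```

The alternative is to run the evaluation script in an environment whose PyTorch version already reads the zip format, which avoids converting the file at all.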

Limbor commented 1 year ago

Hi @gurteshwar, it still runs smoothly on my machine. Could you provide more details about your environment? You can still try using the dci-vton environment; one additional package needs to be installed:

pip install cupy-cuda11x

AntonNikishin commented 1 year ago

In that case it throws an error: module 'cupy' has no attribute 'util'

Traceback (most recent call last):
  File "eval_PBAFN_viton.py", line 11, in <module>
    from models.afwm import AFWM
  File "/User/PF-AFN/PF-AFN_test/models/afwm.py", line 4, in <module>
    from .correlation import correlation
  File "/User/PF-AFN/PF-AFN_test/models/correlation/correlation.py", line 274, in <module>
    @cupy.util.memoize(for_each_device=True)
  File "/User/.conda/envs/dci-vton/lib/python3.8/site-packages/cupy/__init__.py", line 921, in __getattr__
    raise AttributeError(f"module 'cupy' has no attribute {name!r}")
AttributeError: module 'cupy' has no attribute 'util'

Limbor commented 1 year ago

@AntonNikishin just change the decorator from

@cupy.util.memoize(for_each_device=True)

to

@cupy.memoize(for_each_device=True)
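
If the same file has to work in both the older PF-AFN environment and the dci-vton one, the lookup can be made tolerant of either layout. This is only a sketch under the assumption that `memoize` lives either at the top level of the module or under `util`, as the two decorator spellings above suggest; `resolve_memoize` is a hypothetical helper name.

```python
def resolve_memoize(cupy_module):
    """Return the memoize decorator factory from a CuPy-like module,
    preferring the top-level name and falling back to `util.memoize`."""
    memoize = getattr(cupy_module, "memoize", None)
    if memoize is None:
        # Older layout: the decorator is exposed under the util submodule.
        memoize = cupy_module.util.memoize
    return memoize


# Possible usage in models/correlation/correlation.py (assumes cupy is installed):
#   import cupy
#   memoize = resolve_memoize(cupy)
#
#   @memoize(for_each_device=True)
#   def cupy_launch(...):
#       ...
```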

AntonNikishin commented 1 year ago

That helps, thank you

bcmi / DCI-VTON-Virtual-Try-On

Failed to load warp_viton.pth checkpoint #11