Error in model = DIORModel(opt)

preetshah7 commented 2 years ago

Hi Aiyu Cui, I have been following this topic since 2017, and dressing-in-order brings a lot of new features, like tuck-in, into the picture. Cheers for that. However, I have tried to recreate this framework on Google Colab and have not been able to figure my way out. The notebook that I've used: link_to_nb

GPU: Tesla K80 | NVIDIA-SMI 510.39.01 | Driver Version: 460.32.03 | CUDA Version: 11.2

Building the custom CUDA modules went smoothly, but I am not sure whether CUDA 11.2 works with torch 1.0.0.

When setting up the dior_model, the error below pops up:

load vgg ckpt from torchvision dict.
[init] init pre-trained model vgg.
initialize network with orthogonal

---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

<ipython-input-14-81abdc6faa32> in <module>
     29 
     30 # create model
---> 31 model = DIORModel(opt)
     32 model.setup(opt)

14 frames

/content/dressing-in-order/models/dior_model.py in __init__(self, opt)
      9 class DIORModel(DIORBaseModel):
     10     def __init__(self, opt):
---> 11         DIORBaseModel.__init__(self, opt)
     12         self.netE_opt = opt.netE
     13         self.frozen_flownet = opt.frozen_flownet

/content/dressing-in-order/models/dior_base_model.py in __init__(self, opt)
     21         self.n_style_blocks = opt.n_style_blocks
     22         # init_models
---> 23         self._init_models(opt)
     24 
     25         # loss

/content/dressing-in-order/models/dior_model.py in _init_models(self, opt)
     59 
     60     def _init_models(self, opt):
---> 61         super()._init_models(opt)
     62         self.model_names += ["Flow"]
     63         if opt.frozen_flownet:

/content/dressing-in-order/models/dior_base_model.py in _init_models(self, opt)
     72                                       n_style_blocks=opt.n_style_blocks, n_human_parts=opt.n_human_parts, netG=opt.netG,
     73                                       norm=opt.norm_type, relu_type=opt.relu_type,
---> 74                                       init_type=opt.init_type, init_gain=opt.init_gain, gpu_ids=self.gpu_ids)
     75 
     76         self.netE_attr = networks.define_E(input_nc=3, output_nc=opt.style_nc, netE=opt.netE, ngf=opt.ngf, n_downsample=2,

/content/dressing-in-order/models/networks/__init__.py in define_G(input_nc, output_nc, ngf, latent_nc, style_nc, n_downsampling, n_style_blocks, n_human_parts, netG, norm, relu_type, init_type, init_gain, gpu_ids, **kwargs)
     82             norm_type=norm, relu_type=relu_type, **kwargs
     83             )
---> 84     return init_net(net, init_type, init_gain, gpu_ids)
     85 
     86 def define_D(input_nc, ndf, netD, n_layers_D=3, norm='batch', use_dropout=True, use_sigmoid=False, init_type='normal', init_gain=0.02, gpu_ids=[]):

/content/dressing-in-order/models/networks/base_networks.py in init_net(net, init_type, init_gain, gpu_ids, do_init_weight)
    107         net = torch.nn.DataParallel(net, gpu_ids)  # multi-GPUs
    108     if do_init_weight:
--> 109         init_weights(net, init_type, init_gain=init_gain)
    110     return net
    111 

/content/dressing-in-order/models/networks/base_networks.py in init_weights(net, init_type, init_gain)
     88 
     89     print('initialize network with %s' % init_type)
---> 90     net.apply(init_func)  # apply the initialization function <init_func>
     91 
     92 def init_net(net, init_type='normal', init_gain=0.02, gpu_ids=[], do_init_weight=True):

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in apply(self, fn)
    240         """
    241         for module in self.children():
--> 242             module.apply(fn)
    243         fn(self)
    244         return self

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in apply(self, fn)
    240         """
    241         for module in self.children():
--> 242             module.apply(fn)
    243         fn(self)
    244         return self

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in apply(self, fn)
    240         """
    241         for module in self.children():
--> 242             module.apply(fn)
    243         fn(self)
    244         return self

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in apply(self, fn)
    240         """
    241         for module in self.children():
--> 242             module.apply(fn)
    243         fn(self)
    244         return self

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in apply(self, fn)
    240         """
    241         for module in self.children():
--> 242             module.apply(fn)
    243         fn(self)
    244         return self

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in apply(self, fn)
    241         for module in self.children():
    242             module.apply(fn)
--> 243         fn(self)
    244         return self
    245 

/content/dressing-in-order/models/networks/base_networks.py in init_func(m)
     78                 init.kaiming_normal_(m.weight.data, a=0, mode='fan_in')
     79             elif init_type == 'orthogonal':
---> 80                 init.orthogonal_(m.weight.data, gain=init_gain)
     81             else:
     82                 raise NotImplementedError('initialization method [%s] is not implemented' % init_type)

/usr/local/lib/python3.7/dist-packages/torch/nn/init.py in orthogonal_(tensor, gain)
    354 
    355     # Compute the qr factorization
--> 356     q, r = torch.qr(flattened)
    357     # Make Q uniform according to https://arxiv.org/pdf/math-ph/0609050.pdf
    358     d = torch.diag(r, 0)

RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/generic/THCTensorMathPairwise.cu:225

link_to_cell

Please look into this. Thanks :)

cuiaiyu commented 2 years ago

Maybe first check whether the cudatoolkit is installed in the correct version; the command should be something like `conda install pytorch=1.0.0 torchvision cudatoolkit=11.0 -c pytorch`.

Besides, if you only want to run the demo (no training), you can use a newer version of PyTorch, which should make compiling easier.
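One detail worth checking first: the `CUDA Version: 11.2` printed by `nvidia-smi` is the highest CUDA version the driver supports, not the toolkit PyTorch was built against. A small diagnostic like the sketch below prints what PyTorch itself reports. It is not part of dressing-in-order; `environment_report` is a hypothetical helper, and it only reads standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.get_device_name`, `torch.cuda.get_device_capability`).

```python
import importlib


def environment_report():
    """Collect the PyTorch/CUDA details relevant to this issue.

    Values stay None when PyTorch is not installed or no GPU is visible.
    """
    report = {"torch": None, "torch_cuda": None, "gpu": None, "capability": None}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return report
    report["torch"] = torch.__version__
    # CUDA version this PyTorch binary was built against (None for CPU-only builds).
    report["torch_cuda"] = torch.version.cuda
    if torch.cuda.is_available():
        report["gpu"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability, e.g. (3, 7) on a Tesla K80.
        report["capability"] = torch.cuda.get_device_capability(0)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If `torch_cuda` does not match the toolkit you compiled the custom CUDA modules with, that mismatch is a likely first suspect.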

Thanks.

preetshah7 commented 2 years ago

I did try the above-mentioned conda install (`conda install pytorch=1.0.0 torchvision cudatoolkit=11.0 -c pytorch`), with no luck. Since I just want to test inference, I tried the torch 1.10 that comes pre-installed on Colab, but with it the custom CUDA modules from GFLA (Global-Flow-Local-Attention) would not build. Note that with torch 1.0 they did build. The GFLA authors mention this:

The Colab Demo for the Global-Flow-Local-Attention Model. Note: we suggest to use GPUs with SM architecture higher than "SM60", such as P100, P4. Bugs are found when running with GPUs: K80 (We would really appreciate if you can offer any help) . Therefore, if you got GPUs listed above, please try to reset your runtime and get a different GPU.
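The quoted recommendation can be turned into a quick check on the GPU's compute capability. This is a sketch under one assumption: that "SM60" is the relevant cut-off for GFLA's kernels, as the quote suggests. `meets_gfla_recommendation` is a hypothetical helper; the capabilities listed (K80 = 3.7, P100 = 6.0, P4 = 6.1) are NVIDIA's published values.

```python
def meets_gfla_recommendation(capability, minimum=(6, 0)):
    """Return True if a (major, minor) compute capability is at least `minimum`.

    The (6, 0) default mirrors the "SM60" recommendation in the GFLA note;
    treating it as a hard threshold is an assumption.
    """
    return tuple(capability) >= tuple(minimum)


# Published compute capabilities of the GPUs mentioned in this thread.
for name, cc in [("Tesla K80", (3, 7)), ("Tesla P100", (6, 0)), ("Tesla P4", (6, 1))]:
    print(name, cc, meets_gfla_recommendation(cc))
```

On Colab, the capability of the assigned GPU can be read with `torch.cuda.get_device_capability(0)` and passed in directly.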

Colab always assigns me a K80, and here are the gencode flags in the setup scripts for block_extractor, local_attn_reshape and resample2d_package:

nvcc_args = [
    #'-gencode', 'arch=compute_50,code=sm_50',
    #'-gencode', 'arch=compute_52,code=sm_52',
    '-gencode', 'arch=compute_60,code=sm_60',
    '-gencode', 'arch=compute_61,code=sm_61',
    '-gencode', 'arch=compute_70,code=sm_70',
    '-gencode', 'arch=compute_70,code=compute_70'
]
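Note that this list has no entry for the K80, whose compute capability is 3.7: the build emits machine code only for sm_60, sm_61 and sm_70, and the embedded compute_70 PTX can only be JIT-compiled for GPUs of capability 7.0 or higher. The sketch below derives the flags from a list of target capabilities; `gencode_flags` is a hypothetical helper, not part of GFLA. Adding (3, 7) is an untested assumption: CUDA 11.x still accepts sm_37 (deprecated, removed in CUDA 12), but the GFLA authors report bugs on K80 regardless, so this may only move the failure elsewhere.

```python
def gencode_flags(capabilities):
    """Build nvcc -gencode flags for a list of (major, minor) compute capabilities.

    Emits machine code (code=sm_XY) for every listed capability, plus PTX
    (code=compute_XY) for the highest one so newer GPUs can JIT-compile it.
    """
    ccs = sorted(set(capabilities))
    if not ccs:
        raise ValueError("need at least one compute capability")
    flags = []
    for major, minor in ccs:
        cc = f"{major}{minor}"
        flags += ["-gencode", f"arch=compute_{cc},code=sm_{cc}"]
    major, minor = ccs[-1]
    cc = f"{major}{minor}"
    flags += ["-gencode", f"arch=compute_{cc},code=compute_{cc}"]
    return flags


# Hypothetical: the original targets plus (3, 7) for the Tesla K80.
nvcc_args = gencode_flags([(3, 7), (6, 0), (6, 1), (7, 0)])
print(nvcc_args)
```

Called with only `[(6, 0), (6, 1), (7, 0)]`, the helper reproduces the uncommented entries of the original list exactly.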

Please suggest a workaround if there is one, and let me know whether this is possible on Colab at all.

cuiaiyu commented 2 years ago

If you only need inference, you can skip installing GFLA's CUDA functions: specifying --frozen_flownet will bypass all CUDA function calls.

preetshah7 commented 2 years ago

Thanks for the response; I'm trying that now.

preetshah7 commented 2 years ago

Screenshot from 2022-01-21 03-26-56

Have I passed it correctly here?

preetshah7 commented 2 years ago

Since the CUDA modules won't build, the flownet (flownet.pt) doesn't exist.

cuiaiyu commented 2 years ago

flownet.pt contains the weights of the pretrained flow model. Please check Issue #23 at https://github.com/cuiaiyu/dressing-in-order/issues/23

In short, you don't need it; you can specify it as `opt.flownet_path = ''`
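Combining the two suggestions from this thread (`--frozen_flownet` and an empty `flownet_path`), an inference-only setup might look like the sketch below. It assumes `opt` is a plain attribute container, as the `opt.flownet_path = ...` assignments in the demo suggest. `use_inference_only_flow` is a hypothetical helper, and `SimpleNamespace` only stands in for the demo's real option object, which has many more fields.

```python
from types import SimpleNamespace


def use_inference_only_flow(opt):
    """Hypothetical helper applying the two settings suggested in this thread.

    - frozen_flownet=True: per the maintainer, bypasses all GFLA CUDA calls.
    - flownet_path='': do not point at the pretrained flownet.pt weights.
    """
    opt.frozen_flownet = True
    opt.flownet_path = ""
    return opt


# Stand-in for the demo's options; the demo sets flownet_path to this file.
opt = use_inference_only_flow(SimpleNamespace(flownet_path="pretrained_models/flownet.pt"))
print(opt.frozen_flownet, repr(opt.flownet_path))

# In the real notebook, the model would then be built as in the original cell:
#   model = DIORModel(opt)
#   model.setup(opt)
```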

preetshah7 commented 2 years ago

Thanks a lot for the help, and yes, the results are amazing. All the best!

nikky4D commented 2 years ago

flownet.pt contains the weights of the pretrained flow model. Please check Issue #23 at #23

In short, you don't need it; you can specify it as `opt.flownet_path = ''`

Is flownet required in the demo? In the demo, you specify opt.flownet_path = pretrained_models/flownet.pt

cuiaiyu commented 2 years ago

No, it is not required; you can specify it as an empty string.

mahachaaben99 commented 2 years ago

Hey @preetshah7, did you solve the problem? I got the same error while trying the demo and couldn't fix it.

preetshah7 commented 2 years ago

Hi @maziqueen79 @nikky4D, as suggested by the owner, I did not provide the flownet to the model (notebook url) and set `opt.flownet_path = ''`. This worked for me.

mahachaaben99 commented 2 years ago

Thank you for your help, @preetshah7!

MAmmarRaza commented 12 months ago

Hi, respected researchers! I am trying to run this demo, but it does not show the images with poses the way your output did before. Any idea why?

Screenshot from 2023-09-10 20-48-35

raghavendra-me commented 2 months ago

Hey, can you please share the Colab notebook in which you solved this error? I am having trouble fixing it, and it would be helpful for me.

cuiaiyu / dressing-in-order

Error in model = DIORModel(opt) #27