lib/modeling/models/smplgait.py throwing error when training a new dataset

Gait3D / Gait3D-Benchmark

This is the code for the papers "Gait Recognition in the Wild with Dense 3D Representations and A Benchmark" (CVPR 2022), "Gait Recognition in the Wild with Multi-hop Temporal Switch", and "Parsing is All You Need for Accurate Gait Recognition in the Wild".

lib/modeling/models/smplgait.py throwing error when training a new dataset #9

Closed: ThomasNing closed this issue 1 year ago

ThomasNing commented 2 years ago

Hi Jinkai,

When I try to apply SMPLGait to another dataset, smplgait.py throws the following error during training: `smpls = ipts[1][0]  # [n, s, d]` `IndexError: list index out of range`. Interestingly, I used 4 GPUs for training: 3 of them could retrieve the `ipts[1][0]` tensor (with size 1), but the fourth one failed to do so. How can I solve this?
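One way to make this failure easier to diagnose is to check the input list before indexing into it, so the error names the missing modality instead of a bare `IndexError`. The sketch below assumes, based only on the `smpls = ipts[1][0]` line in the traceback, that `ipts[1]` holds the SMPL inputs. `get_smpls` is a hypothetical helper, not part of the repository.

```python
from typing import Any, Sequence


def get_smpls(ipts: Sequence[Any]) -> Any:
    """Return the SMPL input of a batch, failing with a descriptive message.

    Assumption: ipts[1] is the list of SMPL inputs, mirroring the
    `smpls = ipts[1][0]` line in smplgait.py.
    """
    if len(ipts) < 2:
        # The SMPL modality was never added to the inputs at all.
        raise ValueError(
            f"expected at least 2 input modalities, got {len(ipts)}; "
            "check that the data config loads the SMPL data"
        )
    smpl_inputs = ipts[1]
    if len(smpl_inputs) == 0:
        # The modality exists, but this batch (on this GPU) received nothing.
        raise ValueError(
            "ipts[1] is empty: no SMPL data reached this batch; "
            "check the SMPL files of the sequences sampled on this GPU"
        )
    return smpl_inputs[0]
```

Calling it in place of the direct indexing (`smpls = get_smpls(ipts)`) keeps the behaviour unchanged for valid batches and only changes the error message for broken ones.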

JinkaiZheng commented 2 years ago

Hi~ Because the framework runs in DDP (DistributedDataParallel) mode, I recommend using only 1 GPU for debugging. This makes it much easier to isolate the problem.

ThomasNing commented 2 years ago

How can I modify the code to run with 1 GPU?

JinkaiZheng commented 2 years ago

Just change the values of `CUDA_VISIBLE_DEVICES` and `--nproc_per_node`, like this: `CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 lib/main.py --cfgs ./config/smplgait_64pixel.yaml --phase train`
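If you switch often between single-GPU debugging and multi-GPU training, it can help to derive both values from one list of GPU ids so they cannot disagree. This is a hypothetical Python helper (`build_launch_command` is not part of the repository). It only reproduces the command given in the reply, with the config path and phase as defaults.

```python
import os
from typing import Dict, List, Sequence, Tuple


def build_launch_command(
    gpu_ids: Sequence[int],
    cfg_path: str = "./config/smplgait_64pixel.yaml",
    phase: str = "train",
) -> Tuple[List[str], Dict[str, str]]:
    """Build argv and environment for torch.distributed.launch.

    --nproc_per_node is derived from gpu_ids, so it always matches
    the number of devices listed in CUDA_VISIBLE_DEVICES.
    """
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    argv = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={len(gpu_ids)}",
        "lib/main.py",
        "--cfgs", cfg_path,
        "--phase", phase,
    ]
    return argv, env
```

Usage, for example: `argv, env = build_launch_command([0])` followed by `subprocess.run(argv, env=env, check=True)` for debugging, and `build_launch_command([0, 1, 2, 3])` to return to 4 GPUs.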

ThomasNing commented 2 years ago

Thank you! I tried that, and the same error appeared. Do you have any idea why `smpls` cannot retrieve the tensor from `ipts`? Also, I keep getting the following warning:

"/home/zhiyuann/Gait3D-Benchmark/lib/modeling/base_model.py:338: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray. for smpl in smpls_batch]"

Could that be contributing to the error?
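The warning itself says that the nested sequences passed to `np.asarray` have different lengths or shapes. In other words, the frames of at least one SMPL sequence do not all share one shape, which is why NumPy falls back to `dtype=object` (newer NumPy releases raise a `ValueError` instead). A quick check is to compare each frame's shape with the first frame's. `find_ragged_frames` is a hypothetical helper, and `frames` stands for one `smpl` entry of `smpls_batch`.

```python
from typing import List, Sequence, Tuple

import numpy as np


def find_ragged_frames(frames: Sequence) -> List[Tuple[int, tuple]]:
    """Return (index, shape) for every frame whose shape differs from frame 0.

    np.asarray() can only build a numeric array when all frames share one
    shape; otherwise the result is an object array (or an error).
    """
    if len(frames) == 0:
        return []
    ref_shape = np.shape(frames[0])
    return [
        (i, np.shape(fra))
        for i, fra in enumerate(frames)
        if np.shape(fra) != ref_shape
    ]
```

An empty result means the sequence can be stacked into one numeric array; any reported index points at the frame to inspect.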

JinkaiZheng commented 2 years ago

I recommend starting at the data source and tracing it step by step to find out what is causing the SMPL data to go missing.
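Tracing from the source can start with a pass over the dataset itself, before any batching or DDP is involved. The sketch below is a minimal audit under several assumptions. The dataset layout (a dict mapping a sequence id to `"sils"` and `"smpls"` arrays) is hypothetical. The default `smpl_dim=85` is also an assumption, so set it to your own SMPL feature size. The frame-count check assumes both modalities are aligned frame by frame; drop it if your data is not. `audit_smpl_data` is not part of the repository.

```python
from typing import Dict, List, Optional

import numpy as np


def audit_smpl_data(
    dataset: Dict[str, Dict[str, Optional[object]]],
    smpl_dim: int = 85,
) -> List[str]:
    """Report sequences whose SMPL data is missing or inconsistent.

    Hypothetical layout: dataset[seq_id] = {"sils": frames x H x W,
    "smpls": frames x smpl_dim}. smpl_dim=85 is an assumed default.
    """
    problems = []
    for seq_id, item in dataset.items():
        sils, smpls = item.get("sils"), item.get("smpls")
        if smpls is None or len(smpls) == 0:
            problems.append(f"{seq_id}: no SMPL frames")
            continue
        # Per-frame shape check; works for a 2-D array or a list of frames.
        bad = [i for i, fra in enumerate(smpls) if np.shape(fra) != (smpl_dim,)]
        if bad:
            problems.append(f"{seq_id}: frames {bad} do not have shape ({smpl_dim},)")
        # Assumes silhouettes and SMPLs are aligned frame by frame.
        if sils is not None and len(sils) != len(smpls):
            problems.append(
                f"{seq_id}: {len(sils)} silhouette frames vs {len(smpls)} SMPL frames"
            )
    return problems
```

Running it once over the new dataset and fixing every reported sequence narrows the search before going back to multi-GPU training.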

ThomasNing commented 2 years ago

Hi Jinkai,

I retraced the error and found that it happens in base_model.py, at the step that pretreats the SMPLs with this code: `smpls = [np2var(np.asarray([fra for fra in smpl]), requires_grad=requires_grad).float() for smpl in smpls_batch]`

It throws the following error: `TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.`

The error persists even if I change the dtype to float16, as the trainer_cfg indicates. Do you know what might be causing it?

JinkaiZheng commented 2 years ago

The `enable_float16` option in trainer_cfg is meant to reduce memory usage and speed up training. Maybe you can try: `smpls = [np2var(np.asarray([fra for fra in smpl]).astype(float), requires_grad=requires_grad).float() for smpl in smpls_batch]`
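To see when that suggestion can help, note that `.astype(float)` only succeeds if every frame already has the same shape and the array is merely stored with `dtype=object`. If the frames are ragged, the cast still fails. The sketch below checks both cases in plain NumPy. `smpl_to_float` is a hypothetical helper. It passes `dtype=object` explicitly so that ragged input behaves consistently across NumPy versions (NumPy 1.24 and later raise on ragged input without it).

```python
from typing import Sequence

import numpy as np


def smpl_to_float(frames: Sequence) -> np.ndarray:
    """Stack SMPL frames into a float64 array, like the .astype(float) fix.

    Works when all frames share one shape (even if stored as dtype=object);
    raises a ValueError listing the frame shapes when they are ragged.
    """
    try:
        return np.asarray(list(frames), dtype=object).astype(float)
    except (ValueError, TypeError) as err:
        shapes = sorted({np.shape(fra) for fra in frames})
        raise ValueError(
            f"cannot build a float array from frames with shapes {shapes}"
        ) from err
```

If this raises for a sequence, the cast is not the real problem: the ragged frames in that sequence need to be fixed (or filtered out) at the data source.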