3huo / 3DV-Action

Action Recognition CVPR2020
63 stars · 3 forks

Can you provide some description on how to use this code? #2

Closed liujiaheng closed 4 years ago

liujiaheng commented 4 years ago

For example, how to generate the point cloud datasets, and how to run the training and validation process?

liujiaheng commented 4 years ago

I downloaded the Masked Depth Maps from the NTU-RGBD dataset. I cannot find where the action proposals are generated in your code.

Thanks

3huo commented 4 years ago

This code consists of two steps. First, the 3DV point data should be generated by the construction code (e.g. "ntu120_3dv_pre.py"). Then the 3DV point data can be fed to the network.
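For readers new to the repo, the core of the 3DV construction step is temporal rank pooling over per-frame voxel occupancy grids. A minimal sketch (my own simplification, not the repo's exact code — `temporal_rank_pooling` is a hypothetical name) looks like this:

```python
import numpy as np

def temporal_rank_pooling(voxel_frames):
    """Fuse a sequence of binary voxel grids into one 3DV motion volume.

    Frame i (0-indexed, N frames total) gets the rank-pooling weight
    (2*i - N + 1), so early frames contribute negatively and late
    frames positively; a voxel occupied in every frame sums to zero.
    """
    n_frame = len(voxel_frames)
    motion = np.zeros_like(voxel_frames[0], dtype=np.float64)
    for i, frame in enumerate(voxel_frames):
        motion += (2 * i - n_frame + 1) * frame
    return motion

# Toy example: a 2x2x2 voxel grid where one voxel becomes occupied
# only in the last of 4 frames, so it gets weight 2*3 - 4 + 1 = 3.
frames = [np.zeros((2, 2, 2)) for _ in range(4)]
frames[-1][0, 0, 0] = 1.0
dv = temporal_rank_pooling(frames)
```

The resulting motion volume is then sampled into the 3DV point set that is fed to the point-cloud network.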

liujiaheng commented 4 years ago

Thanks for your description. I also have some problems with "ntu120_3dv_pre.py":

1. In line 121, I think the line should be changed to `if m == 0: voxel_DI[0,:,:,:] = voxel_DI[0,:,:,:] + (i_frame*2-n_frame+1)*threeD_matrix`
2. In the normalization process, in line 166, why does the nturgbd dataset divide by 'y_len'? Reading the code of 'main_uwa.m' and 'main_ucla_fulldepth.m', I find those two datasets divide by 'x_len'. Or is this not important for the result?

Thanks. Looking forward to your reply.
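For context, the corrected accumulation from point 1 can be sketched as follows. The shapes and the single-segment loop here are my own simplifying assumptions, not the exact "ntu120_3dv_pre.py" code; channel 0 of `voxel_DI` is the motion channel covering the whole sequence:

```python
import numpy as np

# Hypothetical shapes: 8 frames, a 4x4x4 voxel grid, and a voxel_DI
# array with one temporal-segment channel (channel 0 = full sequence).
n_frame, grid = 8, (4, 4, 4)
voxel_DI = np.zeros((1,) + grid)

for i_frame in range(n_frame):
    # Binary occupancy of the current frame; here a toy pattern that
    # moves one occupied voxel along the x axis over time.
    threeD_matrix = np.zeros(grid)
    threeD_matrix[i_frame % 4, 0, 0] = 1.0

    m = 0  # segment index; 0 means the full-sequence channel
    if m == 0:  # the corrected form of line 121
        voxel_DI[0, :, :, :] = voxel_DI[0, :, :, :] + (i_frame * 2 - n_frame + 1) * threeD_matrix
```

With the weight `(i_frame*2 - n_frame + 1)`, voxels occupied early in the sequence accumulate negative values and voxels occupied late accumulate positive ones.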

liujiaheng commented 4 years ago

Besides, I found you used `sum_i (i*2-N+1)*V_i` in processing the nturgbd dataset, but `sum_i (i*2-N-1)*V_i` in the other datasets (I see this in your Matlab version).

3huo commented 4 years ago


The reason is that Matlab arrays begin at index 1, while Python arrays begin at index 0.
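The two formulas are in fact the same weighting written for the two indexing conventions, which a quick check confirms (my own illustration, not code from the repo):

```python
# In Matlab, frames are indexed i = 1..N, so the weight is (2*i - N - 1);
# in Python, i = 0..N-1, so the identical weight is written (2*i - N + 1).
N = 10
matlab_weights = [2 * i - N - 1 for i in range(1, N + 1)]  # i = 1..N
python_weights = [2 * i - N + 1 for i in range(N)]         # i = 0..N-1
```

Both produce the same odd weights from -(N-1) to (N-1), summing to zero.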

3huo commented 4 years ago


Thanks for your comments; I have revisited the code once again. There were indeed some errors introduced when I was publishing this code. Line 121 should be changed to `if m == 0: voxel_DI[0,:,:,:] = voxel_DI[0,:,:,:] + (i_frame*2-n_frame+1)*threeD_matrix`.

Whether you divide by "y_len" or "x_len" is not important for the result. I recommend "y_len" because human height is consistent across subjects.
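A minimal sketch of this normalization step, under the assumption that the point cloud is centered first and all three axes are scaled by the y-extent (the function name and exact procedure are my illustration, not the repo's code):

```python
import numpy as np

def normalize_points(points):
    """Center an (N, 3) point cloud and scale all axes by its y-extent.

    Dividing every coordinate by y_len ties the scale to body height,
    which stays consistent across subjects, unlike the x-extent.
    """
    centered = points - points.mean(axis=0)
    y_len = centered[:, 1].max() - centered[:, 1].min()
    return centered / y_len
```

After this step the y-extent of every sample is exactly 1, regardless of how tall the subject appeared in the depth map.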

liujiaheng commented 4 years ago

Thanks for your kind reply. In NTU_Net/dataset/dataset.py, line 89 reads `v_name = vid_name[:-9]`; I think this should be changed to `v_name = vid_name[:-4]`. And in line 53, I think the last file of the ntu60 dataset is 'S017C003P020R002A060.npy'.

Besides, I ran 'train.py' on the NTU-RGBD-60 dataset. I only changed '--Num_class' to 60 and '--dataset' to 'ntu60' in 'train.py'; the remaining settings follow your 'train.py' code. I obtained 93.7% on the cross-view setting, while the paper reports 96.3%. In your experience, what could be the problem with this result?

Thanks. Looking forward to your kind reply.
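To see why `[:-4]` fits files named like the sample above: NTU RGB+D names follow the pattern SsssCcccPpppRrrrAaaa plus an extension, so stripping '.npy' needs exactly 4 characters, while `[:-9]` would also cut into the action code. A quick parsing sketch (my own field slicing, assuming the standard NTU naming scheme):

```python
# NTU RGB+D name layout: S<setup>C<camera>P<person>R<replication>A<action>,
# e.g. the last cross-setup-60 sample 'S017C003P020R002A060.npy'.
vid_name = 'S017C003P020R002A060.npy'
v_name = vid_name[:-4]  # strip '.npy' only

setup = int(v_name[1:4])
camera = int(v_name[5:8])
person = int(v_name[9:12])
rep = int(v_name[13:16])
action = int(v_name[17:20])
```

With `[:-9]` the same input would yield 'S017C003P020R00', losing the replication and action fields the dataset loader needs.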

3huo commented 4 years ago


Thanks for your kind comments. The `vid_name[:-9]` setting is based on our Matlab-version 3DV generation. In fact, we used the Matlab version to generate the 3DV points in practice; the Python version is only used to measure the final time efficiency. So we also recommend using the Matlab version.

For the result problem, are you sure the parameters are optimal, such as voxel size, batch size...?
Another possible reason is that the uploaded code targets the NTU120 setting. I will try to upload an NTU60 version soon.