VIPL-SLP / pointlstm-gesture-recognition-pytorch

This repo holds the code for the paper: An Efficient PointLSTM for Point Clouds Based Gesture Recognition (CVPR 2020).
https://openaccess.thecvf.com/content_CVPR_2020/html/Min_An_Efficient_PointLSTM_for_Point_Clouds_Based_Gesture_Recognition_CVPR_2020_paper.html
Apache License 2.0

A question about group_points in op.py #11

Closed LJL36 closed 3 years ago

LJL36 commented 3 years ago

Hi @Blueprintf, thanks for sharing your great work! I have a question about the group_points function in experiments/models/op.py:

def group_points(self, distance_dim, array1, array2, knn, dim):
        matrix, a1, a2 = self.array_distance(array1, array2, distance_dim, dim)
        dists, inputs_idx = torch.topk(matrix,
                                       knn,
                                       -1,
                                       largest=False,
                                       sorted=True)
        neighbor = a2.gather(
            -1,
            inputs_idx.unsqueeze(1).expand(dists.shape[:1] + (a2.shape[1], ) +
                                           dists.shape[1:]))
        offsets = array1.unsqueeze(dim + 1) - neighbor
        offsets[:, :3] /= torch.sum(offsets[:, :3]**2,
                                    dim=1).unsqueeze(1)**0.5 + 1e-8
        return offsets

The timestamp dimension is also included in the calculation of the offsets, so the corresponding offset value is always 0. Is that what you intended?

ycmin95 commented 3 years ago

@LJL36, yes, the timestamp offset in group_points() is always 0 and doesn't affect the grouping results. As described in the paper (Figure 3), we extract the intra-frame structure in stage-1 to reduce the temporal impact of other frames in the early stage of feature extraction.
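
The zero timestamp offset can be illustrated with a minimal NumPy sketch. This is not the repository's code: group_offsets, clip, and the (x, y, z, t) channel layout are hypothetical names and assumptions chosen for illustration, and NumPy stands in for the PyTorch topk/gather calls quoted above. When neighbors are drawn from the same frame as the center point, the t channel of every offset is 0. The second call, which draws neighbors from the previous frame, is only a contrast case and does not claim to reproduce how the later stages group points.

```python
import numpy as np

def group_offsets(centers, candidates, k):
    """Return center-minus-neighbor offsets to the k nearest candidates.

    centers:    (N, 4) array of (x, y, z, t) points (assumed layout)
    candidates: (M, 4) array of (x, y, z, t) points
    Result:     (N, k, 4); the xyz part is scaled to (almost) unit length.
    """
    # Pairwise squared distances on the spatial channels only.
    diff = centers[:, None, :3] - candidates[None, :, :3]    # (N, M, 3)
    dist2 = np.sum(diff ** 2, axis=-1)                        # (N, M)
    # Indices of the k nearest candidates for each center.
    idx = np.argsort(dist2, axis=-1)[:, :k]                   # (N, k)
    neighbors = candidates[idx]                               # (N, k, 4)
    # All four channels, including the timestamp, enter the offset.
    offsets = centers[:, None, :] - neighbors                 # (N, k, 4)
    # Normalize only the spatial part, as in the quoted function.
    norm = np.sqrt(np.sum(offsets[..., :3] ** 2, axis=-1, keepdims=True))
    offsets[..., :3] /= norm + 1e-8
    return offsets

# Toy clip: 3 frames of 16 points; every point carries its frame index as t.
rng = np.random.default_rng(0)
num_frames, num_points, k = 3, 16, 4
xyz = rng.normal(size=(num_frames, num_points, 3))
t = np.broadcast_to(
    np.arange(num_frames, dtype=float)[:, None, None],
    (num_frames, num_points, 1),
)
clip = np.concatenate([xyz, t], axis=-1)                      # (T, N, 4)

# Intra-frame grouping: neighbors share the center's timestamp.
intra = group_offsets(clip[1], clip[1], k)
# Contrast case: neighbors taken from the previous frame.
inter = group_offsets(clip[1], clip[0], k)

print("max |t offset|, same frame:    ", np.abs(intra[..., 3]).max())
print("unique t offsets, other frame: ", np.unique(inter[..., 3]))
```

In the same-frame case the fourth channel carries no information, which matches the reply above: it is constant and cannot change which points are grouped together.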

LJL36 commented 3 years ago

In other words, the timestamp dimension is redundant in stage-1, because it is all zeros from the start. So the information about the relative position between frames is actually extracted through the inter-frame stages and the LSTM module?

ycmin95 commented 3 years ago

@LJL36 Yes, we heuristically designed the baseline network (like 2D conv + LSTM) based on previous work (BMVC'19), without detailed ablation studies of the basic module design.

LJL36 commented 3 years ago

Okay, I see, thanks very much for your patience.