ZhouYuxuanYX / Hyperformer

This is the official implementation of our paper "Hypergraph Transformer for Skeleton-based Action Recognition."

Trainable joint_label #21


yochaiye commented 2 weeks ago

Hi, thank you for your great work.

I want to make joint_label trainable so that I can examine the partitions obtained by Hyperformer. To this end, I added the following line to the __init__ method of the Model class in Hyperformer.py:

self.joint_label = nn.Parameter(torch.randn(num_point, 3))  # learnable (num_point, 3) logits: one row of group logits per joint

Then I modified the forward method of unit_vit so it now looks like this:

def forward(self, x, joint_label, groups):
        ## more efficient implementation
        # label = F.one_hot(torch.tensor(joint_label)).float().to(x.device)
        # soft (num_point, num_groups) assignment instead of the original one-hot H
        label = joint_label.softmax(dim=1)
        # pool the joint features into normalised group features
        z = x @ (label / label.sum(dim=0, keepdim=True))

        # w/o proj
        # z = z.permute(3, 0, 1, 2)
        # w/ proj
        z = self.pe_proj(z)
        # distribute the projected group features back to the joints
        e = z @ label.T

I'm training the network on the UCLA dataset, but the output of joint_label.softmax(dim=1) hardly changes during training.

What am I getting wrong here?

ZhouYuxuanYX commented 2 weeks ago

Hi,

Your implementation seems correct. I would suggest:

1) try to increase the number of predefined groups, as the actual number of groups used will be determined during training (some groups may be assigned no joints at all)

2) try to initialize the label matrix with torch.ones, since torch.randn() tends to generate small values around zero, which could be one reason for the small change (see the sketch below)

3) try to train on the NTU datasets and see whether it makes a difference
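
For 1) and 2), something along these lines is what I have in mind (just a sketch; the joint count and the number of predefined groups below are placeholders):

import torch
import torch.nn as nn

num_point = 20   # placeholder, depends on the dataset
num_groups = 6   # predefine more groups than you expect to actually be used

# initialize the logits with ones instead of small random values around zero
joint_label = nn.Parameter(torch.ones(num_point, num_groups))
label = joint_label.softmax(dim=1)   # (num_point, num_groups) soft assignment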

Please give me your feedback after trying out my suggestions.

yochaiye commented 2 weeks ago

Thank you for the quick response!

I forgot to mention that I propagate the learnable version of joint_label instead of the fixed one, so I don't believe that is the source of the problem. I also verified this by checking that the optimizer sees a gradient for joint_label (the gradients are very small though, which is exactly what I'm trying to solve).
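
Concretely, the check I did looks roughly like this (a toy sketch with a stand-in module, since the real forward pass needs the full pipeline):

import torch
import torch.nn as nn

# Minimal stand-in for the Model class, only to illustrate the gradient check.
class Dummy(nn.Module):
    def __init__(self, num_point=20, num_groups=3):
        super().__init__()
        self.joint_label = nn.Parameter(torch.randn(num_point, num_groups))
        self.proj = nn.Linear(num_groups, 1)

    def forward(self):
        label = self.joint_label.softmax(dim=1)
        return self.proj(label).sum()

model = Dummy()
loss = model()
loss.backward()
# joint_label does receive a gradient; in my actual runs it is just very small.
print(model.joint_label.grad.abs().max())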

Will check the options that you've mentioned and see if they resolve the issue.

yochaiye commented 1 week ago

Wanted to update that I've tried your suggestions, but unfortunately the problem still persists. Do you perhaps recall how you implemented it?

I double-checked my implementation by taking a fixed joint_label and computing e in two ways:

  1. Your original code
  2. By transforming the joint labels to one-hot vectors, so that I get the binarised partition matrix H (as it is denoted in the paper), and feeding it into my implementation

I got identical values for e both ways.
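
Roughly, the check looked like this (a toy sketch with placeholder shapes; pe_proj is left out since it is applied identically in both paths):

import torch
import torch.nn.functional as F

B, C, T, V, num_groups = 2, 8, 4, 20, 3        # placeholder shapes
x = torch.randn(B, C, T, V)
joint_label = torch.arange(V) % num_groups      # a fixed hard assignment

# 1) original formulation with the binarised partition matrix H
H = F.one_hot(joint_label, num_groups).float()  # (V, num_groups)
e1 = (x @ (H / H.sum(dim=0, keepdim=True))) @ H.T

# 2) my formulation, fed with the same hard labels
#    (a one-hot matrix is the degenerate case of the softmax version)
label = H
e2 = (x @ (label / label.sum(dim=0, keepdim=True))) @ label.T

print(torch.allclose(e1, e2))   # True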

ZhouYuxuanYX commented 1 week ago


I still think your implementation should be correct, but unfortunately the exact code I used is no longer available (the work was done during my internship at the company).

However, I am thinking that if there are some gradients, no matter how small, they will lead to a different group label matrix after training. If you then apply softmax to this learned label matrix, won't it give you a partition?
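
For example (a sketch; in practice joint_label would be the learned parameter after training):

import torch

joint_label = torch.randn(20, 3)        # placeholder for the learned logits

label = joint_label.softmax(dim=1)      # soft assignment, (num_point, num_groups)
partition = label.argmax(dim=1)         # hard group index per joint
print(partition)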

yochaiye commented 1 week ago

It gives me back the initialised partition, since the gradients are too small to change it.