yochaiye opened this issue 2 weeks ago
Hi,
your implementation seems correct. I would suggest:
1) Increase the number of predefined groups; the actual group dimension will be determined by training (some groups will be assigned no joints at all).
2) Initialize the label matrix with `torch.ones`, since `torch.randn()` tends to generate small values around zero, which could be a reason the matrix changes so little.
3) Train on the NTU datasets and see whether it makes a difference.
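A minimal sketch of suggestions 1) and 2) — the joint/group counts here are assumptions for illustration, not Hyperformer's actual configuration:

```python
import torch

num_joints, num_groups = 25, 12  # hypothetical: over-provision groups (suggestion 1)

# torch.randn() yields small logits around zero, so the soft partition
# starts nearly uniform with only tiny differences between groups.
labels_randn = torch.randn(num_joints, num_groups)

# torch.ones gives an exactly uniform soft partition of 1/num_groups per group.
labels_ones = torch.ones(num_joints, num_groups)
uniform = labels_ones.softmax(dim=1)
print(uniform[0])  # every entry is 1/12

# Once labels harden via argmax, some groups may receive no joints at all,
# which is why the effective group dimension is decided by training.
hard = labels_randn.softmax(dim=1).argmax(dim=1)
print(torch.bincount(hard, minlength=num_groups))  # zeros indicate empty groups
```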
Please give me your feedback after trying out my suggestions.
Thank you for the quick response!
I forgot to mention that I propagated the learnable version of `joint_label` instead of the fixed one, so I don't believe this is the source of the problem. I also verified it by checking that the optimizer has a grad value for `joint_label` (the gradients are very small, though, which is what I'm trying to solve).
I will check the options you've mentioned and see if they resolve the issue.
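A toy version of this gradient check — the shapes and the pooling objective below are made up purely to drive one backward pass, not taken from Hyperformer:

```python
import torch
import torch.nn as nn

# Hypothetical shapes; joint_label as a learnable parameter, as in the issue
joint_label = nn.Parameter(torch.randn(25, 6))
feats = torch.randn(25, 64)

# Toy objective routed through the soft partition, just to produce a gradient
group_feats = joint_label.softmax(dim=1).t() @ feats  # (6, 64)
loss = group_feats.pow(2).mean()
loss.backward()

# The parameter does receive a gradient; the open question is its magnitude
assert joint_label.grad is not None
print(joint_label.grad.abs().max())
```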
Wanted to update that I've tried your suggestions, but unfortunately the problem still persists. Do you recall perhaps how you implemented it?
I double-checked my implementation by taking a fixed `joint_label` and computing `e` in two ways:
- Your original code
- By transforming the joint labels to one-hot vectors, so I get the binarised partition matrix `H` (as it's denoted in the paper), and using my implementation

I got identical values for `e` either way.
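That kind of consistency check can be sketched as follows; the per-group sum here is a generic stand-in for the actual computation of `e`, and the labels and feature shapes are hypothetical:

```python
import torch
import torch.nn.functional as F

# Hypothetical fixed hard labels: joint j belongs to group joint_label[j]
joint_label = torch.tensor([0, 0, 1, 2, 1])
num_groups = 3
feats = torch.randn(5, 4)

# Way 1: index-based grouping (stand-in for the original code path)
e_index = torch.stack([feats[joint_label == g].sum(0) for g in range(num_groups)])

# Way 2: binarised partition matrix H built from one-hot labels
H = F.one_hot(joint_label, num_groups).float()  # (5, 3)
e_matrix = H.t() @ feats                        # (3, 4)

# Both ways agree for a fixed hard partition
assert torch.allclose(e_index, e_matrix)
```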
I still think your implementation should be correct, but unfortunately the exact code I used is no longer available (the work was done during my internship at a company).
However, if there are some gradients, no matter how small, they will lead to a different group label matrix after training. Then, when you apply softmax to this learned label matrix, won't it give you a partition?
It gives me the initialised partition, since the gradients are too small to change it.
Hi, thank you for your great work.
I want to make `joint_label` trainable so I can examine the partitions obtained by Hyperformer. To this end, I added the following line to the `__init__` method of the `Model` class in `Hyperformer.py`:
Then I modified the `forward` method of `unit_vit` so it now looks like this:
I'm training the network on the UCLA dataset, but `joint_label.softmax(dim=1)` is hardly changing. What am I getting wrong here?
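The modified lines themselves aren't shown above. A minimal sketch of what such a change might look like — everything beyond the names `joint_label`, `Model`, and `unit_vit` (shapes, forward logic) is an assumption, not Hyperformer's actual code:

```python
import torch
import torch.nn as nn

class Model(nn.Module):  # sketch, not the real Hyperformer Model class
    def __init__(self, num_joints=20, num_groups=5):  # NW-UCLA skeletons have 20 joints
        super().__init__()
        # the added line: a learnable group-label matrix
        self.joint_label = nn.Parameter(torch.ones(num_joints, num_groups))

class unit_vit(nn.Module):  # sketch of the modified block
    def forward(self, x, joint_label):
        # consume the soft partition instead of fixed hard labels
        soft_partition = joint_label.softmax(dim=1)  # (num_joints, num_groups)
        return soft_partition  # stand-in for the real attention computation

m = Model()
out = unit_vit()(torch.zeros(1), m.joint_label)
print(out.shape)  # torch.Size([20, 5])
```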