Gengzigang / PCT

This is an official implementation of the CVPR 2023 paper "Human Pose as Compositional Tokens" (https://arxiv.org/pdf/2303.11638.pdf).

Tokens to be integers #38

[Open] Tommy-Hsu opened this issue 2 months ago

Tommy-Hsu commented 2 months ago

Hello, I would like to ask about the meaning of the tokens being integers. I noticed that the final forward pass through the tokenizer involves the cls_logits_softmax tensor, which is directly matrix-multiplied with the codebook. However, these operations are all in floating point. So what does it mean for the tokens to be integers in the classifier stage?
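For reference, here is a minimal sketch of the soft operation being described (not the repository's actual code; the shapes and variable names are illustrative assumptions): a softmax distribution over codebook classes is multiplied into the codebook, and everything stays in floating point.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
num_tokens, num_classes, dim = 34, 2048, 64
cls_logits = torch.randn(1, num_tokens, num_classes)  # classifier head output
codebook = torch.randn(num_classes, dim)              # learned codebook entries

probs = cls_logits.softmax(dim=-1)   # floating-point distribution over classes
soft_tokens = probs @ codebook       # weighted mix of codebook vectors
# soft_tokens is float throughout; no integer index appears in this path.
```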

ndsl555 commented 2 months ago

I'd like to know too

Kun-Ming commented 1 month ago

It seems the integer tokens only appear during Stage I training. I think it is the variable encoding_indices on this line: https://github.com/Gengzigang/PCT/blob/main/models/pct_tokenizer.py#L142

Tommy-Hsu commented 1 month ago

> It seems the integer tokens only appear during Stage I training. I think it is the variable encoding_indices on this line: https://github.com/Gengzigang/PCT/blob/main/models/pct_tokenizer.py#L142

That's true. In Stage 1, encoding_indices are integers, but not in Stage 2.
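As a rough sketch of what a Stage-1-style vector quantization step looks like (assumed shapes, not copied from pct_tokenizer.py): each encoder feature is snapped to its nearest codebook entry, and that nearest-neighbor index is the integer token.

```python
import torch

features = torch.randn(34, 64)    # flattened encoder outputs (assumed shape)
codebook = torch.randn(2048, 64)  # codebook entries (assumed shape)

dists = torch.cdist(features, codebook)  # pairwise distances, (34, 2048)
encoding_indices = dists.argmin(dim=1)   # integer token ids, one per feature
quantized = codebook[encoding_indices]   # float vectors passed to the decoder
```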

Tommy-Hsu commented 1 month ago

(Figure 1 from the paper)

Figure 1 is quite confusing to me. In the inference stage, the class head output should be logits, and the codebook entries are floating-point data. However, the figure shows both as integers, which is what I find puzzling.
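One possible reading (an assumption on my part, not confirmed by the authors): the integer tokens drawn in the figure could correspond to taking the argmax of the class logits, even though the released forward pass uses the soft softmax-matmul path discussed above.

```python
import torch

# Interpretation sketch, not the repo's code path: recovering integer
# token ids from logits via argmax, then doing a hard codebook lookup.
cls_logits = torch.randn(1, 34, 2048)  # hypothetical classifier logits
codebook = torch.randn(2048, 64)

token_ids = cls_logits.argmax(dim=-1)  # integer token indices, (1, 34)
hard_tokens = codebook[token_ids]      # hard lookup instead of soft mixing
```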