bytedance / 1d-tokenizer

This repo contains the code for the 1D tokenizer and generator.
Apache License 2.0

Request for linear probing code #29

Open shashankvkt opened 2 months ago

shashankvkt commented 2 months ago

Hello,

Thank you for this wonderful work. I was wondering if it's possible to share the code for linear probing to reproduce Figure 4(b). I tried to reproduce it but did not get the desired results. Just to make sure I didn't make a mistake, could you please share the implementation?

Thanks

cornettoyu commented 2 months ago

Hi, please refer to MAE for the code base and instructions. We follow everything in MAE, except that we replace the ViT-Encoder with TiTok-Encoder and use global pooling to obtain the embedding for linear probing.
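For reference, a minimal sketch of that setup in PyTorch (module and argument names here are illustrative assumptions, not this repo's actual API; MAE's linear probing additionally places a parameter-free BatchNorm before the linear head):

```python
import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    """Frozen encoder + global pooling + linear classifier (MAE-style)."""
    def __init__(self, encoder, embed_dim, num_classes=1000):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # the tokenizer encoder stays frozen
        self.head = nn.Sequential(
            # MAE's linear probing uses a parameter-free BatchNorm before the head
            nn.BatchNorm1d(embed_dim, affine=False, eps=1e-6),
            nn.Linear(embed_dim, num_classes),
        )

    def forward(self, images):
        with torch.no_grad():
            tokens = self.encoder(images)  # assumed shape: (B, num_latent_tokens, embed_dim)
        feats = tokens.mean(dim=1)         # global pooling over the 1D latent tokens
        return self.head(feats)
```

Training would then follow MAE's linear-probing recipe, with only the head receiving gradients.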

shashankvkt commented 2 months ago

Thanks for your reply. Will try it out.

I had assumed you might be feeding the discrete tokens directly into the linear layer. So, just to confirm: for linear probing, you do not use any discrete tokens at all?

SilentView commented 2 months ago

> Thanks for your reply. Will try it out.
>
> I had assumed you might be feeding the discrete tokens directly into the linear layer. So, just to confirm: for linear probing, you do not use any discrete tokens at all?

I am also trying to reproduce the linear-probing results of TiTok. Based on this reply, I would assume the 12-dimensional features before quantization are used for global average pooling:

> we replace the ViT-Encoder with TiTok-Encoder

If this is true, that would be interesting: in my previous linear-probing experiments on another tokenizer, results with the low-dimensional features were about 10x worse than with the high-dimensional features before they are projected down to the codebook dimension.
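For concreteness, the two feature taps being compared look roughly like this (a sketch under assumed names; `quant_proj` stands for whatever layer projects the encoder output down to the 12-dimensional codebook space, and the shapes are illustrative):

```python
import torch

def pooled_features(encoder, quant_proj, images, high_dim=True):
    """Illustrative only: extract per-image embeddings for linear probing."""
    tokens = encoder(images)         # (B, K, D): high-dimensional latents, e.g. D = 768
    if not high_dim:
        tokens = quant_proj(tokens)  # (B, K, 12): low-dim features that feed the codebook
    return tokens.mean(dim=1)        # global average pooling over the K latent tokens
```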

SilentView commented 2 months ago


Update: high-dimensional features are necessary for linear probing.