AILab-CVC / SEED

Official implementation of SEED-LLaMA (ICLR 2024).
https://ailab-cvc.github.io/seed

Difference between 'blocks' and 'blocks_for_image' #28

Open zheedong opened 3 months ago

zheedong commented 3 months ago

Hi, in tokenizer training you apply 'blocks' to reconstruct the causal embedding, and you also apply 'blocks_for_image' (in 'blip2_qformer_codebook_all_image.py'). But in get_codebook_indicies (in 'qformer_quantizer.py') you apply only 'blocks'. Why the difference?

geyuying commented 3 months ago

blocks is used to reconstruct the causal embedding and is only used during training, where it serves as a training objective. blocks_for_image is used to reconstruct the CLIP image features, which can be decoded into realistic images with the SD U-Net.

In get_codebook_indicies (in 'qformer_quantizer.py'), we only apply blocks_for_image to obtain the reconstructed image features, so that discrete tokens can be decoded into images.
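To make the reply concrete, here is a minimal sketch of the wiring it describes, assuming both heads read the same quantized embedding in parallel. The class ToySeedTokenizerHeads, the toy lambda blocks, the MSE losses, and the equal-weight sum of the two losses are hypothetical stand-ins, not SEED's code; the real modules, tensor shapes, and loss weights are not shown in this thread.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


def mse(a: Vector, b: Vector) -> float:
    # Mean squared error between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


@dataclass
class ToySeedTokenizerHeads:
    # Hypothetical stand-ins for the two heads discussed in the issue.
    blocks: Callable[[Vector], Vector]            # -> causal embedding (training target only)
    blocks_for_image: Callable[[Vector], Vector]  # -> CLIP image feature (decodable by an SD U-Net)

    def training_loss(self, quantized: Vector, causal_target: Vector,
                      clip_target: Vector) -> float:
        # Training: both heads read the same quantized embedding and each
        # contributes a reconstruction loss (equal weighting is an assumption).
        loss_causal = mse(self.blocks(quantized), causal_target)
        loss_image = mse(self.blocks_for_image(quantized), clip_target)
        return loss_causal + loss_image

    def decode_features(self, quantized: Vector) -> Vector:
        # Inference: only blocks_for_image is needed to get image features;
        # blocks is unused once training is done.
        return self.blocks_for_image(quantized)


heads = ToySeedTokenizerHeads(
    blocks=lambda v: [2 * x for x in v],
    blocks_for_image=lambda v: [x + 1 for x in v],
)
q = [0.5, -1.0]
print(heads.training_loss(q, causal_target=[1.0, -2.0], clip_target=[1.5, 0.0]))
print(heads.decode_features(q))
```

In this toy setup both heads hit their targets exactly, so the training loss is 0.0, and decoding returns [1.5, 0.0] without ever calling blocks.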

zheedong commented 3 months ago

Why is 'blocks' needed? Why not unify 'blocks' and 'blocks_for_image'? Or why not reconstruct the causal embedding through 'blocks' and then apply 'blocks_for_image'?
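The two wirings contrasted in this question can be sketched side by side. The functions parallel_heads and cascaded_heads, and the toy blocks double and add_one, are hypothetical illustrations of the data flow only; they are not SEED's modules and say nothing about which wiring trains better.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Block = Callable[[Vector], Vector]


def parallel_heads(quantized: Vector, blocks: Block,
                   blocks_for_image: Block) -> Tuple[Vector, Vector]:
    # Wiring described in the reply: both heads read the quantized tokens directly.
    return blocks(quantized), blocks_for_image(quantized)


def cascaded_heads(quantized: Vector, blocks: Block,
                   blocks_for_image: Block) -> Tuple[Vector, Vector]:
    # Alternative proposed in the question: reconstruct the causal embedding
    # first, then map that reconstruction to image features.
    causal = blocks(quantized)
    return causal, blocks_for_image(causal)


def double(v: Vector) -> Vector:
    # Toy stand-in for 'blocks'.
    return [2 * x for x in v]


def add_one(v: Vector) -> Vector:
    # Toy stand-in for 'blocks_for_image'.
    return [x + 1 for x in v]


print(parallel_heads([1.0], double, add_one))  # image head sees the tokens
print(cascaded_heads([1.0], double, add_one))  # image head sees blocks' output
```

In the cascaded form, producing image features requires running blocks at inference as well. In the parallel form, blocks can be dropped after training, which is consistent with the reply's description of get_codebook_indicies.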