Wonderful work and amazing performance! While looking into this area, I came across the paper "Planting a SEED of Vision in Large Language Model", which covers a similar topic and addresses comparable research questions.
To better understand both works, I would greatly appreciate it if you could clarify how your work differs from SEED and what advantages it offers in comparison.
For a discussion of similar works that build on CLIP features, including SEED, QFormer, and EMU, please refer to the "Tokenization for Image Understanding" part of the related work section in our paper.