MaverickRen / PixelLM

PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. PixelLM was accepted to CVPR 2024.
Apache License 2.0

codebook #5

Open dbsdmlgus50 opened 5 months ago

dbsdmlgus50 commented 5 months ago

Thank you for conducting and sharing such good research!

I couldn't find anything in the current code that corresponds to the codebook. Is there any code for the codebook? I also have an additional question about it.

Is the codebook pre-constructed from the images, or is it something that is learned?

Thank you.

MaverickRen commented 5 months ago

Thank you for your question. The codebook is controlled by two parameters, 'seg_token_num' and 'image_feature_scale_num'. Their product determines the number of tokens added to the vocabulary of the LLM. This is implemented around line 169 of the 'train_ds.py' file. The codes in the codebook used for segmentation are all randomly initialized.
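
To make the counting concrete, here is a minimal, library-free sketch of the idea described above. It is not the code from 'train_ds.py': the token naming scheme, the helper names, the embedding size, and the initialization scale are all assumptions chosen for illustration. It only shows that seg_token_num * image_feature_scale_num new tokens are appended to a vocabulary and given randomly initialized embeddings.

```python
import random


def build_codebook_tokens(seg_token_num, image_feature_scale_num):
    """Return one token name per (scale, index) pair.

    The "[SEG_{scale}_{idx}]" naming is hypothetical; the repository
    may name its segmentation tokens differently.
    """
    return [
        f"[SEG_{scale}_{idx}]"
        for scale in range(image_feature_scale_num)
        for idx in range(seg_token_num)
    ]


def extend_vocab(vocab, embeddings, new_tokens, dim, seed=0):
    """Append new tokens to a toy vocab with random embeddings.

    `vocab` maps token -> id and `embeddings` is a list of vectors;
    both are stand-ins for a real tokenizer and embedding matrix.
    """
    rng = random.Random(seed)
    for tok in new_tokens:
        if tok in vocab:
            continue  # skip tokens that already exist
        vocab[tok] = len(vocab)
        # Random initialization (the 0.02 scale is an assumption).
        embeddings.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return vocab, embeddings


# Toy vocabulary standing in for the LLM's vocabulary.
vocab = {"<s>": 0, "</s>": 1, "hello": 2}
embeddings = [[0.0] * 4 for _ in vocab]

tokens = build_codebook_tokens(seg_token_num=3, image_feature_scale_num=2)
extend_vocab(vocab, embeddings, tokens, dim=4)

print(len(tokens), len(vocab))
```

In a Hugging Face based setup, the analogous steps would typically be `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`; whether 'train_ds.py' does exactly this is not confirmed in this thread.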

roadcode commented 5 months ago

Is the codebook used as LLM input, or only by the mask decoder?

xandery-geek commented 3 months ago

@MaverickRen Do 'seg_token_num' and 'image_feature_scale_num' correspond to the variables N and L in the paper, respectively? Thank you!