Efficient-Large-Model / VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Apache License 2.0

Why is LLaMA3's padding direction set to "right"? #72

Open ROIM1998 opened 3 weeks ago

ROIM1998 commented 3 weeks ago

Hi! Really appreciate your great work.

I'm a bit confused about the padding_direction set in LLaMA3's tokenizer.json file. As stated in the comments, this is used in the model's repack function. Since LLaMA3 is an autoregressive model, why did you choose to pad the embeddings and placeholder labels on the right instead of the left?

Also, padding on the right raises an issue: the end of the input prompt is difficult to identify during inference. If I want to finetune the model on my own dataset, will it still work if I change the padding side from right to left? Thanks!
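To illustrate the concern above, here is a minimal sketch in plain Python (the `pad_batch` helper and `PAD` token id are hypothetical, not VILA's actual repack code). With right padding, a short sequence in a batch ends in pad tokens, so its final position is not its real last token; with left padding, every sequence's last real token is aligned at index -1, which is what autoregressive generation loops typically read.

```python
PAD = 0  # hypothetical pad token id for illustration

def pad_batch(seqs, side="right"):
    """Pad variable-length token-id lists to a common length."""
    width = max(len(s) for s in seqs)
    out = []
    for s in seqs:
        padding = [PAD] * (width - len(s))
        out.append(s + padding if side == "right" else padding + s)
    return out

batch = [[5, 6, 7], [8, 9]]

# Right-padded: the short sequence ends in PAD, so index -1 is not
# its real last token -- awkward when decoding the next token.
assert pad_batch(batch, side="right") == [[5, 6, 7], [8, 9, PAD]]

# Left-padded: every sequence's last real token sits at index -1.
assert pad_batch(batch, side="left") == [[5, 6, 7], [PAD, 8, 9]]
```

This is only the inference-time view; during training with teacher forcing, labels at pad positions are usually masked out, so right padding is common there, which may be why the questioner distinguishes the two settings.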

yaolug commented 1 week ago

Could you provide a link to the code of the padding behavior that you are asking about? Thanks.