clorislili / ManipLLM

The official codebase for ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation (CVPR 2024)

About MLM in the paper #2

Open OedoSoldier opened 1 month ago

OedoSoldier commented 1 month ago

Hi, I've read your paper and have some questions about the MLM you mentioned: "... we mask out the value of coordinate or direction vectors in the input text prompt and promote the model to infill the missing characters."

As far as I know, LLaMA is a decoder-only transformer-based LLM (aka next token prediction). This differs from encoder-only LLMs like BERT, which can apply MLM; it can only access the information before the to-predict token. How did you apply MLM on LLaMA? Did you modify the casual mask?
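
My own guess is that the causal mask could be left untouched: one could put the masked text in the prompt and supervise only the tokens that regenerate the missing values, so the "infilling" is still done autoregressively. A minimal sketch of that idea below (the checkpoint name, prompt wording, and `[MASK]` placeholder are just my assumptions for illustration, not taken from the paper):

```python
# Sketch: MLM-style value infilling with a decoder-only model and an
# unmodified causal mask. The masked prompt comes first; loss is computed
# only on the tokens that reproduce the missing values.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "huggyllama/llama-7b"  # hypothetical checkpoint, for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = ("The contact point is ([MASK], [MASK]) and the gripper direction is "
          "([MASK], [MASK], [MASK]). The missing values are:")
answer = " (0.32, 0.71) and (0.0, -1.0, 0.0)"  # made-up numbers

prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
answer_ids = tokenizer(answer, return_tensors="pt",
                       add_special_tokens=False).input_ids

input_ids = torch.cat([prompt_ids, answer_ids], dim=1)
labels = input_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100  # ignore prompt tokens in the loss

outputs = model(input_ids=input_ids, labels=labels)
print(outputs.loss)  # standard next-token loss, restricted to the infilled span
```

Is this roughly what you did, or did you actually switch to a bidirectional attention mask for those masked positions?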