OpenMOSS / AnyGPT

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

Loss Masking #20

Closed gmltmd789 closed 4 months ago

gmltmd789 commented 5 months ago

Thank you for providing the model code and checkpoints.

I'm planning to fine-tune the base model you provided on a downstream task. From what I've seen in the code you shared, there doesn't appear to be any loss masking (i.e., excluding the prompt tokens from the loss so that only the target tokens contribute to the loss and gradients).

I'm curious whether you really computed the loss over all tokens, without masking, during instruction tuning (i.e., while building the -chat model).

JunZhan2000 commented 4 months ago

Hi, it's great to hear that you're interested in using our model.

Regarding your question: during training, we did not compute the loss on the prompt; we computed it only on the responses. That said, I don't think this detail is particularly important. As far as I know, some models compute the loss over the entire sequence.
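
For reference, here is a minimal sketch of the masking described above, assuming a Hugging Face-style causal LM setup where label positions set to `-100` are ignored by PyTorch's `CrossEntropyLoss`. The `prompt_len` split is a hypothetical illustration, not taken from the AnyGPT code:

```python
import torch

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """Return labels where prompt tokens are masked out of the loss.

    input_ids: (seq_len,) token ids for the concatenated prompt + response
    prompt_len: number of leading prompt tokens to exclude from the loss
    """
    labels = input_ids.clone()
    labels[:prompt_len] = IGNORE_INDEX  # prompt contributes no loss/gradient
    return labels  # pass as `labels=` so only response tokens are supervised
```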

gmltmd789 commented 4 months ago

Thank you for your response!