OpenMOSS / AnyGPT

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

About input formats for training and inference #25

Open wen020 opened 1 month ago

wen020 commented 1 month ago

AnyGPT is trained only with the next-token prediction task. Taking text-to-image as an example, is the training input a sequence of speech tokens, text tokens, image tokens, and music tokens? I want to know the input formats for training and inference. My current understanding is:

training input: \<sos> speech tokens \<eos> text tokens \<soi> image tokens \<eoi> \<som> music tokens

training label: speech tokens \<eos> text tokens \<soi> image tokens \<eoi> \<som> music tokens \<eom>

Is my understanding of the training input and label correct?
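To make the question concrete, here is a minimal sketch of the one-position shift between input and label that next-token prediction implies. It is only an illustration of my understanding: the helper `make_ntp_pair` and the placeholder token names (`sp_1`, `txt_1`, `img_1`, `mu_1`) are hypothetical, and the sequence layout mirrors the one in the question, not a confirmed AnyGPT format.

```python
from typing import List, Tuple


def make_ntp_pair(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Split one token sequence into (input, label) for next-token prediction.

    The label at position i is the token that follows input position i,
    so the input drops the last token and the label drops the first.
    """
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to form an input/label pair")
    return tokens[:-1], tokens[1:]


# Hypothetical sequence following the layout in the question (assumption,
# not the repository's documented format). Placeholder names stand in for
# the discrete speech, text, image, and music tokens.
sequence = [
    "<sos>", "sp_1", "sp_2", "<eos>",   # speech tokens
    "txt_1", "txt_2",                    # text tokens
    "<soi>", "img_1", "img_2", "<eoi>",  # image tokens
    "<som>", "mu_1", "mu_2", "<eom>",    # music tokens
]

inputs, labels = make_ntp_pair(sequence)

for inp, lab in zip(inputs, labels):
    print(f"{inp:>6} -> {lab}")
```

With this shift, the input starts at \<sos> and stops before \<eom>, while the label starts at the first speech token and ends at \<eom>, which matches the two sequences written above. Whether a single training sample really contains all four modalities is the part I am unsure about.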