LLM360 / k2-train

Apache License 2.0

Document boundaries/blocked attention? #1

Open jwkirchenbauer opened 1 month ago

jwkirchenbauer commented 1 month ago

A relatively simple question that I couldn't quite resolve by looking through the tech report...

During your pretraining (report section 3.1) or instruction tuning phases (report section 3.2), any time samples are "packed together", does your pipeline allow attention to cross document boundaries, or is it blocked with a mask?
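
To be concrete, here is a minimal sketch of what I mean by packing (hypothetical code, not your actual pipeline): documents are concatenated with a separator token and chunked to the context length, so a single training sequence can contain pieces of several documents.

```python
# Hypothetical illustration of sample packing (not the k2-train code):
# tokenized documents are concatenated with a separator token and then
# chunked into fixed-length training sequences.
def pack_documents(docs, eos_id, seq_len):
    stream = []
    for doc in docs:                 # docs: list of lists of token ids
        stream.extend(doc)
        stream.append(eos_id)        # document boundary marker
    # drop the trailing remainder that doesn't fill a full sequence
    n_full = len(stream) // seq_len
    return [stream[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]

# Example: two short "documents" packed into sequences of length 8
sequences = pack_documents([[1, 2, 3], [4, 5, 6, 7, 8, 9]], eos_id=0, seq_len=8)
# -> [[1, 2, 3, 0, 4, 5, 6, 7]]  (tokens from both docs share one sequence)
```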

hunterhector commented 1 month ago

Right, we didn't add a mask there. The hope is to let the model figure out how to use <eos> or <|endofchat|> to "refresh" the context. @tanyuqian can correct me if I am wrong.

It is a bit unconventional to do it this way for instruction tuning, but packing saves us a lot of time. Intuitively, it kind of simulates the scenario where the user ends a prior conversation and starts a new one.
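
For reference, the "blocked" alternative we did not implement would look roughly like this (just a PyTorch sketch, not our actual code): build a block-diagonal causal mask from the separator positions so tokens cannot attend across document boundaries.

```python
import torch

def block_diagonal_causal_mask(input_ids: torch.Tensor, eos_id: int) -> torch.Tensor:
    """Causal mask that also blocks attention across document boundaries
    in a packed sequence (True = attention allowed). Illustration only."""
    seq_len = input_ids.size(0)
    # Assign a document index to every position; the index increments on the
    # token *after* each <eos>, so the <eos> still belongs to its document.
    boundaries = (input_ids == eos_id).long()
    doc_ids = torch.cumsum(boundaries, dim=0)
    doc_ids = torch.cat([torch.zeros(1, dtype=torch.long), doc_ids[:-1]])
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    same_doc = doc_ids.unsqueeze(0) == doc_ids.unsqueeze(1)
    return causal & same_doc

# Example: after the <eos> (id 0) at position 3, tokens at positions 4-7
# can no longer attend to positions 0-3.
mask = block_diagonal_causal_mask(torch.tensor([1, 2, 3, 0, 4, 5, 6, 7]), eos_id=0)
```

With the plain causal mask we use instead, the `same_doc` restriction is simply absent, so attention can flow across the <eos> boundary within a packed sequence.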