🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and the SDPA implementation of Flash Attention v2.
Current behavior does not align with torchtitan, and the negative index has started to cause issues. This change sets the default prefix length to 0 instead of 1, so that no tokens are masked rather than the first token.
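To make the default change concrete, here is a minimal sketch of prefix masking; the helper name `make_prefix_mask` and its `prefix_len` parameter are hypothetical illustrations, not the repo's actual API:

```python
# Hypothetical sketch of prefix masking; not the repo's actual function.
import torch

def make_prefix_mask(seq_len: int, prefix_len: int = 0) -> torch.Tensor:
    """Boolean mask of shape (seq_len,) where True marks prefix tokens to exclude.

    prefix_len=0 (new default): no tokens masked.
    prefix_len=1 (old default): the first token is masked.
    """
    mask = torch.zeros(seq_len, dtype=torch.bool)
    mask[:prefix_len] = True
    return mask

print(make_prefix_mask(4))                 # tensor([False, False, False, False])
print(make_prefix_mask(4, prefix_len=1))   # tensor([ True, False, False, False])
```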