arcee-ai / DAM


Exclude Padded Tokens from Loss Computation #16

Closed shamanez closed 1 month ago

shamanez commented 1 month ago

Refactored the loss functions to exclude padded tokens by applying attention masks. The number of non-padded tokens is computed once and passed to the KL divergence, entropy, and MSE loss functions, and logits are masked before each computation so that padding no longer distorts the loss values.
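A minimal sketch of the masking pattern described above, assuming PyTorch. The function and variable names (`masked_losses`, `attention_mask`, etc.) are illustrative, not the repo's actual API: per-token losses are computed densely, multiplied by the attention mask, and normalized by the non-padded token count, which is computed once and shared by all three losses.

```python
import torch
import torch.nn.functional as F

def masked_losses(student_logits, teacher_logits, attention_mask):
    """Exclude padded positions from KL, entropy, and MSE losses.

    attention_mask: (batch, seq_len) with 1 for real tokens, 0 for padding.
    Names and loss definitions here are a sketch, not the repo's exact code.
    """
    mask = attention_mask.to(student_logits.dtype)      # (B, T)
    num_tokens = mask.sum().clamp(min=1)                # count non-padded tokens once

    log_p = F.log_softmax(student_logits, dim=-1)       # (B, T, V)
    log_q = F.log_softmax(teacher_logits, dim=-1)

    # Per-token KL(teacher || student), averaged over real tokens only.
    kl_tok = (log_q.exp() * (log_q - log_p)).sum(-1)    # (B, T)
    kl_loss = (kl_tok * mask).sum() / num_tokens

    # Per-token entropy of the student distribution, masked the same way.
    ent_tok = -(log_p.exp() * log_p).sum(-1)
    entropy = (ent_tok * mask).sum() / num_tokens

    # MSE on raw logits, again restricted to non-padded positions.
    mse_tok = (student_logits - teacher_logits).pow(2).mean(-1)
    mse_loss = (mse_tok * mask).sum() / num_tokens

    return kl_loss, entropy, mse_loss
```

Because every per-token term is multiplied by the mask before the sum, changing the logits at padded positions leaves all three losses unchanged, which is the property the PR is after.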