Closed shamanez closed 1 month ago
Refactored loss functions to exclude padded tokens by using attention masks. Calculated non-padded tokens once and passed to KL divergence, entropy, and MSE loss functions. Ensured accurate loss calculations by masking logits before computation.
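The masking approach described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function name `masked_losses` and the tensors `student_logits`, `teacher_logits`, and `attention_mask` are hypothetical, and PyTorch is assumed.

```python
import torch
import torch.nn.functional as F

def masked_losses(student_logits, teacher_logits, attention_mask):
    """Compute KL divergence, entropy, and MSE over non-padded tokens only.

    attention_mask: (batch, seq_len) with 1 for real tokens, 0 for padding.
    Hypothetical sketch; the real refactor may differ in reductions/signatures.
    """
    mask = attention_mask.unsqueeze(-1).float()      # (B, T, 1), broadcast over vocab
    # Count non-padded tokens once and reuse it as the normalizer for all losses.
    num_tokens = attention_mask.sum().clamp(min=1).float()

    log_p = F.log_softmax(student_logits, dim=-1)
    log_q = F.log_softmax(teacher_logits, dim=-1)
    p = log_p.exp()

    # KL(student || teacher), summed over vocab, averaged over real tokens.
    kl = ((p * (log_p - log_q)) * mask).sum() / num_tokens

    # Entropy of the student distribution, restricted to real tokens.
    entropy = (-(p * log_p) * mask).sum() / num_tokens

    # MSE between raw logits, masked so padded positions contribute nothing.
    mse = (((student_logits - teacher_logits) ** 2) * mask).sum() / (
        num_tokens * student_logits.size(-1)
    )

    return kl, entropy, mse
```

Because the mask zeroes padded positions before the sum and the normalizer counts only real tokens, changing logits at a padded position leaves all three losses unchanged.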