Pushing 3 updates:
Removed the check and update for labels with zero data, which was causing issues during evaluation.
Fixed the confusion matrix calculation failing when testing on a single class with an F1 score of 1: it expected the original number of training classes (3).
Updated attention mask creation to use the actual pad_idx value dynamically instead of assuming it is 0.
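
For reviewers, a minimal sketch of the pad_idx change. This is not the project's code: `build_attention_mask` and the sample batch are hypothetical, and plain lists stand in for tensors (with a tensor library the same idea is typically an elementwise `input_ids != pad_idx`).

```python
def build_attention_mask(batch, pad_idx):
    """Return 1 for real tokens and 0 for padding, per position.

    Hypothetical helper: the mask is derived from the pad_idx passed in,
    not from a hard-coded 0.
    """
    return [[0 if token == pad_idx else 1 for token in seq] for seq in batch]


# Hypothetical batch padded with pad_idx = 1, so token id 0 is a real token here.
batch = [
    [0, 5, 7, 1, 1],
    [4, 0, 1, 1, 1],
]

mask = build_attention_mask(batch, pad_idx=1)

# What the old assumption (padding is always 0) would have produced.
mask_assuming_zero = build_attention_mask(batch, pad_idx=0)
```

With a vocabulary where padding is not 0, the old assumption masks out real tokens with id 0 and attends to the actual padding, which is the failure this update avoids.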
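
A sketch of one way to keep the confusion matrix at a fixed size when the test set contains a single class. It assumes the three training classes are labeled 0, 1, 2; `confusion_matrix_fixed` is a hypothetical helper, not the code in this push. scikit-learn's `confusion_matrix` takes a `labels` argument that serves the same purpose.

```python
def confusion_matrix_fixed(y_true, y_pred, labels):
    """Build a len(labels) x len(labels) matrix (rows: true, columns: predicted).

    The shape is set by the full label list, not by the classes that
    happen to appear in y_true / y_pred.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


# Assumed label set from training (3 classes).
labels = [0, 1, 2]

# Test set with a single class, all predicted correctly (F1 of 1 on that class).
y_true = [1, 1, 1, 1]
y_pred = [1, 1, 1, 1]

cm = confusion_matrix_fixed(y_true, y_pred, labels)
```

Deriving the matrix from the observed labels alone would give a 1x1 result here, which does not match code that expects 3 classes.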