Closed akhila-s-rao closed 2 days ago
I dug a bit deeper and think I have identified a bug. I found it in the DAE model implementation, but it likely exists in the other approaches as well.
The reconstruction_head uses an MLP from commons/ that has no dedicated output layer: the last layer is linear and is shared by all features, continuous and categorical alike. However, each categorical feature needs a #classes-sized output layer of its own. So when predictions are made by passing X through the encoder and then the reconstruction_head, the categorical features are effectively treated as continuous features, even though a different loss function is used for each type. When the predictions from the linear output layer are passed to CrossEntropyLoss, it misbehaves and produces negative loss values.
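To make the fix concrete, here is a minimal sketch of what a corrected reconstruction head could look like: one shared linear output for the continuous features, plus a separate #classes-sized logit layer per categorical feature, with MSE and CrossEntropy applied to the matching slices. All names (`ReconstructionHead`, `d_hidden`, `cat_cardinalities`, etc.) are illustrative and not taken from the TabularS3L codebase.

```python
import torch
import torch.nn as nn

class ReconstructionHead(nn.Module):
    """Hypothetical sketch, not the repository's actual module."""
    def __init__(self, d_hidden, num_continuous, cat_cardinalities):
        super().__init__()
        # One linear output covering every continuous feature.
        self.cont_head = nn.Linear(d_hidden, num_continuous)
        # One #classes-sized logit layer per categorical feature.
        self.cat_heads = nn.ModuleList(
            [nn.Linear(d_hidden, c) for c in cat_cardinalities]
        )

    def forward(self, z):
        cont_preds = self.cont_head(z)                     # [B, num_continuous]
        cat_logits = [head(z) for head in self.cat_heads]  # each [B, n_classes]
        return cont_preds, cat_logits

def reconstruction_loss(cont_preds, cat_logits, cont_targets, cat_targets):
    # MSE for continuous features, CrossEntropy per categorical feature
    # (cat_targets holds integer class indices, one column per feature).
    loss = nn.functional.mse_loss(cont_preds, cont_targets)
    for i, logits in enumerate(cat_logits):
        loss = loss + nn.functional.cross_entropy(logits, cat_targets[:, i])
    return loss
```

With hard integer targets, both terms are non-negative, so the combined loss can no longer go below zero.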
I can submit a pull request to fix this bug if you like. Assuming I am not completely wrong here of course hehe.
Hi, @akhila-s-rao
First of all, I apologize for the delayed response.
Thank you for your thorough investigation and for bringing this issue to my attention. You've correctly identified a significant bug: the current implementation treats categorical features like continuous ones, which leads to the negative loss values you observed.
I greatly appreciate your offer to submit a pull request to fix this bug. Your contribution would be invaluable in improving the project.
Again, thank you for your support and kind words about the project!
Best regards.
So far I have only fixed it enough for my own scenario, wherein I use binary cross-entropy for my two-class categorical features. I shall extend it to the multi-class cross-entropy scenario and send a pull request soon. Thanks for your response!
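For reference, the binary-only workaround described above can be sketched as follows: a single logit per two-class categorical feature, scored with `BCEWithLogitsLoss`-style binary cross-entropy. This is a hedged illustration of the idea, not the actual patch.

```python
import torch
import torch.nn as nn

# Illustrative shapes: 4 samples, 3 two-class categorical features.
batch, n_binary = 4, 3
logits = torch.randn(batch, n_binary)              # one logit per binary feature
targets = torch.randint(0, 2, (batch, n_binary)).float()  # hard 0/1 labels

# Sigmoid + binary cross-entropy in one numerically stable call.
loss = nn.functional.binary_cross_entropy_with_logits(logits, targets)
```

With hard 0/1 targets this loss is always non-negative, which is why the negative-loss symptom disappears in the binary case.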
Hi @akhila-s-rao I just wanted to follow up and check if you’re still planning to work on the pull request. If you’re too busy, I can go ahead and complete it. Please let me know!
Hi, I have currently implemented it only for a binary classification scenario for categorical features during reconstruction (since my categorical features have only two classes). I have yet to implement the multi-class scenario, so please go ahead and implement it. I have been very busy.
Kudos again for maintaining a useful repository!
Hi,
When doing first-phase training with DAE and VIME (using unlabeled data), I got a negative CrossEntropyLoss for the categorical features, which resulted in negative training and validation losses. This led me to print the predictions for the categorical features: they come out as a tensor of size [batch_size, num_categoricals]. However, this isn't the expected input to PyTorch's CrossEntropyLoss, which expects logits of size [batch_size, num_classes]. I don't quite understand how the reconstruction of categorical features works in your implementation. Am I missing something here? As I understand it, the train_loss should not be negative (which is what I see for DAE and VIME).
This is what printing the predictions over the categorical features looks like (functional/dae.py):

```
cat feature preds: tensor([[ 1.5704],
        [-0.6245],
        [ 0.4721],
        [-0.1746],
        [ 1.0408],
        [ 2.5116],
        [ 1.1048],
        [-1.1651],
        [ 4.2188],
        [ 0.7524],
        [-0.1088],
```
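As a hedged aside on how a negative loss can arise at all: since PyTorch 1.10, `CrossEntropyLoss` accepts float targets of the same shape as the input and interprets them as class probabilities. If raw (e.g. standardized) feature values are fed in as such "probabilities", negative target entries flip the sign of the loss. The snippet below is a minimal standalone illustration of that failure mode, not the repository's code, and I cannot be certain this is the exact pathway here.

```python
import torch
import torch.nn as nn

# [batch=1, num_categoricals=2] predictions, as in the printout above.
preds = torch.tensor([[1.5, -0.6]])
# Float targets the same shape as preds are treated as class
# probabilities; these raw values are not valid probabilities.
bad_targets = torch.tensor([[-1.2, -0.3]])

# loss = -sum(target * log_softmax(preds)): negative targets make it negative.
loss = nn.CrossEntropyLoss()(preds, bad_targets)
```

With integer class-index targets (the intended usage), the same criterion can never go below zero.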
Help is appreciated! Thanks!! :)
P.S. Super useful project!