Closed: celine-setyawan closed this issue 3 years ago
Hi Chrysant Celine Setyawan, sorry for the late reply.
But when I want to dig deeper into the code, I can't find it. I can only find CrossEntropyLoss() in:
- https://github.com/indobenchmark/indonlu/blob/master/modules/multi_label_classification.py
- https://github.com/indobenchmark/indonlu/blob/master/modules/word_classification.py
None of these are for BertForSequenceClassification.
BertForSequenceClassification is a model class defined in the transformers package; you can check the package requirement in the requirements.txt file. You can check the source code for BertForSequenceClassification at https://huggingface.co/transformers/_modules/transformers/models/bert/modeling_bert.html#BertForSequenceClassification or on their GitHub page at https://github.com/huggingface/transformers.
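For context, here is a minimal sketch of where that loss lives and how you could get around it. The checkpoint name and dummy batch are just placeholders, and on older transformers versions the model returns a tuple instead of an output object, so adjust accordingly:

```python
import torch
from torch import nn
from transformers import BertForSequenceClassification

# Placeholder checkpoint and label count; substitute your own.
model = BertForSequenceClassification.from_pretrained(
    'indobenchmark/indobert-base-p1', num_labels=3)

input_ids = torch.randint(0, 1000, (2, 16))  # dummy batch: 2 sequences, 16 tokens each
labels = torch.tensor([0, 2])                # label-encoded ground truth, shape (N,)

# When `labels` is passed, the model computes CrossEntropyLoss internally.
outputs = model(input_ids, labels=labels)
print(outputs.loss)                          # the built-in cross entropy loss

# To change the loss, ignore the built-in one: take the raw logits
# and apply any criterion you like.
criterion = nn.CrossEntropyLoss()            # swap in your own nn.Module here
loss = criterion(outputs.logits, labels)     # logits have shape (N, C)
loss.backward()
```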
But the SmSA fine-tuning example
https://github.com/indobenchmark/indonlu/blob/master/examples/finetune_smsa.ipynb doesn't show the ground truth being one-hot encoded; the labels are label-encoded instead. I also tried printing out list_hyp and list_label, in case they are one-hot encoded somewhere outside the code that I can see, but the outputs are just as they are (mapped from LABEL2INDEX).
For cross entropy loss, we do not need to perform one-hot encoding ourselves, as it requires the input logits to be a FloatTensor of size (N, C) and the label to be a LongTensor of size (N). In this case you just need to pass the index of the label instead of its one-hot encoded representation (you can check the PyTorch documentation for CrossEntropyLoss here: https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html).
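In code, that looks like this (a minimal, self-contained example of the shapes involved):

```python
import torch
from torch import nn

criterion = nn.CrossEntropyLoss()

logits = torch.randn(4, 3)           # FloatTensor of shape (N, C): 4 examples, 3 classes
labels = torch.tensor([0, 2, 1, 2])  # LongTensor of shape (N,): class indices, not one-hot

loss = criterion(logits, labels)     # no manual one-hot encoding needed
print(loss.item())
```

Internally this is log-softmax followed by negative log-likelihood on the true-class index, which is mathematically the same as comparing against the one-hot distribution the tutorial describes.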
Meanwhile, I suppose the SmSA labels don't have an order or rank, right?
I suppose what you mean is ordinal. Yes, it is not ordinal; the SmSA labels are nominal data, used for classification over N distinct, unordered classes.
Btw, if you work on a custom loss function to replace cross entropy, you can also check this paper: https://arxiv.org/pdf/2101.03841.pdf. A generic drop-in sketch is shown below. I hope all the answers are clear and I wish you the best for your thesis. Thank you!
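As an illustration only (this is a generic focal loss, not necessarily the method from the paper above), any nn.Module that maps (logits, labels) to a scalar can replace the criterion:

```python
import torch
from torch import nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Generic focal loss, shown only as an example of a drop-in replacement."""
    def __init__(self, gamma=2.0):
        super().__init__()
        self.gamma = gamma

    def forward(self, logits, labels):
        log_probs = F.log_softmax(logits, dim=-1)                             # (N, C)
        true_log_probs = log_probs.gather(1, labels.unsqueeze(1)).squeeze(1)  # (N,)
        true_probs = true_log_probs.exp()
        # Down-weight well-classified examples so training focuses on hard ones.
        return (-(1.0 - true_probs) ** self.gamma * true_log_probs).mean()

criterion = FocalLoss(gamma=2.0)
loss = criterion(torch.randn(4, 3), torch.tensor([0, 2, 1, 2]))
```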
-- Best Regards,
Samuel Cahyawijaya
On Thu, May 6, 2021 at 11:36 PM Chrysant Celine Setyawan wrote:
Hi, IndoNLU team,
Thanks for your amazing work! I'm currently working on my bachelor thesis with IndoBERT for a SequenceClassification task. If I want to change my loss function for fine-tuning, where or how can I do it?
From your tutorials here https://indobenchmark.github.io/tutorials/pytorch/deep%20learning/nlp/2020/10/18/basic-pytorch-en.html#training-phase, I found out that you use CrossEntropy as the loss function for the multiclass classification task (sentiment analysis in that case).
But when I want to dig deeper into the code, I can't find it. I can only find CrossEntropyLoss() in:
- https://github.com/indobenchmark/indonlu/blob/master/modules/multi_label_classification.py
- https://github.com/indobenchmark/indonlu/blob/master/modules/word_classification.py
None of these are for BertForSequenceClassification.
The tutorials also mention:
"Cross entropy loss is calculated by comparing how well the probability distribution output by Softmax matches the one-hot-encoded ground truth label of the data."
But the SmSA fine-tuning example https://github.com/indobenchmark/indonlu/blob/master/examples/finetune_smsa.ipynb doesn't show the ground truth being one-hot encoded; the labels are label-encoded instead. I also tried printing out list_hyp and list_label, in case they are one-hot encoded somewhere outside the code that I can see, but the outputs are just as they are (mapped from LABEL2INDEX). Meanwhile, I suppose the SmSA labels don't have an order or rank, right? Neither does my thesis task.
Thank you in advance! Regards, Celine.
Hi, Kak Samuel, It's fine. It's fast enough for me.
Sorry, what I meant before was 'multi-class classification', not 'BertForSequenceClassification'.
Ah, I see, it's clear now. Thank you so much for the answers and pointers! Ah yes, right, ordinal.
Thank you for your time Kak!
Hi, IndoNLU team,
Thanks for your amazing work! I'm currently working on my bachelor thesis with IndoBERT for a SequenceClassification task. If I want to change my loss function for fine-tuning, where or how can I do it?
From your tutorials here, I found out that you use CrossEntropy as the loss function for the multiclass classification task (sentiment analysis in that case).
But when I want to dig deeper into the code, I can't find it. I can only find CrossEntropyLoss() in:
- https://github.com/indobenchmark/indonlu/blob/master/modules/multi_label_classification.py
- https://github.com/indobenchmark/indonlu/blob/master/modules/word_classification.py
None of these are for multi-class classification.
The tutorials also mention:
"Cross entropy loss is calculated by comparing how well the probability distribution output by Softmax matches the one-hot-encoded ground truth label of the data."
But the SmSA fine-tuning example doesn't show the ground truth being one-hot encoded; the labels are label-encoded instead. I also tried printing out list_hyp and list_label, in case they are one-hot encoded somewhere outside the code that I can see, but the outputs are just as they are (mapped from LABEL2INDEX). Meanwhile, I suppose the SmSA labels don't have an order or rank, right? Neither does my thesis task.
Thank you in advance! Regards, Celine.