Closed LisaBanana closed 4 years ago
The warning should be safe to ignore, as discussed here: https://discuss.pytorch.org/t/got-warning-couldnt-retrieve-source-code-for-container/7689/12
The results you are getting are likely because the training dataset is far too small for the model to learn anything useful. However, the model is "training" (so to speak), as the predicted values are changing.
Hi,
Thanks for your answer. I wasn't very worried about the warning messages, as we discussed previously; I thought they might give some insight into my problem, since I couldn't understand why they were popping up here. But anyway, not a big deal.
I had the same result with a bigger dataset previously (about 7000 lines of about 200 words each), which is why I tried a smaller one, to check that the issue was not in my code.
I have an even bigger dataset that might give a result. I'll try it and tell you if anything changes. Thanks again for your quick answer.
Kind regards,
Lisa
On Mon, 6 Jan 2020 at 19:08, Thilina Rajapakse notifications@github.com wrote:
The warning should be safe to ignore as discussed here https://discuss.pytorch.org/t/got-warning-couldnt-retrieve-source-code-for-container/7689/12 .
The results you are getting are likely because the training dataset is far too small for the model to learn anything useful. However, the model is "training" (so to speak), as the predicted values are changing.
Hi, I get the same issue with camembert too. With m-bert it works... (not as well as I would like, but not 0.5 accuracy :-) )
I have played a bit with the source code, and it appears that at the line logits = self.classifier(sequence_output) in the forward function, the values are already very low or negative... and they end up being classified as 0.
Any idea, @ThilinaRajapakse, of what we can do to help?
Have you seen Camembert return something other than 0?
I imagine that, even with English data, you can do a quick check that we can try to reproduce on our machines.
I'm running this now. I'll get back to you guys when I have something.
In case you are looking for French data, check this repo: https://github.com/getalp/Flaubert
Like CamemBERT, Flaubert is a BERT for French, and it comes with FLUE, a kind of GLUE... in French!
I tested it on a small English dataset and it seems to work. It's definitely returning more than just 0's. I did notice that it would give all 0's or all 1's at the beginning before eventually giving more reasonable outputs. I am not sure whether this is due to the model or because I was using English data.
You can see the results here.
🤔 Strange. I let it run for 10 epochs and I still get 100% zeros. I will redo it for 100, but I definitely think there is something strange, as I use CamemBERT a lot on other datasets and have seen it work like any other BERT-based model. Will keep you informed.
After 100 epochs... still 100% 1!
Features loaded from cache at ./output/cache_simple_transformer/cached_dev_camembert_128_2_256
{'mcc': 0.0, 'tp': 128, 'tn': 0, 'fp': 128, 'fn': 0, 'acc': 0.5, 'eval_loss': 0.6931496541947126}
My code:
import random

import pandas as pd
import sklearn.metrics  # import the submodule explicitly; accuracy_score lives here
from simpletransformers.classification import ClassificationModel


def load(path: str):
    # Each line holds two sentences and a numeric label, separated by tabs
    result = list()
    with open(path) as f:
        for line in f.readlines():
            s1, s2, label = line.split("\t")
            result.append((s1, s2, int(float(label))))
    random.shuffle(result)
    return pd.DataFrame(result, columns=['text_a', 'text_b', 'labels'])


train_df = load("*****.tsv")
eval_df = load("*****.tsv")

train_args = {
    'reprocess_input_data': False,  # True
    'overwrite_output_dir': True,
    'num_train_epochs': 50,
    'fp16': False,
    'silent': True,
    'evaluate_during_training': True,
    'evaluate_during_training_steps': 0,
    'output_dir': "./output/simple_transformer",
    'cache_dir': './output/cache_simple_transformer/',
    # 'do_lower_case': True,
    'use_multiprocessing': False,
}

model = ClassificationModel('camembert', 'camembert-base', use_cuda=True, args=train_args)
# model = ClassificationModel('bert', 'bert-base-multilingual-cased', use_cuda=True, args=train_args)
# model = ClassificationModel('distilbert', 'distilbert-base-multilingual-cased', use_cuda=True, args=train_args)

# Train the model
model.train_model(train_df, eval_df=eval_df, show_running_loss=False, acc=sklearn.metrics.accuracy_score)

# Evaluate the model
scores, model_outputs, wrong_predictions = model.eval_model(eval_df, acc=sklearn.metrics.accuracy_score, verbose=True)
Another interesting thing: when I uncomment 'do_lower_case': True, the first epoch is 100% 0, and from epoch 2 onwards I get 100% 1.
Again, when I try mBERT instead of Camembert... it learns from the first epoch and keeps learning afterwards. Same for distilled mBERT (with slightly lower results than mBERT). The dataset is perfectly balanced and contains a few thousand examples.
I still have no idea where the issue is. Do you see an issue with the code above?
Edit: another interesting thing, the loss is very stable... as if it doesn't learn anything at all.
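A side note on the numbers reported above: the eval_loss of about 0.6931 is essentially ln 2, which is the cross-entropy of a two-class classifier that assigns probability 0.5 to each class, and the confusion counts (every example predicted as 1) give exactly 0.5 accuracy and an MCC of 0. The sketch below recomputes this in plain Python. The function name binary_metrics is made up for illustration, and returning 0 when the MCC denominator is zero is a convention assumed here to match the reported 'mcc': 0.0, not a statement about how the library computes it.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return (accuracy, MCC) computed from binary confusion counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is 0 when a row or column of the matrix is empty
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# Counts from the evaluation output quoted in this thread
acc, mcc = binary_metrics(tp=128, tn=0, fp=128, fn=0)

# Cross-entropy of predicting probability 0.5 for the true class
chance_loss = -math.log(0.5)

print(acc, mcc, chance_loss)
```

A loss that sits at this value from epoch to epoch is consistent with the classifier head producing nearly identical outputs for every input, rather than with slow but real learning.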
strange I let it run for 10 epochs, still having 100% zeros. I will redo for 100 but I definitely think there is something strange as I use CamemBERT a lot on other datasets and saw it works like any other Bert based model. Will keep you informed.
Do you mean that CamemBERT (Simple Transformers implementation) works when used with other datasets?
I can't spot any issues in your code either. This is puzzling indeed!
I use it in another lib (mainly Flair) and I know it works well, much better than mBERT on French, for instance.
That's why I am trying to understand what is happening here, because clearly it should not behave this way.
How can I help debug this?
Is there somewhere inside the lib where I can check the behaviour?
Clearly, the logit variable is too late.
Maybe it's something related to the tokenization done by Camembert, or a very low learning rate. I have no idea.
I checked the input_ids passed to forward too: I can see that the first and last tokens of each example in each batch are always the same, which is expected, and the tokens in between are various numbers, so everything looks OK to me.
The attention_mask seems to always be 1. I don't get why; it's as if there is no padding. But it's the same with mBERT, so OK for me.
token_type_ids=None, position_ids=None, head_mask=None, are all undefined.
I am using transformers 2.3.0.
Do those observations look OK to you?
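For reference on the attention_mask observation above, here is a minimal sketch of what padding usually produces, written in plain Python. It is not the Simple Transformers or transformers code; the function name pad_with_mask, the pad_id default, and the token ids are all made up for illustration.

```python
def pad_with_mask(token_ids, max_len, pad_id=0):
    """Truncate or pad token_ids to max_len and build the matching mask."""
    ids = list(token_ids[:max_len])
    n_pad = max_len - len(ids)
    padded = ids + [pad_id] * n_pad
    # 1 marks a real token, 0 marks padding the model should ignore
    mask = [1] * len(ids) + [0] * n_pad
    return padded, mask


# Hypothetical token ids, chosen only to show the two cases
short_ids, short_mask = pad_with_mask([5, 17, 42, 6], max_len=8)
long_ids, long_mask = pad_with_mask(list(range(100, 120)), max_len=8)

print(short_mask)
print(long_mask)
```

In this sketch, a mask made only of ones appears whenever an input is at least max_len tokens long and gets truncated. Whether that is what happened in the run described above cannot be told from the thread.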
Thank you for the detailed information!
The CamemBERT model was a community addition but the implementation looked fine to me. I think the issue may have been caused by the model subclassing the RoBERTa model from the Hugging Face library directly rather than the Simple Transformers implementation. If so, the fix I pushed just now should clear it up. Can you run it and let me know?
I am no longer seeing the weird all 0's to all 1's behaviour at the beginning after making this change.
It's running now. I can already tell you that it is starting to learn something.
Converting to features started. Cache is not used.
Features loaded from cache at ./output/cache_simple_transformer/cached_dev_camembert_128_2_256
{'mcc': 0.06362847629757777, 'tp': 6, 'tn': 125, 'fp': 3, 'fn': 122, 'acc': 0.51171875, 'eval_loss': 0.689890056848526}
Features loaded from cache at ./output/cache_simple_transformer/cached_dev_camembert_128_2_256
{'mcc': 0.11642436803197997, 'tp': 11, 'tn': 124, 'fp': 4, 'fn': 117, 'acc': 0.52734375, 'eval_loss': 0.688759284093976}
Features loaded from cache at ./output/cache_simple_transformer/cached_dev_camembert_128_2_256
{'mcc': 0.21352448376514868, 'tp': 125, 'tn': 18, 'fp': 110, 'fn': 3, 'acc': 0.55859375, 'eval_loss': 0.7145331678912044}
Features loaded from cache at ./output/cache_simple_transformer/cached_dev_camembert_128_2_256
{'mcc': 0.3923530128589653, 'tp': 83, 'tn': 95, 'fp': 33, 'fn': 45, 'acc': 0.6953125, 'eval_loss': 0.6186056612059474}
Features loaded from cache at ./output/cache_simple_transformer/cached_dev_camembert_128_2_256
{'mcc': 0.36007896737396095, 'tp': 91, 'tn': 83, 'fp': 45, 'fn': 37, 'acc': 0.6796875, 'eval_loss': 0.6091095320880413}
Features loaded from cache at ./output/cache_simple_transformer/cached_dev_camembert_128_2_256
{'mcc': 0.38298414139387743, 'tp': 75, 'tn': 101, 'fp': 27, 'fn': 53, 'acc': 0.6875, 'eval_loss': 0.6124491011723876}
The scores are similar to mBERT for now. So I would say you fixed the bug!!!!
Thanks a lot.
I also tested with xlmroberta and got 100% 0 there too, for a few epochs.
model = ClassificationModel('xlmroberta', 'xlm-roberta-base', use_cuda=True, args=train_args) # xlm-roberta-large
I am training with Camembert right now, so I didn't try to clean the code myself, but maybe it is the same story.
I ran it with xlmroberta but I didn't get the weird behaviour. Also, the CamemBERT fix was to make it mirror the XLMRoBERTa implementation. Maybe deleting the cache directory and rerunning it might help.
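One way to act on the suggestion above is to remove the cached features before rerunning. The sketch below uses only the standard library; the helper name clear_cache is made up, and the path is the cache_dir value from the training args posted earlier in this thread. Setting 'reprocess_input_data': True in those args is another option the posted configuration already hints at.

```python
import shutil
from pathlib import Path


def clear_cache(cache_dir):
    """Delete cache_dir and its contents; return True if something was removed."""
    path = Path(cache_dir)
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False


if __name__ == "__main__":
    # Path taken from the 'cache_dir' entry in the train_args shown in the thread
    removed = clear_cache("./output/cache_simple_transformer/")
    print("cache removed" if removed else "no cache directory found")
```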
FYI, regarding XLMRoBERTa: I greatly decreased the learning rate... and it started to learn something. Results are quite low, but not 100% 0 or 1, so it is probably not a bug but the cost of multi-language support.
I am quite surprised, as Adam is supposed to adjust the LR per parameter...
Anyway, I think you can close this issue as the bug is fixed :-)
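On the remark about Adam above: Adam normalizes each parameter's step by a running estimate of its gradient magnitude, but the result is still multiplied by the global learning rate, so lowering the learning rate shrinks every step. The sketch below works through the first Adam update for a single scalar parameter in plain Python. The function name adam_first_step is made up, the beta and eps values are the commonly used defaults, and the learning rates are illustrative values only, not the ones used in this thread.

```python
import math


def adam_first_step(grad, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """Size of the first Adam update for one scalar parameter."""
    m = (1 - beta1) * grad            # first-moment estimate after one step
    v = (1 - beta2) * grad * grad     # second-moment estimate after one step
    m_hat = m / (1 - beta1)           # bias correction for step t = 1
    v_hat = v / (1 - beta2)
    return lr * m_hat / (math.sqrt(v_hat) + eps)


# Very different gradient magnitudes give almost the same step size,
# while a different learning rate changes the step proportionally.
for lr in (4e-5, 1e-5):
    for grad in (0.001, 10.0):
        print(lr, grad, adam_first_step(grad, lr))
```

In other words, the per-parameter adaptation mostly removes the dependence on gradient scale, not the dependence on the learning rate, which is consistent with a lower learning rate changing the outcome.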
Ok, I'll close this then. Let me know if something else comes up.
Describe the bug
I am training a Camembert + Classification model with the following lines of code. My training data is a dataset of 10 lines (a very small test before a bigger experiment), with 5 lines labeled 0 and 5 lines labeled 1. Each line is between 90 and 500 words long.
Create a TransformerModel
(train args used are the default ones)
After training is complete, I get a few warnings on my console, like:
C:\Users\bezl\Envs\torch\lib\site-packages\torch\serialization.py:292: UserWarning: Couldn't retrieve source code for container of type CamembertForSequenceClassification. It won't be checked for correctness upon loading. "type " + obj.__name__ + ". It won't be checked "
(I don't really understand what it implies.)
Then it starts to convert features for the prediction phase (test_data is made from 8 lines of my data, unlabeled, but 4 are extracted from lines labeled 1 and 4 from lines labeled 0, which makes it easier for me to check how the model handles the classification).
Expected behavior
Predictions are supposed to be 0 or 1, but I only get 0.
Anyway, all help or suggestions are welcome. I can copy/paste all my code if required. Thanks :)