Closed: avi-jain closed this issue 4 years ago
This is often caused by not setting the num_labels
parameter to the same value that was used when training the model. Make sure that you use the same value when loading a saved model.
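For example, a rough sketch of reloading a saved model for evaluation, assuming the default outputs/ directory and the df_bert_train/df_bert_dev dataframes used elsewhere in this thread:

from simpletransformers.classification import ClassificationModel

# Reload the saved model with the same label count that was used at training time.
num_labels = len(df_bert_train['labels'].unique())
model = ClassificationModel('distilbert', 'outputs/', num_labels=num_labels)
result, model_outputs, wrong_predictions = model.eval_model(df_bert_dev, verbose=True)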
Additional context: I'm running eval in the same context (Jupyter notebook) in which the model was trained, without reloading it. However, if I disable GPUs and load the saved model, eval works.
Can you show me the line you are executing to perform the evaluation? Also, the terminal where you launch the jupyter notebook might show some additional info about the error.
I checked the terminal output. Nothing except a couple of warnings (WARNING | WARNING: attempted to send message from fork). I'll save all my transient variables, load the model after enabling GPUs, and try the eval again.
The line is the same as your example
result, model_outputs, wrong_predictions = model.eval_model(df_bert_dev, verbose=True)
Attaching the full stack trace:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-26-8eb2d33eb606> in <module>()
----> 1 predictions, raw_outputs = model.predict(["Sentence test"])
~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/simpletransformers/classification/classification_model.py in predict(self, to_predict, multi_label)
563 for batch in tqdm(eval_dataloader, disable=args['silent']):
564 model.eval()
--> 565 batch = tuple(t.to(device) for t in batch)
566
567 with torch.no_grad():
~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/simpletransformers/classification/classification_model.py in <genexpr>(.0)
563 for batch in tqdm(eval_dataloader, disable=args['silent']):
564 model.eval()
--> 565 batch = tuple(t.to(device) for t in batch)
566
567 with torch.no_grad():
RuntimeError: CUDA error: device-side assert triggered
Check how much GPU memory is in use and try lowering the eval_batch_size
in case the evaluation is exceeding the available GPU memory. Rerunning cells inside Jupyter tends to cause memory issues, as it doesn't release memory properly.
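For instance, a sketch of lowering the eval batch size through the args dict (again assuming the outputs/ directory and the dataframe names from this thread):

# Smaller eval batches reduce peak GPU memory during evaluation.
model = ClassificationModel('distilbert', 'outputs/',
                            num_labels=len(df_bert_train['labels'].unique()),
                            args={'eval_batch_size': 4})
result, model_outputs, wrong_predictions = model.eval_model(df_bert_dev)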
Also, it might be better to get the labels from the entire df rather than just the train df on the off chance that the dev df contains labels that are not in train.
num_labels=len(df_bert['labels'].unique())
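That is, roughly something like the following sketch (using the dataframe names from this thread), so a label that only appears in the dev split is still counted:

import pandas as pd

# If df_bert is not already the pre-split frame, rebuild it from the splits before counting labels.
df_bert = pd.concat([df_bert_train, df_bert_dev])
num_labels = len(df_bert['labels'].unique())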
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@ThilinaRajapakse Hi, just letting you know that I faced the same error 'RuntimeError: CUDA error: device-side assert triggered' today with the latest version (0.22.0). After trying a lot of things to overcome it, I tried a lower version of simpletransformers (0.20.0) in the end and it worked. I was trying this example: https://towardsdatascience.com/simple-transformers-introducing-the-easiest-bert-roberta-xlnet-and-xlm-library-58bf8c59b2a3
Thanks a lot for the great wrapper :)
Do you mind running with use_cuda=False
and showing me the error?
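For reference, a minimal sketch of what that would look like (the outputs/ path and dataframe names are assumptions carried over from above):

# Running on the CPU usually surfaces the underlying Python error instead of the opaque CUDA assert.
model = ClassificationModel('distilbert', 'outputs/',
                            num_labels=len(df_bert_train['labels'].unique()),
                            use_cuda=False)
predictions, raw_outputs = model.predict(["Sentence test"])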
I had the same issue. Switching to 0.20.0 version helped me, thanks!
I can't replicate the issue guys. Is it only happening with the code from the Medium article?
I've used my own dataset (multi-class classification). After downgrading to 0.20.0 (and then to 0.21.4) it works fine on GPU. Sometimes I have to restart the kernel, but otherwise it's OK. But now I have another issue, similar to #267: during training and validation the loss is OK, but when I try to predict, I get almost the same probabilities for different texts. I've balanced the dataset, tried different hyperparameters, etc., but nothing helps.
@ThilinaRajapakse I followed the steps in the Medium article, though the dataset was different. Will try to reproduce it again.
@DmLitov4 I was also running into the same issue of identical probabilities while predicting. I was iterating over the test dataset, and the probability weights and outputs for each data row were the same. But when I passed the test set as a whole to model.predict, I got different output values. Maybe you are doing the same. @ThilinaRajapakse can maybe comment on this.
Thanks.
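For reference, the difference described above is roughly the following sketch (test_texts is a hypothetical list of strings, not code from either commenter):

# Iterating one row at a time:
for text in test_texts:
    predictions, raw_outputs = model.predict([text])

# Passing the whole test set at once:
predictions, raw_outputs = model.predict(test_texts)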
The issue with similar predictions was a bug in some of the older versions. I'm guessing the versions you guys downgraded to were afflicted with it as well.
Yes, I've heard about this issue. But I've used the 0.21.4 version (without that problem), and it seems like the problem was hiding in the hyperparameters of my model. I changed them a lot, retrained my model, and now it works fine with both XLNet and BERT. Thank you so much for your hard work, it helps us a lot.
Just to clarify, some of you are still running into issues with the latest versions? :thinking:
You are welcome!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Guys, use the CPU (i.e. use_cuda=False) if you face this situation with the GPU. If the problem is in the data, you'll get a more informative exception output and can solve the problem.
Describe the bug
I trained a distilbert model for classification. Now when I try to use the model to eval or predict I get a
RuntimeError: CUDA error: device-side assert triggered
To Reproduce
Steps to reproduce the behavior: Train a model using the params
ClassificationModel('distilbert', 'distilbert-base-uncased', num_labels=len(df_bert_train['labels'].unique()), args={'reprocess_input_data': True, 'overwrite_output_dir': True, 'max_seq_length': 64, 'train_batch_size': 16, 'fp16': False, 'num_train_epochs': 10})
Desktop (please complete the following information):