multi class is not working

jibybabu commented 5 years ago

Hi,

I am getting the following error while trying to train a multiclass model. Any help is much appreciated!

My df_train looks like this:

    text  id  label alpha
0  text1   0      2     a
1  text2   1      2     a
2  text3   2      3     a
3  text4   3      2     a
4  text5   4      2     a

df_train.label.value_counts()
3    212925
2     71273
0      9883
1      5920
Name: label, dtype: int64

model = TransformerModel('bert', 'bert-base-cased', num_labels=4, args={'reprocess_input_data': True, 'overwrite_output_dir': True})

model.train_model(df_train)
Converting to features started.
100%|██████████| 300001/300001 [02:41<00:00, 1855.38it/s]
Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Epoch: 0%| | 0/1 [00:00<?, ?it/s]
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [0,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [1,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [2,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [3,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [4,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [5,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [6,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [7,0,0] Assertion t >= 0 && t < n_classes failed.
THCudaCheck FAIL file=/pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu line=110 error=710 : device-side assert triggered
Current iteration: 0%| | 0/37501 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jbabu/.local/lib/python3.7/site-packages/simpletransformers/model.py", line 142, in train_model
    global_step, tr_loss = self.train(train_dataset, output_dir, show_running_loss=show_running_loss)
  File "/home/jbabu/.local/lib/python3.7/site-packages/simpletransformers/model.py", line 367, in train
    outputs = model(**inputs)
  File "/home/jbabu/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jbabu/.local/lib/python3.7/site-packages/transformers/modeling_bert.py", line 913, in forward
    loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
  File "/home/jbabu/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jbabu/.local/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 916, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/usr/local/lib/python3.7/dist-packages/apex/amp/wrap.py", line 28, in wrapper
    return orig_fn(*new_args, **kwargs)
  File "/home/jbabu/.local/lib/python3.7/site-packages/torch/nn/functional.py", line 2009, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/usr/local/lib/python3.7/dist-packages/apex/amp/wrap.py", line 28, in wrapper
    return orig_fn(*new_args, **kwargs)
  File "/home/jbabu/.local/lib/python3.7/site-packages/torch/nn/functional.py", line 1838, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: cuda runtime error (710) : device-side assert triggered at /pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu:110

jibybabu commented 5 years ago

FYI, the minimal example ran successfully before I tried it with my own dataset.

ThilinaRajapakse commented 5 years ago

Is that all of your data? The labels need to be 0, ..., n-1, where n is the number of labels.

jibybabu commented 5 years ago

Sorry for not being clear! No, the dataset has about 360k rows; I only posted the first few rows as a sample. Here are the label value counts again, to give a feel for the label distribution:

df_train.label.value_counts()
3    212925
2     71273
0      9883
1      5920
Name: label, dtype: int64

ThilinaRajapakse commented 5 years ago

My bad, I didn't notice you'd included the value counts in the original comment.

I can't spot any obvious issues here that would cause this error. Are you using the latest version of Simple Transformers and PyTorch?

jibybabu commented 5 years ago

Here are the package versions:

PyTorch = 1.3.0
SimpleTransformers = 0.4.5
TensorFlow = 1.14.0

Thanks again for looking into it!

ThilinaRajapakse commented 5 years ago

Those seem fine. What version of Transformers are you using?

This error is usually caused by having more labels than num_labels and/or having a label greater than or equal to num_labels. But I can't see either of those cases in your data. Does it work if you use the AG News dataset as in the Medium article?
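
A quick check along these lines can catch both cases before training starts. This is only a minimal sketch in plain Python: `check_labels` is a hypothetical helper, not part of Simple Transformers, and it assumes the labels are available as a list of integers (for example via `df_train['label'].tolist()`).

```python
def check_labels(labels, num_labels):
    """Return the sorted label values that fall outside [0, num_labels)."""
    # The loss indexes the model outputs by label, so every label must
    # satisfy 0 <= label < num_labels.
    return sorted({label for label in labels if not 0 <= label < num_labels})


# Labels 1..4 with num_labels=4: label 4 is out of range.
print(check_labels([1, 2, 3, 4, 4, 1], num_labels=4))  # [4]
# Labels 0..3 with num_labels=4 pass the check.
print(check_labels([0, 1, 2, 3, 3, 0], num_labels=4))  # []
```

An empty result means every label is a valid class index for the given `num_labels`.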

jibybabu commented 5 years ago

My Hugging Face Transformers version is 2.1.1. Let me try the AG News dataset and report back shortly.

jibybabu commented 5 years ago

Same error with AG News data set as well... Pasting below

import pandas as pd

train_df = pd.read_csv('train.csv', header=None)
train_df['text'] = train_df.iloc[:, 1] + " " + train_df.iloc[:, 2]
train_df = train_df.drop(train_df.columns[[1, 2]], axis=1)
train_df.columns = ['label', 'text']
train_df = train_df[['text', 'label']]
train_df['text'] = train_df['text'].apply(lambda x: x.replace('\\', ' '))

eval_df = pd.read_csv('test.csv', header=None)
eval_df['text'] = eval_df.iloc[:, 1] + " " + eval_df.iloc[:, 2]
eval_df = eval_df.drop(eval_df.columns[[1, 2]], axis=1)
eval_df.columns = ['label', 'text']
eval_df = eval_df[['text', 'label']]
eval_df['text'] = eval_df['text'].apply(lambda x: x.replace('\\', ' '))
eval_df['label'] = eval_df['label'].apply(lambda x:x-1)

train_df.head()
                                                text  label
0  Wall St. Bears Claw Back Into the Black (Reute...      3
1  Carlyle Looks Toward Commercial Aerospace (Reu...      3
2  Oil and Economy Cloud Stocks' Outlook (Reuters...      3
3  Iraq Halts Oil Exports from Main Southern Pipe...      3
4  Oil prices soar to all-time record, posing new...      3

train_df.label.value_counts( ... )
4    30000
3    30000
2    30000
1    30000
Name: label, dtype: int64

train_df.dtypes
text     object
label     int64
dtype: object

from simpletransformers.model import TransformerModel
/home/jbabu/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/jbabu/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/jbabu/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/jbabu/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/jbabu/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/jbabu/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
model = TransformerModel('roberta', 'roberta-base', num_labels=4)
model = TransformerModel('roberta', 'roberta-base', num_labels=4, args={'learning_rate':1e-5, 'num_train_epochs': 2, 'reprocess_input_data': True, 'overwrite_output_dir': True})

model.train_model(train_df)
Converting to features started.
100%|██████████| 120000/120000 [00:15<00:00, 7962.47it/s]
Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Epoch: 0%| | 0/2 [00:00<?, ?it/s]
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [0,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [1,0,0] Assertion t >= 0 && t < n_classes failed.
THCudaCheck FAIL file=/pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu line=110 error=710 : device-side assert triggered
Current iteration: 0%| | 0/15000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jbabu/.local/lib/python3.7/site-packages/simpletransformers/model.py", line 142, in train_model
    global_step, tr_loss = self.train(train_dataset, output_dir, show_running_loss=show_running_loss)
  File "/home/jbabu/.local/lib/python3.7/site-packages/simpletransformers/model.py", line 367, in train
    outputs = model(**inputs)
  File "/home/jbabu/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jbabu/.local/lib/python3.7/site-packages/transformers/modeling_roberta.py", line 340, in forward
    loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
  File "/home/jbabu/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jbabu/.local/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 916, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/usr/local/lib/python3.7/dist-packages/apex/amp/wrap.py", line 28, in wrapper
    return orig_fn(*new_args, **kwargs)
  File "/home/jbabu/.local/lib/python3.7/site-packages/torch/nn/functional.py", line 2009, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/usr/local/lib/python3.7/dist-packages/apex/amp/wrap.py", line 28, in wrapper
    return orig_fn(*new_args, **kwargs)
  File "/home/jbabu/.local/lib/python3.7/site-packages/torch/nn/functional.py", line 1838, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: cuda runtime error (710) : device-side assert triggered at /pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu:110

ThilinaRajapakse commented 5 years ago

train_df['label'] = train_df['label'].apply(lambda x:x-1)

I think this line was missing in the Medium article. Can you try it with it included?

The train_df value counts should have the labels 0, 1, 2, 3.
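
For datasets whose labels are not already 0-based and contiguous, a more general remapping than subtracting 1 could look like the sketch below. `build_label_map` is a hypothetical helper written for illustration, and the labels are assumed to be a plain list; with pandas, the resulting dictionary could be applied with `train_df['label'].map(label_map)`.

```python
def build_label_map(labels):
    """Map each distinct label to an index in 0..n-1, in sorted order."""
    return {label: index for index, label in enumerate(sorted(set(labels)))}


raw_labels = [3, 3, 1, 4, 2]  # labels in the range 1..4, as in AG News
label_map = build_label_map(raw_labels)
remapped = [label_map[label] for label in raw_labels]

print(label_map)  # {1: 0, 2: 1, 3: 2, 4: 3}
print(remapped)   # [2, 2, 0, 3, 1]
```

The map should be built once from the training labels and then reused for the evaluation data, so that both datasets share the same class indices.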

jibybabu commented 5 years ago

Hi @ThilinaRajapakse, after making the labels 0, 1, 2, 3, the Medium article example works! Also, when I passed a data frame with only the 'label' and 'text' columns, my own dataset worked as well. Thanks a lot for your swift responses! By the way, do you have any tips or documentation on how to tune the parameters efficiently? Thanks, Jiby

ThilinaRajapakse commented 5 years ago

Great to hear that you got it to work!

> i just passed the data frame with only 'label' and 'text', my data set worked as well

I'll look into this. It should still work even if you have more columns.
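
In the meantime, the workaround reported above amounts to handing the model only the text and label fields, in that order. The sketch below uses plain dictionaries in place of DataFrame rows and a hypothetical helper, `select_text_and_label`. That an extra column such as `id` might be picked up as the label is an assumption suggested by this thread, not something confirmed here. With pandas, the equivalent selection would be `df_train[['text', 'label']]`.

```python
def select_text_and_label(rows):
    """Keep only the 'text' and 'label' fields of each row, in that order."""
    return [(row["text"], row["label"]) for row in rows]


# Rows shaped like the df_train sample from the original report.
rows = [
    {"text": "text1", "id": 0, "label": 2, "alpha": "a"},
    {"text": "text2", "id": 1, "label": 2, "alpha": "a"},
    {"text": "text3", "id": 2, "label": 3, "alpha": "a"},
]

print(select_text_and_label(rows))
# [('text1', 2), ('text2', 2), ('text3', 3)]
```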

Unfortunately, hyperparameter tuning is still largely trial and error, but I can give a couple of pointers that may be useful. For Transformers, 2-4 training epochs are usually sufficient. In my experience, good learning rates are usually in the 5e-5 to 1e-4 range. These are rough estimates, but they work as a starting point. Of course, there's no guarantee that these tips will be effective in all cases (or even in most cases)!
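
One way to turn these pointers into a small trial-and-error sweep is to enumerate candidate settings up front. The sketch below only builds the dictionaries; the keys `learning_rate` and `num_train_epochs` are the ones used in the `args` dictionary earlier in this thread, while the specific grid values are an illustrative assumption.

```python
from itertools import product

learning_rates = [5e-5, 1e-4]  # both ends of the suggested range
epoch_counts = [2, 3, 4]       # 2-4 training epochs

# One args dictionary per (learning rate, epochs) combination.
candidate_args = [
    {"learning_rate": lr, "num_train_epochs": epochs}
    for lr, epochs in product(learning_rates, epoch_counts)
]

for args in candidate_args:
    print(args)
```

Each dictionary could then be passed as `args=` when constructing the model, training one model per setting and comparing the evaluation results.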

jibybabu commented 5 years ago

Another issue came up when I tried to load the trained model and predict. I was able to load the model, but prediction/eval failed. The error is below:

model = TransformerModel('bert', 'outputs/')
model.eval_model(df_test, f1=f1_multiclass, acc=accuracy_score)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jbabu/.local/lib/python3.7/site-packages/simpletransformers/model.py", line 176, in eval_model
    self._move_model_to_device()
  File "/home/jbabu/.local/lib/python3.7/site-packages/simpletransformers/model.py", line 513, in _move_model_to_device
    self.model.to(self.device)
  File "/home/jbabu/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 426, in to
    return self._apply(convert)
  File "/home/jbabu/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 202, in _apply
    module._apply(fn)
  File "/home/jbabu/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 202, in _apply
    module._apply(fn)
  File "/home/jbabu/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 202, in _apply
    module._apply(fn)
  File "/home/jbabu/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 224, in _apply
    param_applied = fn(param)
  File "/home/jbabu/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 424, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA error: device-side assert triggered

ThilinaRajapakse commented 5 years ago

Did you follow the same procedure for the evaluation dataset as for the training dataset? The labels in the evaluation dataset need to be the same as the labels in the training dataset.
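
A simple way to verify this is to compare the two label sets directly. This is a minimal sketch with a hypothetical helper, `unseen_eval_labels`, and it assumes both label columns have been converted to plain lists.

```python
def unseen_eval_labels(train_labels, eval_labels):
    """Return the eval labels that never occur in the training labels."""
    return sorted(set(eval_labels) - set(train_labels))


train_labels = [0, 1, 2, 3, 3]

# Same label scheme in both datasets: nothing is reported.
print(unseen_eval_labels(train_labels, [0, 1, 2, 3]))  # []
# Eval labels left at 1..4: label 4 was never seen in training.
print(unseen_eval_labels(train_labels, [1, 2, 3, 4]))  # [4]
```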

jibybabu commented 5 years ago

Yup, it's the same! FYI, model.predict("sample text") works fine; model.eval_model is the call that throws this error.

ThilinaRajapakse commented 5 years ago

What happens if you try running eval on the training dataset itself?

vvssttkk commented 4 years ago

@ThilinaRajapakse Hi, after training the model I want to evaluate it with f1_score. When I call eval_model(train_df, f1=f1_score), I get this error:

ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].

That is fair enough.

Next, I passed average inside f1_score, as in eval_model(train_df, f1=f1_score(average='micro')), and got the following error:

f1_score() missing 2 required positional arguments: 'y_true' and 'y_pred'

So how and where should I provide average='micro'?

ThilinaRajapakse commented 4 years ago

You need to wrap the f1_score function from sklearn in your own function with the correct arguments.

from sklearn.metrics import f1_score

def f1_score_micro(y_true, y_pred):
    return f1_score(y_true, y_pred, average="micro")
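
The second error in the question arises because `f1_score(average='micro')` calls the function immediately instead of passing a callable. Assuming `eval_model` invokes each keyword metric as `metric(y_true, y_pred)`, which is what the error messages suggest, `functools.partial` is another way to pre-fill `average`. The sketch below uses a stand-in `toy_score` function (hypothetical, written only for this example) so that it runs without scikit-learn; with scikit-learn installed, `partial(f1_score, average='micro')` would play the same role.

```python
from functools import partial


def toy_score(y_true, y_pred, average="binary"):
    """Stand-in for a metric that takes an extra keyword option."""
    if average != "micro":
        raise ValueError("Target is multiclass but average=%r" % average)
    # Fraction of positions where the prediction matches the target.
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# partial() returns a new callable with average fixed; nothing is called yet.
toy_score_micro = partial(toy_score, average="micro")

y_true = [0, 1, 2, 3]
y_pred = [0, 1, 2, 0]
print(toy_score_micro(y_true, y_pred))  # 0.75
```

The resulting object accepts exactly `(y_true, y_pred)`, so it can be passed wherever a two-argument metric function is expected.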

ThilinaRajapakse / simpletransformers

multi class is not working #25