bentrevett / pytorch-sentiment-analysis

Tutorials on getting started with PyTorch and TorchText for sentiment analysis.

RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM in "4 - Convolutional Sentiment Analysis" #21

Closed: romran closed this issue 5 years ago

romran commented 5 years ago

Hello, very nice tutorial, many thanks for sharing. I successfully trained an RNN with my custom dataset, but I get RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM when trying to train the CNN from "4 - Convolutional Sentiment Analysis". Does anyone know where the problem could be?

bentrevett commented 5 years ago

I believe this means some of the examples in your dataset are shorter than the largest CNN kernel size. Thus, you either need to pad your examples so they're at least as long as the largest kernel, or reduce the size of your CNN kernels.
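A quick way to confirm this (a minimal sketch, assuming the tutorial's legacy-TorchText names `train_data` and `filter_sizes`; adjust to however your dataset is stored):

```python
# Check whether any example is shorter than the largest kernel height.
# Assumes `train_data` is a TorchText dataset whose examples expose a
# `.text` list of tokens, and `filter_sizes` is e.g. [3, 4, 5].
min_len = min(len(example.text) for example in train_data)
max_kernel = max(filter_sizes)
if min_len < max_kernel:
    print(f'shortest example has {min_len} tokens, '
          f'but the largest kernel needs {max_kernel}')
```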

romran commented 5 years ago

Many thanks for your reply. Could you elaborate on your answer? Where can I find and compare the CNN kernel size against the size of my examples?

romran commented 5 years ago

Closed the issue by accident :/ still looking for a solution.

bentrevett commented 5 years ago

This gist might make things clearer: https://gist.github.com/bentrevett/adb968471b3fe1ef82eae18cf13772ab

We have a conv layer with a kernel height of 5 and a width of embedding_dim, which we have set to 128.

The height of this filter is the size of the n-grams we are looking at in our input sentence, i.e. a height of 2 means we are looking at bi-grams, a height of 3 means tri-grams, a height of 4 means 4-grams, and so on.

We are only using a single convolutional layer, which has a height of 5 and therefore looks for 5-grams. The tensor a is a batch of sentences of length 5. We can process this fine with our convolutional layer, as the 5-gram filters fit the length of our sentences exactly.

Tensor b is a batch of sentences of length 3. We cannot use our convolutional layer over it, as the layer is looking for 5-grams (height of 5) whereas our sentences only have 3 tokens (height of 3). This is what causes the error: our sentences are shorter than the height of the kernel.

The solution is to reduce the height of the filter to 3, or to do some preprocessing to pad all sentences so they are at least as long as the largest filter height.
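To make this concrete, here is a minimal sketch along the lines of the gist (the out_channels value, batch size, and the pad_tokens helper are my own illustrative choices, not values from the gist):

```python
import torch
import torch.nn as nn

embedding_dim = 128
# kernel height 5 (5-grams), kernel width embedding_dim, as in the gist
conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=(5, embedding_dim))

# a: batch of sentences of length 5 -- the 5-high kernel fits exactly
a = torch.randn(4, 1, 5, embedding_dim)
print(conv(a).shape)  # torch.Size([4, 16, 1, 1])

# b: batch of sentences of length 3 -- shorter than the kernel height, so the
# convolution fails; on the GPU this surfaced as CUDNN_STATUS_BAD_PARAM
b = torch.randn(4, 1, 3, embedding_dim)
# conv(b)  # RuntimeError

# One fix: pad token sequences to at least the largest filter height
# before numericalizing ('<pad>' is assumed to be the vocab's pad token).
def pad_tokens(tokens, min_len=5, pad_token='<pad>'):
    return tokens + [pad_token] * max(0, min_len - len(tokens))

print(pad_tokens(['not', 'bad']))  # ['not', 'bad', '<pad>', '<pad>', '<pad>']
```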

romran commented 5 years ago

Thanks for the explanation. I solved the error by adding a padding parameter:

    self.convs = nn.ModuleList([nn.Conv2d(in_channels=1,
                                          out_channels=n_filters,
                                          padding=(fs, 0),
                                          kernel_size=(fs, embedding_dim))
                                for fs in filter_sizes])

The remaining problem is that my dataset is imbalanced: 95% positive versus 5% negative reviews, and the model overfits. Here is an example from training with dropout 0.75:

[training curve screenshot]

Can you give any advice on CNN training for text classification with an imbalanced dataset?
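As an aside on the padding fix above: padding=(fs, 0) zero-pads fs rows above and below each sentence, so even very short inputs produce a positive output length. A quick check (a sketch; fs and n_filters are illustrative values):

```python
import torch
import torch.nn as nn

embedding_dim = 128
fs, n_filters = 5, 16  # illustrative values

conv = nn.Conv2d(in_channels=1, out_channels=n_filters,
                 padding=(fs, 0), kernel_size=(fs, embedding_dim))

# a 3-token "sentence" now works: output height = 3 + 2*fs - fs + 1 = 9
x = torch.randn(4, 1, 3, embedding_dim)
print(conv(x).shape)  # torch.Size([4, 16, 9, 1])
```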

bentrevett commented 5 years ago

The only methods I'm aware of are:

  1. Upsampling the negative examples. This is where we just add copies of the negative reviews until there are as many of them as there are positive reviews.

  2. Using the pos_weight argument of BCEWithLogitsLoss. Per the docs: since you have 0.95 positive reviews and 0.05 negative reviews, a pos_weight of 0.05/0.95 (≈ 0.0526) will reduce the weight (the amount of loss) given to the positive examples. Both options are sketched below.
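A minimal sketch of both approaches (the example lists, batch values, and variable names here are illustrative, not from the thread):

```python
import random

import torch
import torch.nn as nn

# Option 1: upsample the minority class by duplicating its examples.
# `pos_examples` / `neg_examples` stand in for your parsed reviews.
pos_examples = ['good review'] * 95
neg_examples = ['bad review'] * 5
factor = len(pos_examples) // len(neg_examples)  # 19
balanced = pos_examples + neg_examples * factor  # 95 positive, 95 negative
random.shuffle(balanced)

# Option 2: down-weight the positive class in the loss.
# pos_weight multiplies the loss term for positive targets, so a value
# below 1 shrinks the influence of the over-represented positive class.
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([0.05 / 0.95]))

logits = torch.randn(8)                     # raw (pre-sigmoid) model outputs
labels = torch.randint(0, 2, (8,)).float()  # 1 = positive, 0 = negative
loss = criterion(logits, labels)
print(loss.item())
```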

romran commented 5 years ago

Many thanks for the insights!