I believe this means some of the examples in your dataset are shorter than the largest CNN kernel size. Thus, you either need to pad your examples so they're at least as long as the largest kernel, or reduce the size of your CNN kernels.
Many thanks for your reply, could you elaborate more on your answer? Where can I find and compare the size of the CNN kernel vs. the examples?
Closed the issue by accident :/ still looking for a solution
This gist might make things clearer: https://gist.github.com/bentrevett/adb968471b3fe1ef82eae18cf13772ab
We have a conv layer that has a kernel height of 5 and a width of `embedding_dim`, which we have set to 128.
The height of this filter is the size of the n-grams we are looking at in our input sentence: a height of 2 means we are looking at bi-grams, a height of 3 means tri-grams, a height of 4 means 4-grams, etc.
We are only using a single convolutional layer that has a height of 5, and therefore looks for 5-grams. The tensor `a` is a batch of sentences of length 5. We can process this fine with our convolutional layer, as the 5-gram filters perfectly fit the length of our sentences.
Tensor `b` is a batch of sentences of length 3. We cannot use our convolutional layer over this, as it is looking for 5-grams (height of 5), whereas our sentences only have 3 tokens (height of 3). This is what causes the error: our sentences are shorter than the height of the kernel.
The solution is to reduce the height of the filter to 3, or to do some pre-processing that pads all sentences so they are at least as long as the largest filter height.
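A minimal sketch of the failure, assuming a hypothetical batch size of 32 and embedding dimension of 128 (on CPU this surfaces as a plain `RuntimeError` about the kernel size; on GPU with cuDNN it appears as `CUDNN_STATUS_BAD_PARAM`):

```python
import torch
import torch.nn as nn

emb_dim = 128
# a single conv layer looking for 5-grams
conv = nn.Conv2d(in_channels=1, out_channels=100, kernel_size=(5, emb_dim))

a = torch.randn(32, 1, 5, emb_dim)  # batch of sentences of length 5
print(conv(a).shape)                # works: torch.Size([32, 100, 1, 1])

b = torch.randn(32, 1, 3, emb_dim)  # batch of sentences of length 3
conv(b)  # fails: kernel height (5) is greater than input height (3)
```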
Thanks for the explanation, I solved the error by adding a padding parameter:
```python
self.convs = nn.ModuleList([
    nn.Conv2d(in_channels=1,
              out_channels=n_filters,
              kernel_size=(fs, embedding_dim),
              padding=(fs, 0))
    for fs in filter_sizes
])
```
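For context, `padding=(fs, 0)` adds `fs` rows of zeros above and below each sentence, so even a one-token sentence still yields a valid output. A quick shape check, using hypothetical values `n_filters=100`, `filter_sizes=[3, 4, 5]`, and `embedding_dim=128`:

```python
import torch
import torch.nn as nn

embedding_dim = 128
convs = nn.ModuleList([
    nn.Conv2d(in_channels=1, out_channels=100,
              kernel_size=(fs, embedding_dim), padding=(fs, 0))
    for fs in [3, 4, 5]
])

x = torch.randn(32, 1, 1, embedding_dim)  # batch of one-token sentences
for conv in convs:
    # output height = sent_len + 2*fs - fs + 1 = sent_len + fs + 1, always > 0
    print(conv(x).shape)
```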
The problem is that my dataset is unbalanced (95% positive versus 5% negative reviews) and the model overfits, even when training with dropout of 0.75.
Could you give any advice on CNN training for text classification with an unbalanced dataset?
The only methods I'm aware of are (a sketch of both follows below):

- Upsampling the negative examples, i.e. adding copies of the negative reviews until there are as many negative reviews as positive ones.
- Using the `pos_weight` argument of `BCEWithLogitsLoss`. Per the docs, since you have 95% positive and 5% negative reviews, a `pos_weight` of 0.05/0.95 (≈ 0.0526) will reduce the weight (the amount of loss) given to the positive examples.
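A minimal sketch of both options, using hypothetical toy data (`pos_examples`/`neg_examples` stand in for your actual reviews):

```python
import random
import torch
import torch.nn as nn

# Option 1: upsample the negative (minority) class by duplicating its examples.
pos_examples = ["good"] * 95
neg_examples = ["bad"] * 5
neg_upsampled = neg_examples * (len(pos_examples) // len(neg_examples))
train_examples = pos_examples + neg_upsampled  # now 95 vs. 95
random.shuffle(train_examples)

# Option 2: down-weight the positive class with pos_weight in BCEWithLogitsLoss.
pos_weight = torch.tensor([0.05 / 0.95])  # ≈ 0.0526
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8)                     # raw model outputs (no sigmoid)
labels = torch.randint(0, 2, (8,)).float()  # 1 = positive, 0 = negative
loss = criterion(logits, labels)
```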
Many thanks for the insights!
Hello, very nice tutorial, many thanks for sharing. I successfully trained the RNN with my custom dataset, but I'm getting

`RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM`

when trying to train the CNN from "4 - Convolutional Sentiment Analysis". Does anyone know where the problem could be?