Holmeyoung / crnn-pytorch

PyTorch implementation of CRNN (CNN + RNN + CTCLoss) for OCR in any language.
MIT License

input to train.py #17

Closed SreenijaK closed 5 years ago

SreenijaK commented 5 years ago

I want to train with my own dataset. First I used create_dataset.py, and it created two files, data.mdb and lock.mdb. Then I gave the same path of data.mdb to both train and validate. If this is not right, how do I split into train and validate?

And when I run train.py, I get the following error:

File "train.py", line 60, in train_dataset = dataset.lmdbDataset(root=opt.trainRoot) File "/home/ramu_yarru/ctpn/crnn.pytorch/dataset.py", line 25, in init meminit=False) lmdb.Error: /home/ramu_yarru/ctpn/crnn_train/data/train/data.mdb: Not a directory

What exactly is the input to train.py?

niddal-imam commented 5 years ago

You need to create two label files, train.txt and val.txt, and then run create_dataset.py on each. This will create a train folder and a val folder, each containing data.mdb and lock.mdb.
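
For the split itself, a minimal sketch (the file name all_labels.txt and the 90/10 ratio are my own choices; one sample per line, in whatever line format create_dataset.py expects):

    import random

    # Hypothetical split of one label file into train.txt and val.txt.
    # Assumes one sample per line (e.g. "image_path<sep>label").
    with open('all_labels.txt') as f:
        lines = f.readlines()

    random.shuffle(lines)
    split = int(0.9 * len(lines))  # 90% train, 10% val

    with open('train.txt', 'w') as f:
        f.writelines(lines[:split])
    with open('val.txt', 'w') as f:
        f.writelines(lines[split:])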

SreenijaK commented 5 years ago

Thanks. So the input to train.py will be the dataset folder? python train.py --trainroot path/to/train/dataset --valroot path/to/val/dataset

If we are giving the folder as input, where do we use the .mdb files?

niddal-imam commented 5 years ago

Yes, trainroot is the path to the train folder, which contains the data.mdb and lock.mdb files, and valroot is the path to the val folder. train.py opens the folder as an lmdb environment and reads the .mdb files itself.

SreenijaK commented 5 years ago

I've created two folders, train and validate. When I run train.py I get this error:

    File "train.py", line 178, in <module>
        cost = trainBatch(crnn, criterion, optimizer)
    File "train.py", line 162, in trainBatch
        cost = criterion(preds, text, preds_size, length) / batch_size
    File "/home/ramu_yarru/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
        result = self.forward(*input, **kwargs)
    File "/home/ramu_yarru/.local/lib/python2.7/site-packages/torch/nn/modules/loss.py", line 1332, in forward
        self.zero_infinity)
    File "/home/ramu_yarru/.local/lib/python2.7/site-packages/torch/nn/functional.py", line 1813, in ctc_loss
        zero_infinity)
    RuntimeError: Tensor for argument #2 'targets' is on CPU, but expected it to be on GPU (while checking arguments for ctc_loss_gpu)

SreenijaK commented 5 years ago

From https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/issues/397#issuecomment-427750296 I think the error is because of the line preds_size = Variable(torch.IntTensor([preds.size(0)] * batch_size)). We might have to use the torch.cuda module, but torch.cuda doesn't seem to have an IntTensor that helps here. Do you know how I can fix this? I've changed it to preds_size = Variable(torch.IntTensor([preds.size(0)] * batch_size)).cuda(), and it still gives the same error.

Holmeyoung commented 5 years ago

Hi, can you try to run the code with Python 3.6?

SreenijaK commented 5 years ago

I'm using 3.5.3; should I upgrade to 3.6?

Holmeyoung commented 5 years ago

Hi, you are using 3.5, but why is your PyTorch installed under Python 2.7?

    File "/home/ramu_yarru/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 493, in __call__

Holmeyoung commented 5 years ago

Are you using Anaconda? If so, you should use pip install rather than sudo pip install. Maybe that's the issue.

SreenijaK commented 5 years ago

When I run with python3 I get the same error:

    Traceback (most recent call last):
      File "train.py", line 179, in <module>
        cost = trainBatch(crnn, criterion, optimizer)
      File "train.py", line 163, in trainBatch
        cost = criterion(preds, text, preds_size, length) / batch_size
      File "/home/ramu_yarru/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/ramu_yarru/.local/lib/python3.5/site-packages/torch/nn/modules/loss.py", line 1332, in forward
        self.zero_infinity)
      File "/home/ramu_yarru/.local/lib/python3.5/site-packages/torch/nn/functional.py", line 1813, in ctc_loss
        zero_infinity)
    RuntimeError: Tensor for argument #2 'targets' is on CPU, but expected it to be on GPU (while checking arguments for ctc_loss_gpu)

SreenijaK commented 5 years ago

But when I make cuda false, the code runs; it is just too slow.

Also, when I change the height to 64 and width to 900, it throws an error. What do I do to change my height to 64? And if I want to train with labels longer than 26 characters, what changes should I make?

Holmeyoung commented 5 years ago

Hi, can you try replacing the int with long, as described in https://discuss.pytorch.org/t/ctc-loss-function-not-working-with-cuda-when-using-torch-int32/30152
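
For reference, a minimal self-contained sketch of the pattern that satisfies ctc_loss_gpu (the shapes and values here are made up for illustration; the point is that the targets and length tensors are long tensors on the GPU):

    import torch
    import torch.nn as nn

    criterion = nn.CTCLoss(zero_infinity=True)
    T, B, C, S = 26, 4, 37, 10  # time steps, batch, classes, target length (example values)
    preds = torch.randn(T, B, C).log_softmax(2).cuda()           # network output
    text = torch.randint(1, C, (B, S), dtype=torch.long).cuda()  # targets as LongTensor on GPU
    preds_size = torch.full((B,), T, dtype=torch.long).cuda()    # input lengths
    length = torch.full((B,), S, dtype=torch.long).cuda()        # target lengths
    cost = criterion(preds, text, preds_size, length) / B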

Holmeyoung commented 5 years ago

When you change the height to 64 and width to 900, what's the error output? And if you want to change the length of 26, you need to change the net structure in crnn.py: the T length going into the LSTM.

SreenijaK commented 5 years ago

Hi, I changed to torch.cuda.LongTensor in all the places and the code seems to work fine. Thank you so much.

SreenijaK commented 5 years ago

When I change the height to 64 and width to 900 I get this error: AssertionError: the height of conv must be 1

Holmeyoung commented 5 years ago

OK, it's this assert in the conv part of crnn.py:

    def forward(self, input):
        # conv features
        conv = self.cnn(input)
        b, c, h, w = conv.size()
        assert h == 1, "the height of conv must be 1"
        conv = conv.squeeze(2)
        conv = conv.permute(2, 0, 1)  # [w, b, c]

        # rnn features
        output = self.rnn(conv)

If you want to change that, you need to change the shape of the input into the RNN from [w, b, c] to [length, b, c*height]. The length and height here are not the original image length and height, but the length and height after the conv stack.
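
A hedged sketch of that reshape (not the repo's code, just the idea): fold the conv height into the channel axis instead of squeezing it away; note the first LSTM's input size then has to be c*h instead of c:

        # Keep h > 1 by merging the height into the channel dimension,
        # so the RNN sees [w, b, c*h] instead of [w, b, c].
        b, c, h, w = conv.size()
        conv = conv.view(b, c * h, w)   # merge the adjacent dims c and h
        conv = conv.permute(2, 0, 1)    # [w, b, c*h]
        output = self.rnn(conv)         # rnn input size must be c*h now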

SreenijaK commented 5 years ago

OK, so my images can be of any height and length, right? They don't all have to be the same height and width; they can be variable?

Holmeyoung commented 5 years ago

Yes, the training images will all be reshaped to the same size.

SreenijaK commented 5 years ago

And if I want the max length to predict to be 64 instead of 26, where exactly do I change it?

Holmeyoung commented 5 years ago

Just change params.imgW to 254. Also, change train.py to:

    test_dataset = dataset.lmdbDataset(root=args.valroot, transform=dataset.resizeNormalize((params.imgW, params.imgH)))

SreenijaK commented 5 years ago

What is the reason we are making it 254?

SreenijaK commented 5 years ago

> When you change the height to 64 and width to 900, what's the error output? And if you want to change the length of 26, you need to change the net structure in crnn.py: the T length going into the LSTM.

Where in crnn.py do I change it for more than 26 characters?

Holmeyoung commented 5 years ago

You need to calculate it: what is the image width after the conv and pool layers? That width becomes the T length of the RNN.

SreenijaK commented 5 years ago

Sorry for asking too many questions, but what's the calculation you did to get 254?

Holmeyoung commented 5 years ago

It doesn't matter. For example, a 6x6 image, after a 3x3 conv kernel (stride 1, no padding), becomes 6-3+1=4, so 4x4. And after a 2x2 max pool it becomes 4/2=2, so 2x2.
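
The general rule behind those numbers is out = floor((in + 2*pad - kernel) / stride) + 1. A small helper (my own, not from the repo) that reproduces the example:

    def conv_out(size, kernel, stride=1, pad=0):
        """Output size of a conv/pool layer along one dimension."""
        return (size + 2 * pad - kernel) // stride + 1

    print(conv_out(6, kernel=3))            # 3x3 conv: 6 - 3 + 1 = 4
    print(conv_out(4, kernel=2, stride=2))  # 2x2 max pool: 4 / 2 = 2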

SreenijaK commented 5 years ago

Thank you, I understand your calculations, but I'm still confused about how we ended up at 254.

Holmeyoung commented 5 years ago

I changed the net structure:

        self.conv_0 = convRelu(conv_0, 0)
        self.pool_0 = nn.MaxPool2d(2, 2)                     # halves height and width
        self.conv_1 = convRelu(conv_1, 1)
        self.pool_1 = nn.MaxPool2d(2, 2)                     # halves height and width
        self.conv_2 = convRelu(conv_2, 2, True)
        self.conv_3 = convRelu(conv_3, 3)
        self.pool_2 = nn.MaxPool2d((2, 2), (2, 1), (0, 1))   # halves height; width stride 1
        self.conv_4 = convRelu(conv_4, 4, True)
        self.conv_5 = convRelu(conv_5, 5)
        self.pool_3 = nn.MaxPool2d((2, 2), (2, 1), (0, 1))   # halves height; width stride 1
        self.conv_6 = convRelu(conv_6, 6, True)

        self.rnn_0 = BidirectionalLSTM(512, nh, nh)
        self.rnn_1 = BidirectionalLSTM(nh, nh, nclass)

Then print the data shape after each layer:

        print ('\n----------------------')
        print (input.shape)
        # conv features
        x = self.conv_0(input)
        print ('conv_0:', x.shape)
        x = self.pool_0(x)
        print ('pool_0:', x.shape)
        x = self.conv_1(x)
        print ('conv_1:', x.shape)
        x = self.pool_1(x)
        print ('pool_1:', x.shape)
        x = self.conv_2(x)
        print ('conv_2:', x.shape)
        x = self.conv_3(x)
        print ('conv_3:', x.shape)
        x = self.pool_2(x)
        print ('pool_2:', x.shape)
        x = self.conv_4(x)
        print ('conv_4:', x.shape)
        x = self.conv_5(x)
        print ('conv_5:', x.shape)
        x = self.pool_3(x)
        print ('pool_3:', x.shape)
        x = self.conv_6(x)

        print ('conv_6:', x.shape)
        b, c, h, w = x.size()
        assert h == 1, "the height of conv must be 1"
        conv = x.squeeze(2)
        conv = conv.permute(2, 0, 1)  # [w, b, c]

        # rnn features
        rnn = self.rnn_0(conv)
        print ('lstm_0:', rnn.shape)
        output = self.rnn_1(rnn)
        print ('lstm_1:', output.shape)

Result:

    ----------------------
    torch.Size([64, 1, 32, 254])
    conv_0: torch.Size([64, 64, 32, 254])
    pool_0: torch.Size([64, 64, 16, 127])
    conv_1: torch.Size([64, 128, 16, 127])
    pool_1: torch.Size([64, 128, 8, 63])
    conv_2: torch.Size([64, 256, 8, 63])
    conv_3: torch.Size([64, 256, 8, 63])
    pool_2: torch.Size([64, 256, 4, 64])
    conv_4: torch.Size([64, 512, 4, 64])
    conv_5: torch.Size([64, 512, 4, 64])
    pool_3: torch.Size([64, 512, 2, 65])
    conv_6: torch.Size([64, 512, 1, 64])
    lstm_0: torch.Size([64, 64, 256])
    lstm_1: torch.Size([64, 64, 8])

As for how the net behaves in each layer, you can read up on how the kernel, padding, and stride work in a convolution. And because the sizes are not always evenly divisible, other valid width sequences exist.
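
To see where T = 64 (and hence imgW = 254) comes from, you can trace the width through the layers that change it, reusing the output-size helper shown earlier (the kernel/stride/padding values along the width axis are assumed from the standard CRNN config; the 3x3 pad-1 convs keep the width unchanged):

    def conv_out(size, kernel, stride=1, pad=0):
        return (size + 2 * pad - kernel) // stride + 1

    # (name, kernel, stride, pad) along the width axis, in layer order.
    layers = [('pool_0', 2, 2, 0), ('pool_1', 2, 2, 0),
              ('pool_2', 2, 1, 1), ('pool_3', 2, 1, 1),
              ('conv_6', 2, 1, 0)]

    w = 254
    for name, k, s, p in layers:
        w = conv_out(w, k, s, p)
        print(name, w)   # 127, 63, 64, 65, 64 -> T = 64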

SreenijaK commented 5 years ago

OK, got it. Thank you.