Holmeyoung / crnn-pytorch

PyTorch implementation of CRNN (CNN + RNN + CTCLoss) for OCR in all languages.
MIT License

Output shows the "PAD" character in all columns #59

Open SriramPingali opened 4 years ago

SriramPingali commented 4 years ago

Thanks for the code!

I was using the model on a custom dataset (the IAM dataset of handwritten text), so I wrote a custom Dataset class for the train_loader that returns a 32×128×3 (H×W×C) image and a string. I also wrote a custom function to encode the labels (it maps each character to its index in my letter dictionary).

However, the output from the CRNN model is always a tensor full of zeros (index 0, which is the PAD character in my letter dictionary).
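For context on what an all-zero output means: with CTC, a greedy (best-path) decoder takes the argmax class for each output column, collapses consecutive repeats, and then drops blanks. This minimal sketch assumes the blank is index 0 (the default for PyTorch's `nn.CTCLoss`) and uses a hypothetical `index2letter` dict; it is not the repository's own decoder. If every column's argmax is 0, the decoded text is empty, a pattern often seen early in CTC training or when the targets do not line up with the inputs.

```python
def ctc_greedy_decode(column_indices, index2letter, blank=0):
    """Greedy (best-path) CTC decoding of per-column argmax indices.

    column_indices: iterable of ints, one predicted class per output column.
    index2letter: hypothetical mapping from class index to character.
    """
    chars = []
    previous = None
    for idx in column_indices:
        # Keep a symbol only if it is not blank and differs from the previous column.
        if idx != blank and idx != previous:
            chars.append(index2letter[idx])
        previous = idx
    return "".join(chars)


index2letter = {1: "a", 2: "b"}
print(repr(ctc_greedy_decode([0, 1, 1, 0, 2, 0, 2], index2letter)))  # 'abb'
print(repr(ctc_greedy_decode([0] * 8, index2letter)))  # '' (every column is blank)
```

Note that a repeated character is only emitted twice when a blank separates the two runs, which is why CTC needs a dedicated blank class that no real character shares.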

Please help me out here! Thanks in advance!

```python
import os

from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms


class dataset(Dataset):

    def __init__(self, image_root, label_root, img_h, img_w):
        """Init function should not do any heavy lifting, but
        must initialize how many items are available in this data set.
        """
        self.images_path = image_root
        self.labels_path = label_root
        with open(self.labels_path, "r") as f:
            self.labels = f.readlines()
        self.transform = transforms.Compose([
            transforms.Resize((img_h, img_w)),  # Resize expects (height, width)
            transforms.ToTensor(),              # C x H x W tensor in [0, 1]
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])  # -> [-1, 1]

        # IAM layout: <root>/a01/a01-000u/a01-000u-00-00.png
        self.images = []
        for root, dirs, files in os.walk(self.images_path):
            for file in files:
                if file.endswith('.png'):
                    self.images.append(os.path.join(root, file))
        # os.walk order is arbitrary, so sort to make the order reproducible.
        # NOTE: self.labels[idx] must describe self.images[idx], so the
        # label file has to list the images in this same sorted order.
        self.images.sort()
        self.data_len = len(self.images)

    def __len__(self):
        """Return the number of samples in the dataset."""
        return self.data_len

    def __getitem__(self, idx):
        """Return the item requested by `idx`.
        The PyTorch DataLoader class uses this method to build an iterable for
        the training or validation loop.
        """
        img = Image.open(self.images[idx]).convert('RGB')
        img = self.transform(img)
        # Strip only the line ending; label[:-1] would cut a real character
        # from a last line that has no trailing newline.
        label = self.labels[idx].rstrip('\r\n')
        return img, label
```
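Because the class above pairs `self.images[idx]` with `self.labels[idx]` purely by position, a mismatch between the directory walk order and the label file order silently gives every image the wrong transcription. A minimal sketch of matching by file name instead, assuming a hypothetical label file where each line is `<image_id> <transcription>` and `#` starts a comment (IAM's real `words.txt` has extra columns, so the parsing would need adapting). Both function names are hypothetical helpers, not part of the repository.

```python
import os


def load_label_map(lines):
    """Parse '<image_id> <transcription>' lines into a dict (assumed format)."""
    labels = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(" ", 1)
        if len(parts) != 2:
            continue  # skip malformed lines
        image_id, text = parts
        labels[image_id] = text
    return labels


def pair_images_with_labels(image_paths, labels):
    """Return (path, label) pairs matched by file name rather than by position."""
    pairs = []
    for path in sorted(image_paths):
        image_id = os.path.splitext(os.path.basename(path))[0]
        if image_id in labels:  # images without a label are dropped
            pairs.append((path, labels[image_id]))
    return pairs


lines = ["# comment\n", "a01-000u-00-01 MOVE\n", "a01-000u-00-00 A\n"]
paths = ["data/a01/a01-000u/a01-000u-00-01.png",
         "data/a01/a01-000u/a01-000u-00-00.png"]
for path, text in pair_images_with_labels(paths, load_label_map(lines)):
    print(path, text)
```

The Dataset could then store this list of pairs and index into it, so each image always travels with its own label.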

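```python
import torch


def word_rep(word, letter2index, max_out_chars, pad_char, device='cpu'):
    """Encode `word` as a fixed-length tensor of character indices.

    Returns (rep, length), where length is the number of characters kept.
    """
    # Class indices must be integers (torch.long), not the default float32.
    rep = torch.zeros(max_out_chars, dtype=torch.long, device=device)
    word = word[:max_out_chars]  # truncate words longer than max_out_chars
    for letter_index, letter in enumerate(word):
        rep[letter_index] = letter2index[letter]
    # Mark the end of the word with the pad character if there is room;
    # this also covers an empty word.
    if len(word) < max_out_chars:
        rep[len(word)] = letter2index[pad_char]
    return rep, len(word)
```
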
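One thing worth checking with this encoding: PyTorch's `nn.CTCLoss` uses `blank=0` by default, so if PAD (or any real character) is mapped to index 0, it coincides with the CTC blank. Padding also does not need its own symbol, because `CTCLoss` reads only the first `target_lengths[n]` entries of each target. A minimal sketch under those assumptions, with a hypothetical lowercase alphabet and random log-probabilities standing in for the model output; the repository's `train.py` may handle label encoding through its own converter, which this sketch does not use.

```python
import torch
import torch.nn as nn

# Hypothetical alphabet; index 0 is reserved for the CTC blank.
alphabet = "abcdefghijklmnopqrstuvwxyz"
letter2index = {ch: i + 1 for i, ch in enumerate(alphabet)}
num_classes = len(alphabet) + 1  # +1 for the blank


def encode_batch(words):
    """Concatenate label indices and record each word's length for CTCLoss."""
    targets = torch.tensor([letter2index[ch] for w in words for ch in w],
                           dtype=torch.long)
    target_lengths = torch.tensor([len(w) for w in words], dtype=torch.long)
    return targets, target_lengths


words = ["hello", "ocr"]
targets, target_lengths = encode_batch(words)

T, N = 32, len(words)  # T = output columns (time steps, assumed), N = batch size
# Stand-in for the model output: (T, N, C) log-probabilities.
log_probs = torch.randn(T, N, num_classes).log_softmax(2)
input_lengths = torch.full((N,), T, dtype=torch.long)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```

Concatenated 1-D targets avoid padding entirely; a padded `(N, S)` target tensor also works as long as `target_lengths` holds the true word lengths.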
Holmeyoung commented 4 years ago

Hi, I have already written a demo.py for you to test with and a train.py for you to train with.

SriramPingali commented 4 years ago

Hey @holmeyoung, I am using the train.py you've written, but I want to train the model on a custom dataset, so I made the changes above to the Dataset class. It works as expected, but training isn't giving useful results.

Holmeyoung commented 4 years ago

Hi, I know you are training it on a custom dataset, but you should convert your data into LMDB and train with that, since my code is built around it. There is a create_dataset.py in the tool folder.