JaidedAI / EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
https://www.jaided.ai
Apache License 2.0

How to replace the best_accuracy.pth with arabic.pth ? #892

Open · MohieEldinMuhammad opened this issue 1 year ago

MohieEldinMuhammad commented 1 year ago

I fine-tuned the Arabic model on my dataset (screenshot attached), and this is my config file:

number: '1234567890١٢٣٤٥٦٧٨٩٠'
symbol: ''
character:  "0123456789!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~ abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ٠١٢٣٤٥٦٧٨٩«»؟،؛ءآأؤإئااًبةتثجحخدذرزسشصضطظعغفقكلمنهوىيٱٹپچڈڑژکڭگںھۀہۂۃۆۇۈۋیېےۓە"
experiment_name: 'arabic' 
train_data: 'all_data/en_train_filtered'
valid_data: 'all_data/en_val'
manualSeed: 1111
workers: 6
batch_size: 2 
num_iter: 15
valInterval: 5
saved_model: '/content/drive/MyDrive/OCR Data/arabic.pth' 
FT: False
optim: False # default is Adadelta
lr: 1.
beta1: 0.9
rho: 0.95
eps: 0.00000001
grad_clip: 5
# Data processing
select_data: 'all_data/en_train_filtered' # the dataset folder inside train_data
batch_ratio: '1' 
total_data_usage_ratio: 1.0
batch_max_length: 34 
imgH: 64
imgW: 600
rgb: False
sensitive: True
PAD: True
contrast_adjust: 0.0
data_filtering_off: False
# Model Architecture
Transformation: 'None'
FeatureExtraction: 'ResNet'
SequenceModeling: 'BiLSTM'
Prediction: 'CTC'
num_fiducial: 20
input_channel: 1
output_channel: 512
hidden_size: 512
decode: 'greedy'
new_prediction: False
freeze_FeatureFxtraction: False
freeze_SequenceModeling: False

I go to the originally downloaded model and replace it with my model (screenshot attached).

When I try to initialize the reader again, I get this error (screenshot attached).

What is the correct way to replace the original model with my fine-tuned model?

amroghoneim commented 1 year ago

You need to remove the following lines from easyocr.py

# elif calculate_md5(model_path) != model['md5sum']:
#     if not self.download_enabled:
#         raise FileNotFoundError("MD5 mismatch for %s and downloads disabled" % model_path)
#     LOGGER.warning(corrupt_msg)
#     os.remove(model_path)
#     LOGGER.warning('Re-downloading the recognition model, please wait. '
#                    'This may take several minutes depending upon your network connection.')
#     download_and_unzip(model['url'], model['filename'], self.model_storage_directory, verbose)
#     assert calculate_md5(model_path) == model['md5sum'], corrupt_msg
#     LOGGER.info('Download complete')

It's somewhere around line 180.
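
For reference, the check being removed is just an MD5 hash of the model file compared against a stored value. A minimal sketch of that helper (it mirrors easyocr's calculate_md5; the path is only an example):

import hashlib
import os

def calculate_md5(fname):
    # hash in chunks so large .pth files don't have to fit in memory
    hash_md5 = hashlib.md5()
    with open(fname, 'rb') as f:
        for chunk in iter(lambda: f.read(4096), b''):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()

# a fine-tuned arabic.pth hashes differently from the released one,
# which is exactly why the check fires (path is an example)
print(calculate_md5(os.path.expanduser('~/.EasyOCR/model/arabic.pth')))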

MohieEldinMuhammad commented 1 year ago

It worked for me, thanks a lot!

But as you mentioned, the results are far worse than the results I got while training.

MohieEldinMuhammad commented 1 year ago

Maybe the lines we commented out are the reason?

amroghoneim commented 1 year ago

These lines just check whether the model file is identical (same weights, architecture, everything) to the one in their storage. Since we retrained the model, the weights changed, so we simply remove the lines that perform this check; I don't believe it's related to the model's accuracy. I suspect either a problem with the validation set (data leakage or something else), or an issue with the dictionary of characters and their ordering, or something else like the image dimensions.

Didn't confirm this yet.
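
One way to narrow it down is to compare the two checkpoints directly. A quick diagnostic sketch (file names are examples, and it assumes both .pth files are plain state_dicts, as EasyOCR's released models and the trainer's best_accuracy.pth are):

import torch

old = torch.load('arabic.pth', map_location='cpu')          # released model
new = torch.load('best_accuracy.pth', map_location='cpu')   # fine-tuned model

# keys present in one checkpoint but not the other
print(set(old) ^ set(new))

# shared keys whose tensor shapes disagree (e.g. a different character
# count changes the Prediction layer's shape)
for k in set(old) & set(new):
    if old[k].shape != new[k].shape:
        print(k, tuple(old[k].shape), '->', tuple(new[k].shape))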

MohieEldinMuhammad commented 1 year ago

@amroghoneim It's not leakage; I already overfitted the model on this one image and got 100% accuracy on it during training.

MohieEldinMuhammad commented 1 year ago

See here, this is the same issue as #336. The author's reply was to follow these instructions: https://github.com/JaidedAI/EasyOCR/blob/master/custom_model.md. The problem here is the custom model .py file, which describes the recognition network architecture; I don't know how to create this file!

MohieEldinMuhammad commented 1 year ago

If we had this file, we would be able to use the model like this: easyocr.Reader(['en'], recog_network='custom_example')

amroghoneim commented 1 year ago

Interesting. So, checking the zip file mentioned there: custom_example.pth would be best_accuracy.pth, custom_example.py would be model.py, and custom_example.yaml would be the config file used for training, keeping only the parameters the model needs.

You can find the model architecture (model.py) either in the easyocr repo itself or in the trainer directory we used. We then just put these files together in one directory and point the Reader at them.

Let's try this out and see what happens.
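
A minimal sketch of the expected layout, assuming the default ~/.EasyOCR storage locations and the 'custom_example' name from custom_model.md (the image path is an example):

import easyocr

# expected layout (default locations):
#   ~/.EasyOCR/model/custom_example.pth            <- renamed best_accuracy.pth
#   ~/.EasyOCR/user_network/custom_example.py      <- the network definition (model.py)
#   ~/.EasyOCR/user_network/custom_example.yaml    <- the inference config
reader = easyocr.Reader(['en'], recog_network='custom_example')
print(reader.readtext('test_image.png'))  # example image path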

amroghoneim commented 1 year ago

user_network_directory (string, default = None) - Path to user-defined recognition network. If not specified, models will be read from MODULE_PATH + '/user_network' (~/.EasyOCR/user_network).

recog_network (string, default = 'standard') - Instead of standard mode, you can choose your own recognition network --- Tutorial to be written about this.

from: https://jaided.ai/easyocr/documentation/
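
So, putting the two together, a sketch with explicit directories (the paths are examples):

import easyocr

reader = easyocr.Reader(
    ['en'],
    recog_network='custom_example',
    user_network_directory='/content/user_network',  # holds custom_example.py and custom_example.yaml
    model_storage_directory='/content/model',        # holds custom_example.pth
)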

MohieEldinMuhammad commented 1 year ago

I tried it; it ran without errors but gave the same bad predictions as replacing the .pth model directly. Here is the .yaml file I used:

network_params:
  hidden_size: 512
  input_channel: 1
  output_channel: 512

imgH: 64
imgW: 600

lang_list:
  - 'en'
character_list: "0123456789!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~ abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ٠١٢٣٤٥٦٧٨٩«»؟،؛ءآأؤإئااًبةتثجحخدذرزسشصضطظعغفقكلمنهوىيًٌٍَُِّْٰٓٔٱٹپچڈڑژکڭگںھۀہۂۃۆۇۈۋیېےۓە"
number: '1234567890١٢٣٤٥٦٧٨٩٠'
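
Note that character_list here must match the character string used for training exactly, including order, since the CTC label indices are positional. A quick check (a sketch; paste the two strings in place of the placeholders):

# placeholders: the `character` string from the training config and the
# `character_list` string from this .yaml
train_chars = "..."
infer_chars = "..."

print(train_chars == infer_chars)           # must be True
print(set(train_chars) ^ set(infer_chars))  # characters present in only one of them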

The .py file:

import torch
import torch.nn as nn
import torch.nn.init as init
import torchvision
from torchvision import models
from collections import namedtuple
from packaging import version

def init_weights(modules):
    for m in modules:
        if isinstance(m, nn.Conv2d):
            init.xavier_uniform_(m.weight.data)
            if m.bias is not None:
                m.bias.data.zero_()
        elif isinstance(m, nn.BatchNorm2d):
            m.weight.data.fill_(1)
            m.bias.data.zero_()
        elif isinstance(m, nn.Linear):
            m.weight.data.normal_(0, 0.01)
            m.bias.data.zero_()

class vgg16_bn(torch.nn.Module):
    def __init__(self, pretrained=True, freeze=True):
        super(vgg16_bn, self).__init__()
        if version.parse(torchvision.__version__) >= version.parse('0.13'):
            vgg_pretrained_features = models.vgg16_bn(
                weights=models.VGG16_BN_Weights.DEFAULT if pretrained else None
            ).features
        else: #torchvision.__version__ < 0.13
            models.vgg.model_urls['vgg16_bn'] = models.vgg.model_urls['vgg16_bn'].replace('https://', 'http://')
            vgg_pretrained_features = models.vgg16_bn(pretrained=pretrained).features

        self.slice1 = torch.nn.Sequential()
        self.slice2 = torch.nn.Sequential()
        self.slice3 = torch.nn.Sequential()
        self.slice4 = torch.nn.Sequential()
        self.slice5 = torch.nn.Sequential()
        for x in range(12):         # conv2_2
            self.slice1.add_module(str(x), vgg_pretrained_features[x])
        for x in range(12, 19):         # conv3_3
            self.slice2.add_module(str(x), vgg_pretrained_features[x])
        for x in range(19, 29):         # conv4_3
            self.slice3.add_module(str(x), vgg_pretrained_features[x])
        for x in range(29, 39):         # conv5_3
            self.slice4.add_module(str(x), vgg_pretrained_features[x])

        # fc6, fc7 without atrous conv
        self.slice5 = torch.nn.Sequential(
                nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
                nn.Conv2d(512, 1024, kernel_size=3, padding=6, dilation=6),
                nn.Conv2d(1024, 1024, kernel_size=1)
        )

        if not pretrained:
            init_weights(self.slice1.modules())
            init_weights(self.slice2.modules())
            init_weights(self.slice3.modules())
            init_weights(self.slice4.modules())

        init_weights(self.slice5.modules())        # no pretrained model for fc6 and fc7

        if freeze:
            for param in self.slice1.parameters():      # only first conv
                param.requires_grad= False

    def forward(self, X):
        h = self.slice1(X)
        h_relu2_2 = h
        h = self.slice2(h)
        h_relu3_2 = h
        h = self.slice3(h)
        h_relu4_3 = h
        h = self.slice4(h)
        h_relu5_3 = h
        h = self.slice5(h)
        h_fc7 = h
        vgg_outputs = namedtuple("VggOutputs", ['fc7', 'relu5_3', 'relu4_3', 'relu3_2', 'relu2_2'])
        out = vgg_outputs(h_fc7, h_relu5_3, h_relu4_3, h_relu3_2, h_relu2_2)
        return out

class BidirectionalLSTM(nn.Module):

    def __init__(self, input_size, hidden_size, output_size):
        super(BidirectionalLSTM, self).__init__()
        self.rnn = nn.LSTM(input_size, hidden_size, bidirectional=True, batch_first=True)
        self.linear = nn.Linear(hidden_size * 2, output_size)

    def forward(self, input):
        """
        input : visual feature [batch_size x T x input_size]
        output : contextual feature [batch_size x T x output_size]
        """
        try: # multi gpu needs this
            self.rnn.flatten_parameters()
        except: # quantization doesn't work with this 
            pass
        recurrent, _ = self.rnn(input)  # batch_size x T x input_size -> batch_size x T x (2*hidden_size)
        output = self.linear(recurrent)  # batch_size x T x output_size
        return output

class VGG_FeatureExtractor(nn.Module):

    def __init__(self, input_channel, output_channel=512):
        super(VGG_FeatureExtractor, self).__init__()
        self.output_channel = [int(output_channel / 8), int(output_channel / 4),
                               int(output_channel / 2), output_channel]
        self.ConvNet = nn.Sequential(
            nn.Conv2d(input_channel, self.output_channel[0], 3, 1, 1), nn.ReLU(True),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(self.output_channel[0], self.output_channel[1], 3, 1, 1), nn.ReLU(True),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(self.output_channel[1], self.output_channel[2], 3, 1, 1), nn.ReLU(True),
            nn.Conv2d(self.output_channel[2], self.output_channel[2], 3, 1, 1), nn.ReLU(True),
            nn.MaxPool2d((2, 1), (2, 1)),
            nn.Conv2d(self.output_channel[2], self.output_channel[3], 3, 1, 1, bias=False),
            nn.BatchNorm2d(self.output_channel[3]), nn.ReLU(True),
            nn.Conv2d(self.output_channel[3], self.output_channel[3], 3, 1, 1, bias=False),
            nn.BatchNorm2d(self.output_channel[3]), nn.ReLU(True),
            nn.MaxPool2d((2, 1), (2, 1)),
            nn.Conv2d(self.output_channel[3], self.output_channel[3], 2, 1, 0), nn.ReLU(True))

    def forward(self, input):
        return self.ConvNet(input)

class ResNet_FeatureExtractor(nn.Module):
    """ FeatureExtractor of FAN (http://openaccess.thecvf.com/content_ICCV_2017/papers/Cheng_Focusing_Attention_Towards_ICCV_2017_paper.pdf) """

    def __init__(self, input_channel, output_channel=512):
        super(ResNet_FeatureExtractor, self).__init__()
        self.ConvNet = ResNet(input_channel, output_channel, BasicBlock, [1, 2, 5, 3])

    def forward(self, input):
        return self.ConvNet(input)

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = self._conv3x3(inplanes, planes)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = self._conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def _conv3x3(self, in_planes, out_planes, stride=1):
        "3x3 convolution with padding"
        return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                         padding=1, bias=False)

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            residual = self.downsample(x)
        out += residual
        out = self.relu(out)

        return out

class ResNet(nn.Module):

    def __init__(self, input_channel, output_channel, block, layers):
        super(ResNet, self).__init__()

        self.output_channel_block = [int(output_channel / 4), int(output_channel / 2), output_channel, output_channel]

        self.inplanes = int(output_channel / 8)
        self.conv0_1 = nn.Conv2d(input_channel, int(output_channel / 16),
                                 kernel_size=3, stride=1, padding=1, bias=False)
        self.bn0_1 = nn.BatchNorm2d(int(output_channel / 16))
        self.conv0_2 = nn.Conv2d(int(output_channel / 16), self.inplanes,
                                 kernel_size=3, stride=1, padding=1, bias=False)
        self.bn0_2 = nn.BatchNorm2d(self.inplanes)
        self.relu = nn.ReLU(inplace=True)

        self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.layer1 = self._make_layer(block, self.output_channel_block[0], layers[0])
        self.conv1 = nn.Conv2d(self.output_channel_block[0], self.output_channel_block[
                               0], kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(self.output_channel_block[0])

        self.maxpool2 = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.layer2 = self._make_layer(block, self.output_channel_block[1], layers[1], stride=1)
        self.conv2 = nn.Conv2d(self.output_channel_block[1], self.output_channel_block[
                               1], kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(self.output_channel_block[1])

        self.maxpool3 = nn.MaxPool2d(kernel_size=2, stride=(2, 1), padding=(0, 1))
        self.layer3 = self._make_layer(block, self.output_channel_block[2], layers[2], stride=1)
        self.conv3 = nn.Conv2d(self.output_channel_block[2], self.output_channel_block[
                               2], kernel_size=3, stride=1, padding=1, bias=False)
        self.bn3 = nn.BatchNorm2d(self.output_channel_block[2])

        self.layer4 = self._make_layer(block, self.output_channel_block[3], layers[3], stride=1)
        self.conv4_1 = nn.Conv2d(self.output_channel_block[3], self.output_channel_block[
                                 3], kernel_size=2, stride=(2, 1), padding=(0, 1), bias=False)
        self.bn4_1 = nn.BatchNorm2d(self.output_channel_block[3])
        self.conv4_2 = nn.Conv2d(self.output_channel_block[3], self.output_channel_block[
                                 3], kernel_size=2, stride=1, padding=0, bias=False)
        self.bn4_2 = nn.BatchNorm2d(self.output_channel_block[3])

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv0_1(x)
        x = self.bn0_1(x)
        x = self.relu(x)
        x = self.conv0_2(x)
        x = self.bn0_2(x)
        x = self.relu(x)

        x = self.maxpool1(x)
        x = self.layer1(x)
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)

        x = self.maxpool2(x)
        x = self.layer2(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)

        x = self.maxpool3(x)
        x = self.layer3(x)
        x = self.conv3(x)
        x = self.bn3(x)
        x = self.relu(x)

        x = self.layer4(x)
        x = self.conv4_1(x)
        x = self.bn4_1(x)
        x = self.relu(x)
        x = self.conv4_2(x)
        x = self.bn4_2(x)
        x = self.relu(x)

        return x

class Model(nn.Module):

    def __init__(self, input_channel, output_channel, hidden_size, num_class):
        super(Model, self).__init__()
        """ FeatureExtraction """
        self.FeatureExtraction = ResNet_FeatureExtractor(input_channel, output_channel)
        self.FeatureExtraction_output = output_channel  # int(imgH/16-1) * 512
        self.AdaptiveAvgPool = nn.AdaptiveAvgPool2d((None, 1))  # Transform final (imgH/16-1) -> 1

        """ Sequence modeling"""
        self.SequenceModeling = nn.Sequential(
            BidirectionalLSTM(self.FeatureExtraction_output, hidden_size, hidden_size),
            BidirectionalLSTM(hidden_size, hidden_size, hidden_size))
        self.SequenceModeling_output = hidden_size

        """ Prediction """
        self.Prediction = nn.Linear(self.SequenceModeling_output, num_class)

    def forward(self, input, text):
        """ Feature extraction stage """
        visual_feature = self.FeatureExtraction(input)
        visual_feature = self.AdaptiveAvgPool(visual_feature.permute(0, 3, 1, 2))  # [b, c, h, w] -> [b, w, c, h]
        visual_feature = visual_feature.squeeze(3)

        """ Sequence modeling stage """
        contextual_feature = self.SequenceModeling(visual_feature)

        """ Prediction stage """
        prediction = self.Prediction(contextual_feature.contiguous())

        return prediction

amroghoneim commented 1 year ago

Still facing the same issue as well.

amroghoneim commented 1 year ago

@rkcosmos can you please check this out and give us any advice?

MohieEldinMuhammad commented 1 year ago

@rkcosmos

ftmasadi commented 1 year ago

@rkcosmos

khawar-islam commented 1 year ago

@amroghoneim @MohieEldinMuhammad I am fine-tuning the Korean model downloaded from the EasyOCR website, providing the path to the pre-trained model in the .yaml file as saved_model: '/media/cvpr/CM_24/EasyOCR/trainer/korean.pth'. Unfortunately, I am getting a dictionary error. I never changed the lang_char: parameter. Any suggestion on how to fix it?

Traceback:

    [...]point, the shape in current model is torch.Size([256, 512]).
    size mismatch for module.SequenceModeling.1.linear.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for module.Prediction.weight: copying a param with shape torch.Size([1568, 512]) from checkpoint, the shape in current model is torch.Size([1053, 256]).
    size mismatch for module.Prediction.bias: copying a param with shape torch.Size([1568]) from checkpoint, the shape in current model is torch.Size([1053]).
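
For context on what these shapes mean (a sketch; `character` below is a placeholder for your training config's string): the CTC head is nn.Linear(hidden_size, num_class), so Prediction.weight has shape [num_class, hidden_size], and in the trainer (which follows deep-text-recognition-benchmark's CTCLabelConverter) num_class is len(character) + 1, the extra slot being the CTC blank. The checkpoint above was saved with num_class 1568 and hidden_size 512, while the model being built expects 1053 and 256, so both the character set and hidden_size disagree between checkpoint and config.

# sanity check: the class count implied by your config's `character` string
character = "..."                 # placeholder: the `character` string from the training config
num_class = len(character) + 1    # +1 for the CTC blank token
print(num_class)                  # must equal Prediction.weight.shape[0] in the checkpoint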

masoudMZB commented 1 year ago

@khawar-islam I think you should change the model parameters in the config file, maybe these params:

network_params:
  hidden_size:
  input_channel:
  output_channel:
imgH:
imgW:

khawar-islam commented 1 year ago

@masoudMZB The size mismatch error occurs when there is some problem in the language dictionary. I changed hidden_size: 512 and output_channel: 512, but the problem is still the same.

masoudMZB commented 1 year ago

So try checking EasyOCR's default language character dictionary: import easyocr and print the characters for the pre-trained model, as sketched below. I solved this problem that way.
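
A minimal sketch of that check (it assumes the Reader exposes its character set as reader.character, as recent EasyOCR versions do):

import easyocr

reader = easyocr.Reader(['ko'])  # loads the pre-trained Korean model
print(len(reader.character))     # the trainer's num_class would be this length + 1 (CTC blank)
print(reader.character)          # compare against `character` in your training config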

khawar-islam commented 1 year ago

Dear @masoudMZB @amroghoneim @MohieEldinMuhammad, what is the impact of new_prediction: False versus new_prediction: True when training with a pre-trained model?

cacophinix commented 2 months ago

Any luck getting better results with a fine-tuned model?

khawar-islam commented 2 months ago

@cacophinix In my case I fine-tuned on the Korean language; the results are not good and the performance is worse.

cacophinix commented 2 months ago

@khawar-islam After fine-tuning the model I get good results when I run the Demo.py code, but when I create the .yaml and .py files and place them in the user_network and model folders, I get very bad results. What am I missing?

Meriem-DAHMANI commented 2 weeks ago

@khawar-islam did you fix the size mismatch issue?