Fine-tuned EasyOCR model with thai_g1.pth

JaidedAI / EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and etc.

https://www.jaided.ai

Apache License 2.0


Fine-tuned EasyOCR model with thai_g1.pth #762

Open kwankoravich opened 2 years ago

kwankoravich commented 2 years ago

I'm working with the EasyOCR model and would like to fine-tune it. I'm looking at en_filtered_config.yaml, but I'm not sure how to change the 'lang_char' parameter if I want to fine-tune on a Thai dataset.

The default is lang_char: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'. I assumed it should be lang_char: 'กขคฆงจฉชซฌญฎฏฐฑฒณดตถทธนบปผฝพฟภมยรฤลวศษสหฬอฮฯะ ัา ำ ิ ี ึ ื ุ ู ฺเแโใไๆ ็ ่ ้ ๊ ๋ ์ ํ๑๒๓๔๕๖๗๘๙'.

However, I got an error when loading the model. Could you please suggest how to adjust en_filtered_config.yaml?

s39674 commented 2 years ago

Hi @kwankoravich! The error probably occurs because the model expects lang_char to have 52 characters, while you are passing a 93-character string (although it could be something else; I would need to see the error). I suggest going ahead and fine-tuning on the Thai dataset; you can always go back if something goes wrong.
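If the length mismatch is indeed the cause, a quick way to see it is to count the characters the config would produce before and after the change. The sketch below assumes the trainer builds its character set by concatenating the number, symbol and lang_char fields of the YAML file; the function name `charset_size` and the short Thai sample are made up for illustration and are not part of EasyOCR.

```python
import string


def charset_size(number: str, symbol: str, lang_char: str) -> int:
    """Count output characters, ASSUMING the trainer concatenates the three fields."""
    return len(number + symbol + lang_char)


# The 52-letter default lang_char from en_filtered_config.yaml.
english = string.ascii_uppercase + string.ascii_lowercase

# Short stand-in for illustration only, not the full Thai character set.
thai_sample = "กขค"

old = charset_size("0123456789", "", english)
new = charset_size("0123456789", "", english + thai_sample)

# A checkpoint trained with `old` characters would not match a config with `new`.
print(old, new, old == new)  # prints: 62 65 False
```

Note that `len` counts every code point, so spaces typed between Thai vowel and tone marks inside the quoted string count as characters too.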

SarmSKunatham commented 2 years ago

Hi @kwankoravich, from what I found in the source code, I use lang_char = abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZกขคฆงจฉชซฌญฎฏฐฑฒณดตถทธนบปผฝพฟภมยรลวศษสหฬอฮฤเแโใไะาุูิีืึั่้๊๋็์ำํฺฯๆ and it works for me for fine-tuning.

darwinharianto commented 1 year ago

@SarmSKunatham how did you find that lang_char value inside the source code?

Edit: lang_char can be found in easyocr/config.py; just find your language there and you can see its characters.
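For anyone who prefers to look the characters up programmatically, here is a minimal sketch. It assumes easyocr/config.py exposes a nested dictionary shaped like {generation: {model_name: {'characters': ...}}}; that shape, the helper name `find_characters`, and the toy data below are assumptions for illustration, so check the actual file before relying on them.

```python
from typing import Mapping, Optional


def find_characters(
    models: Mapping[str, Mapping[str, Mapping[str, str]]],
    model_name: str,
) -> Optional[str]:
    """Return the 'characters' string of the first entry named model_name, or None."""
    for generation in models.values():
        entry = generation.get(model_name)
        if entry is not None:
            return entry.get("characters")
    return None


# Hypothetical toy data that mimics the ASSUMED layout; the character
# strings are shortened placeholders, not the real EasyOCR values.
toy_models = {
    "gen1": {"thai_g1": {"characters": "กขค"}},
    "gen2": {"english_g2": {"characters": "abc"}},
}

print(find_characters(toy_models, "thai_g1"))
```

With the real library you would pass in whatever dictionary config.py actually defines instead of `toy_models`.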

khawar-islam commented 1 year ago

Dear @kwankoravich, did you find a solution to your problem? I am also fine-tuning, on Korean.

Dear @darwinharianto Did you successfully fine-tune the model?

darwinharianto commented 1 year ago

Yes, you just need to prepare the data and follow https://github.com/JaidedAI/EasyOCR/blob/master/trainer/trainer.ipynb

khawar-islam commented 1 year ago

@darwinharianto I am confused about the pre-trained weights. Where do I specify the link to the pre-trained weights, and where can I download the pre-trained weights for Korean and English recognition? The config file linked below does not point to any pre-trained weights.

https://github.com/JaidedAI/EasyOCR/blob/master/trainer/config_files/en_filtered_config.yaml

darwinharianto commented 1 year ago

you just need to use the default one

import easyocr
reader = easyocr.Reader(['ko'])  # EasyOCR's language code for Korean is 'ko', not 'kr'

This automatically downloads the pretrained model, which is saved under ~/.EasyOCR/model if you are using Ubuntu. Then change this setting in the YAML file:

saved_model: 'path to the downloaded pretrained model (the one inside the EasyOCR model folder)'
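To find the exact path to paste into that setting, something like the following sketch may help. It only assumes that the downloaded weights are .pth files under ~/.EasyOCR/model, as described above; the helper names `list_downloaded_models` and `saved_model_line` are made up for illustration.

```python
from pathlib import Path
from typing import List, Optional


def list_downloaded_models(model_dir: Optional[Path] = None) -> List[Path]:
    """Return the .pth files in model_dir (default: ~/.EasyOCR/model), sorted by path."""
    if model_dir is None:
        model_dir = Path.home() / ".EasyOCR" / "model"
    if not model_dir.is_dir():
        return []  # nothing has been downloaded yet
    return sorted(model_dir.glob("*.pth"))


def saved_model_line(model_path: Path) -> str:
    """Format the saved_model entry for the training YAML file."""
    return f"saved_model: '{model_path}'"


if __name__ == "__main__":
    # Print one candidate saved_model line per downloaded checkpoint.
    for path in list_downloaded_models():
        print(saved_model_line(path))
```

Copy the line for the checkpoint you want to start from (for example thai_g1.pth) into your config file.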

khawar-islam commented 1 year ago

@darwinharianto Yes, I know, but this Korean model is not robust for Korean handwriting recognition. Therefore, I want to fine-tune it on Korean handwritten samples (2M).

darwinharianto commented 1 year ago

I don't think I understand what you are trying to achieve. When I fine-tune a model, I load the previous model, then run training on it with my custom dataset. The resulting model is my fine-tuned model.

In your case, you don't want to use the previous model, but you still want to fine-tune it?

khawar-islam commented 1 year ago

@darwinharianto just let me know where we can download and pass the link of the previous model to fine-tune on new data.

darwinharianto commented 1 year ago

> just let me know where we can download and pass the link of the previous model to fine-tune on new data.

I don't know where to download it from, because I let EasyOCR download it for me. The downloaded model is inside ~/.EasyOCR/model

khawar-islam commented 1 year ago

Yes, I found the EasyOCR model, but where can I give the model link for fine-tuning?

darwinharianto commented 1 year ago

https://github.com/JaidedAI/EasyOCR/issues/762#issuecomment-1502819379

As you can see from my previous comment:

> Then change that yaml file settings
>
> saved_model: 'path to the downloaded pretrained model (the one inside the EasyOCR model folder)'