Wolf0605 / Translate_letters


Shall we try extracting text with Tesseract OCR? #1

Open Wolf0605 opened 2 years ago

Wolf0605 commented 2 years ago

How to install Tesseract OCR

  1. Go to https://github.com/tesseract-ocr/tesseract
  2. Find the 'Installing' section
  3. Find the 'Windows' entry
  4. Then install the version you want
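
After installing, it can help to confirm where the tesseract executable ended up before pointing Python at it. A minimal sketch: the `find_tesseract` helper and the candidate list are made up for illustration (the second path is only a guess at a common Windows install location), they are not part of Tesseract or pytesseract.

```python
import os
import shutil

# Candidate install locations. The first is the path used later in this thread;
# the second is an assumed common location for Windows installers.
CANDIDATES = [
    r"C:\Tesseract\tesseract.exe",
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
]


def find_tesseract(candidates=CANDIDATES):
    """Return a path to the tesseract executable, or None if it is not found."""
    # Prefer whatever is already on PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the first candidate file that exists.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


print(find_tesseract() or "tesseract not found; check the install path")
```

If this prints a path, that is the value to assign to `pytesseract.pytesseract.tesseract_cmd`.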
Wolf0605 commented 2 years ago

Using Tesseract OCR in Python

import pytesseract
from PIL import Image

# Tell pytesseract where the Tesseract executable is installed
pytesseract.pytesseract.tesseract_cmd = r'C:\Tesseract\tesseract.exe'

These are the libraries you need.

img = Image.open('Resources/eng_anime.PNG')
text = pytesseract.image_to_string(img)
print(text)

And this is the code.

Let's see how it works

eng_anime1 (input image). The text I want:

THE
SIMPSONS
DID IT

But the output was terrible. [image] So I tried another one.

eng_anime4

THE VAST WAISTBAND
SALE

[image] It was pretty good, but there was some noise.
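
To put a number on "terrible" versus "pretty good", the OCR text could be compared with the text I wanted. A rough sketch using `difflib` from the standard library: `ocr_similarity` is a made-up helper name, and the noisy string below is invented for illustration (the real output is only in the screenshot).

```python
from difflib import SequenceMatcher


def ocr_similarity(expected: str, actual: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and whitespace layout."""
    def normalize(s: str) -> str:
        # Collapse newlines and runs of spaces, then compare case-insensitively.
        return " ".join(s.split()).upper()

    return SequenceMatcher(None, normalize(expected), normalize(actual)).ratio()


expected = "THE VAST WAISTBAND SALE"
# Hypothetical noisy OCR result, made up for illustration only.
noisy = "THE VAST WAISTBAND\nSALE |~"

print(ocr_similarity(expected, expected))  # identical text scores 1.0
print(ocr_similarity(expected, noisy))     # noise pulls the score below 1.0
```

This only measures character-level overlap, so it is a quick sanity check rather than a proper OCR accuracy metric.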

I also tried a difficult one. [image] Output: [image]

Problem

It can read text set in straight lines, but it fails on other layouts.
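
One thing that might be worth trying before giving up is Tesseract's page segmentation mode (`--psm`), which can be passed through the `config` argument of `pytesseract.image_to_string`. A sketch under assumptions: `psm_config` and `ocr_with_each_mode` are made-up helper names, only a handful of modes are listed, and I have not checked that any of them fixes these particular images.

```python
# A few Tesseract page segmentation modes (values for the --psm flag).
PSM_MODES = {
    3: "fully automatic page segmentation (default)",
    6: "assume a single uniform block of text",
    7: "treat the image as a single text line",
    11: "sparse text: find as much text as possible, in no particular order",
}


def psm_config(mode: int) -> str:
    """Build the config string passed to pytesseract for one listed mode."""
    if mode not in PSM_MODES:
        raise ValueError(f"mode {mode} is not in this sketch's PSM_MODES list")
    return f"--psm {mode}"


def ocr_with_each_mode(image_path: str) -> dict:
    """Run OCR once per listed mode and return {mode: text}. Needs Tesseract."""
    # Imported here so the helpers above can be used without Tesseract installed.
    import pytesseract
    from PIL import Image

    img = Image.open(image_path)
    return {
        mode: pytesseract.image_to_string(img, config=psm_config(mode))
        for mode in PSM_MODES
    }
```

Calling `ocr_with_each_mode('Resources/eng_anime.PNG')` and comparing the results would show whether another mode reads the image better. Curved or rotated text may still fail in every mode.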

Wolf0605 commented 2 years ago

I also tried Korean (kor) and Japanese (jpn).
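
For reference, switching languages is done with the `lang` argument of `pytesseract.image_to_string`, and several codes can be combined with `+`. A minimal sketch, assuming the `kor` and `jpn` traineddata files are installed alongside Tesseract; `lang_arg` and `ocr_in_language` are illustrative helper names, not pytesseract functions.

```python
def lang_arg(*codes: str) -> str:
    """Join Tesseract language codes, e.g. ('kor', 'eng') -> 'kor+eng'."""
    if not codes:
        raise ValueError("at least one language code is required")
    return "+".join(codes)


def ocr_in_language(image_path: str, *codes: str) -> str:
    """OCR an image with the given languages.

    Needs Tesseract plus the matching .traineddata file for every code.
    """
    # Imported lazily so lang_arg can be used without Tesseract installed.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(image_path), lang=lang_arg(*codes))
```

For example, `ocr_in_language(path, 'kor', 'eng')` would read a menu that mixes Hangul and Latin text, where `path` is whatever image file is being tested.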

kor_hand1 Output: [image]

[image]

Output: [image]

[image]

Output:

세 미

식 젼 빵
차 우 스 쾌

오 늘 의 생 어 드

스 놀
오 늘 의 파 스 따

스 테 이크 코 스
29.000 원

식 전 빵
요 늘 의 생 어 드
오 능 의 수프
채 군 튼 십 10000).

JPN

jpn_menu1

Output:

【 そ ば ・ ら ど ん )
・⑤③0a

お に ぎ 0 く ⑦ ヶ ー

攣麟り薯ヲ ト #0⑨⑧ 椎 ⑰0⑧
⑤00a +⑦⑤0

Z膣天ゎが}輸“

午騎‥峨~午積 ⑦B③0 京 司 庵

は 楊 〒粒"時~午債凛時〉 _ 誉 567 q374'


Process finished with exit code 0

jpn_menu2

Output:

IE 隆 山 【eE

る x く + し セ で
< レト 談 談 E ① に し
aukuo R へ ③
ン ート キ く り a
n く m、 ) 啓
で ト ヘ ②⑨
ト み ク 賽僻靡
“ 倉
〉 MuAn せ ④
ぁ + キ ト ふ A
s セ り も ト さ L
② な Ke 子 叶 ④Sd
か p へ n を 可 滝 % き る
訣 団 と nn い e

人 囲 P へ B
aa 0 一 ベ

ー ヘ

jpn_menu3

Output:

| 珪 ⑧ エ リ ム プ ロ リ ク
国 を \ 晃 : フ ラ ン ス を ま Ne カ ッ
& る th う カ コ テ ン ル g
①emdor 電 思 で も

[image]

Output:

轟毛和牛ネギ塩ヵルど } 鼻毛和牛上ネギ壇カルピ
_ ⑧⑦0n 相 ①.①⑥0
Wolf0605 commented 2 years ago

Fail

Because it doesn't work unless the text is laid out like a printed sheet.

So I'm going to try a different OCR (a PyTorch-based one) or use an API.