cisocrgroup / ocrd_cis

OCR-D python tools
MIT License

Is there an easier way to use this (OCR post-correction tool) in Python, just as we can easily use Tesseract OCR in Python? #53

Open NavpreetDevpuri opened 4 years ago

NavpreetDevpuri commented 4 years ago

I want a simple way to use this awesome library in Python, just as we can use Tesseract OCR in Python (see here how easy it is to use).

If it can be used from Python, then we can also use it on Windows. I am using Windows 10 (64-bit).

bertsky commented 4 years ago

@NavpreetDevpuri What do you mean by simple way?

This repo contains an OCR post-correction tool along with a much improved version of Ocropy 1 and ocrolib, but only for OCR-D – as the description/documentation says.
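To illustrate what "only for OCR-D" means in practice: each processor is a standalone command-line program following the OCR-D CLI conventions. It reads one fileGrp of a METS workspace (`-m`, `-I`), writes its results into another (`-O`), and takes its parameters via `-p`. The following is a minimal sketch that only builds such a command line, assuming that convention and assuming `-p` accepts an inline JSON string (older OCR-D core versions may expect a path to a JSON file instead). The helper `processor_argv` and the fileGrp names are hypothetical.

```python
import json
from typing import List, Optional


def processor_argv(processor: str, mets: str, input_grp: str,
                   output_grp: str, params: Optional[dict] = None) -> List[str]:
    """Build the command line for one OCR-D processor call (hypothetical helper)."""
    # -m: METS file of the workspace, -I/-O: input and output fileGrps
    argv = [processor, "-m", mets, "-I", input_grp, "-O", output_grp]
    if params:
        # Assumption: -p takes the parameters as an inline JSON string
        argv += ["-p", json.dumps(params)]
    return argv


argv = processor_argv("ocrd-cis-ocropy-denoise", "workspace/mets.xml",
                      "OCR-D-BIN", "OCR-D-DENOISE",
                      {"level-of-operation": "page"})
print(argv)
```

Passing that list to `subprocess.run(argv, check=True)` would run the step from Python, provided the processor is installed and on the PATH.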

If you want non-OCR-D CLIs, you'll have to use the ocropus-* tools from old Ocropy 1 (which is Python 2 only).

For the Tesseract API in Python, I recommend tesserocr instead of pytesseract.
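The practical difference is that pytesseract runs the `tesseract` executable as a subprocess and exchanges images and text through temporary files, whereas tesserocr wraps Tesseract's C++ API directly (via Cython). A minimal sketch of the tesserocr counterpart to `pytesseract.image_to_string`, using `PyTessBaseAPI`; the helper name `ocr_image` and the `eng` default are illustrative assumptions. tesserocr also offers one-line helpers such as `tesserocr.file_to_text`.

```python
def ocr_image(image_path: str, lang: str = "eng") -> str:
    """Return the text Tesseract recognizes in an image file, via tesserocr."""
    # Imported lazily so this module still loads on machines where
    # tesserocr (and the Tesseract library it links against) is missing.
    from tesserocr import PyTessBaseAPI

    # Leaving the context manager ends the API instance and frees its memory.
    with PyTessBaseAPI(lang=lang) as api:
        api.SetImageFile(image_path)
        return api.GetUTF8Text()
```

Usage would then be `print(ocr_image("someimage.jpg"))`, with the `eng` traineddata installed.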

I don't see how your OS choice is relevant here.

Can we close this?

NavpreetDevpuri commented 4 years ago

Thanks for your reply. I want to know whether there is any way to use this OCR post-correction tool in Python, just as we can easily use Tesseract OCR (an OCR tool) in Python. It seems like I need to set up Docker, as mentioned in the user_guide, but I want to use it in Python without Docker, just like Tesseract.

I want to use the methods mentioned in the workflows in an easier way, something like:

import cv2
import ocrd
import pytesseract

# Ordered list of (processor, parameters) steps: a dict would silently
# drop the first "ocrd-olena-binarize" and "ocrd-segment-repair" entries
workflow = [
    ("ocrd-olena-binarize", {"impl": "sauvola"}),
    ("ocrd-anybaseocr-crop", None),
    ("ocrd-olena-binarize", {"impl": "kim"}),
    ("ocrd-cis-ocropy-denoise", {"level-of-operation": "page"}),
    ("ocrd-tesserocr-deskew", {"operation_level": "page"}),
    ("ocrd-tesserocr-segment-region", None),
    ("ocrd-segment-repair", {"plausibilize": True}),
    ("ocrd-cis-ocropy-deskew", {"level-of-operation": "region"}),
    ("ocrd-cis-ocropy-clip", {"level-of-operation": "region"}),
    ("ocrd-tesserocr-segment-line", None),
    ("ocrd-segment-repair", {"sanitize": True}),
    ("ocrd-cis-ocropy-dewarp", None),
    ("ocrd-calamari-recognize", {"checkpoint": "/path/to/models/*.ckpt.json"}),
]

img = cv2.imread("someimage.jpg")

# Wished-for, hypothetical API: run the whole workflow on an in-memory image
processed_img = ocrd.process(img, workflow)

# Now I can use pytesseract to get the text from processed_img
text = pytesseract.image_to_string(processed_img)
print(text)

This tool is awesome, but it should be easy to use.
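For comparison, the closest existing counterpart to the wished-for `ocrd.process(img, config)` is probably OCR-D core's `ocrd process` command, which runs a sequence of processors on a workspace (a METS file) rather than on an in-memory image; each task string names the executable without its `ocrd-` prefix. A minimal sketch that turns the workflow above into one such call, keeping the steps in an ordered list so the repeated binarize and repair steps survive. Assumptions: a native (non-Docker) installation with all processors on the PATH; a workspace whose METS already lists the page image in fileGrp `OCR-D-IMG` (e.g. created with `ocrd workspace init` and `ocrd workspace add`); inline JSON for `-p`; and the hypothetical `OCR-D-STEP-NN` fileGrp names and `ocrd_process_argv` helper.

```python
import json
import shlex
from typing import List, Optional, Tuple

Step = Tuple[str, Optional[dict]]

# The workflow from the question as ordered (processor, parameters) pairs.
WORKFLOW: List[Step] = [
    ("ocrd-olena-binarize", {"impl": "sauvola"}),
    ("ocrd-anybaseocr-crop", None),
    ("ocrd-olena-binarize", {"impl": "kim"}),
    ("ocrd-cis-ocropy-denoise", {"level-of-operation": "page"}),
    ("ocrd-tesserocr-deskew", {"operation_level": "page"}),
    ("ocrd-tesserocr-segment-region", None),
    ("ocrd-segment-repair", {"plausibilize": True}),
    ("ocrd-cis-ocropy-deskew", {"level-of-operation": "region"}),
    ("ocrd-cis-ocropy-clip", {"level-of-operation": "region"}),
    ("ocrd-tesserocr-segment-line", None),
    ("ocrd-segment-repair", {"sanitize": True}),
    ("ocrd-cis-ocropy-dewarp", None),
    ("ocrd-calamari-recognize", {"checkpoint": "/path/to/models/*.ckpt.json"}),
]


def ocrd_process_argv(mets: str, workflow: List[Step],
                      first_grp: str = "OCR-D-IMG") -> List[str]:
    """Chain all steps into one `ocrd process` call (hypothetical helper)."""
    tasks = []
    input_grp = first_grp
    for step, (processor, params) in enumerate(workflow, start=1):
        output_grp = "OCR-D-STEP-%02d" % step  # hypothetical fileGrp naming
        # Task strings name the executable without the "ocrd-" prefix
        if processor.startswith("ocrd-"):
            processor = processor[len("ocrd-"):]
        task = [processor, "-I", input_grp, "-O", output_grp]
        if params:
            task += ["-p", json.dumps(params)]  # assumption: inline JSON
        # Each task is passed as one shell-quoted string
        tasks.append(shlex.join(task))
        # The next step reads what this step wrote
        input_grp = output_grp
    return ["ocrd", "process", "-m", mets] + tasks


for arg in ocrd_process_argv("workspace/mets.xml", WORKFLOW):
    print(arg)
```

Running `subprocess.run(ocrd_process_argv("workspace/mets.xml", WORKFLOW), check=True)` would then execute the chain. The results end up as PAGE-XML files (plus derived images) in the output fileGrps, not as a returned image; since the last step is already a text recognizer (Calamari), its PAGE-XML output should contain the recognized text, so a separate pytesseract pass would not be needed.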