charlesw / tesseract

A .Net wrapper for tesseract-ocr
Apache License 2.0
2.27k stars, 741 forks

Convert PDF to TIFF #513

Open vijtad opened 4 years ago

vijtad commented 4 years ago

Do you have an example that converts a multi-page PDF into a multi-page TIFF?

charlesw commented 4 years ago

Your best bet is probably to use a PDF library or tool that is specifically designed for this.

On Sun, 3 May 2020, 01:06 Vijay Prakash Tadinada wrote:

> Do you have any example which converts multiple PDF pages into multi-page TIFF?

MohanVijayakumar commented 4 years ago

You can use a PDFium C# wrapper to render the pages of the PDF and convert them directly to TIFF. I currently use the library below to convert PDF pages to PNG and then run OCR on them: https://github.com/GowenGit/docnet
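Once the pages are rendered to images, packing them into one multi-page TIFF is a separate, library-agnostic step. The sketch below shows only that step, in Java rather than C#, using the JDK's built-in TIFF `ImageWriter` (available since Java 9) and its sequence-writing API (`prepareWriteSequence` / `writeToSequence` / `endWriteSequence`). It assumes the pages have already been rendered to `BufferedImage`s by some PDF library (for example Apache PDFBox's `PDFRenderer` on the JVM, or Docnet in .NET); the class `MultiPageTiff` and its `write` method are hypothetical names for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Iterator;
import java.util.List;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

// Hypothetical helper: writes already-rendered page images as the frames
// of a single multi-page TIFF. Requires Java 9+ (built-in TIFF plugin).
class MultiPageTiff {
    static void write(List<BufferedImage> pages, OutputStream out) throws IOException {
        if (pages.isEmpty()) {
            throw new IllegalArgumentException("need at least one page");
        }
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("tiff");
        if (!writers.hasNext()) {
            throw new IOException("no TIFF ImageWriter found (Java 9+ required)");
        }
        ImageWriter writer = writers.next();
        // The TIFF writer needs a seekable stream; ImageIO wraps `out` in a cache.
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(out)) {
            writer.setOutput(ios);
            ImageWriteParam param = writer.getDefaultWriteParam();
            // Lossless Deflate compression keeps text edges intact for later OCR.
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionType("Deflate");
            writer.prepareWriteSequence(null);
            for (BufferedImage page : pages) {
                // Each call appends one page (one TIFF image directory).
                writer.writeToSequence(new IIOImage(page, null, null), param);
            }
            writer.endWriteSequence();
        } finally {
            writer.dispose();
        }
    }
}
```

Usage would be to collect one `BufferedImage` per PDF page from your renderer, then call `MultiPageTiff.write(pages, new FileOutputStream("out.tif"))`. Deflate was chosen because it is lossless and works for both grayscale and RGB pages; a lossy option such as JPEG could blur the glyph edges that OCR depends on.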

shibaev commented 4 years ago

Take a look at the "OCR PDF in .NET" article. It describes how to OCR a PDF using Docotic.Pdf and Tesseract.

Plain PDF-to-TIFF conversion has nothing to do with Tesseract; Tesseract only performs optical character recognition.