KaniyamFoundation / Pdf2Text

Project to convert PDF files to Text files using google OCR

Ran on windows 10 #3

Open mokhosh opened 4 years ago

mokhosh commented 4 years ago

I cloned the project and installed all dependencies.

I had to make a couple of changes to get it to run successfully on Windows:

Line 240

- input_filename = input_file.split(input_folder+"/")[1] 
+ input_filename = input_file.split(input_folder+"\\")[1]

Line 278

- os.system(command.encode("utf-8"))
+ os.system(command)
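
For anyone who wants a single version that runs on both Windows and Linux, the two changes above could probably be written without hard-coding the path separator. This is only a sketch, not what the script does today: `get_input_filename` and `run_command` are names made up for illustration, and it assumes `input_file` is a path to a file directly inside `input_folder`.

```python
import os
import subprocess


def get_input_filename(input_file):
    """Return the file name part of a path (hypothetical helper).

    os.path.basename splits on the separators of the OS the script runs on,
    so no "/" or "\\" has to be written by hand.
    """
    return os.path.basename(input_file)


def run_command(args):
    """Run an external tool and return its exit code (hypothetical helper).

    args is a list such as [tool_path, "option", "file.pdf"]. Passing a list
    avoids both shell quoting problems and the need to encode the command.
    """
    return subprocess.run(args, check=False).returncode
```

With something like this, line 240 would become `input_filename = get_input_filename(input_file)`, and the command on line 278 would have to be built as a list of arguments instead of one string.
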

tshrinivasan commented 4 years ago

great.

can you share the procedure to install this in windows?

mokhosh commented 4 years ago

I had python3 and pip already installed. So that's step 1.

2- Download the latest Windows binaries, unzip poppler and mutool to a folder of your choice, and install Ghostscript:

poppler-windows - https://blog.alivate.com.au/poppler-windows/
gs - https://www.ghostscript.com/download/gsdnld.html
mutool - https://mupdf.com/downloads/index.html

3- Clone Pdf2Text and change your config.ini:

mutool = D:\app\tools\mupdf-1.16.0-windows\mutool.exe
pdfseparate = D:\app\tools\poppler-0.68.0\bin\pdfseparate.exe
pdfunite = D:\app\tools\poppler-0.68.0\bin\pdfunite.exe
gs = D:\app\tools\gs\gs9.50\bin\gswin64c.exe

Of course, use the location where you unzipped or installed the tools instead of D:\app\tools.
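
A quick way to catch a wrong path before running the conversion is to check that every configured tool actually exists. The snippet below is a rough sketch under a few assumptions: `find_missing_tools` is a made-up helper, not part of Pdf2Text, and it assumes config.ini can be parsed by Python's `configparser` (that is, the keys sit under some section header). It looks in every section, so it does not need to know the section name.

```python
import configparser
import os

# Keys taken from the config.ini lines shown above.
TOOL_KEYS = ("mutool", "pdfseparate", "pdfunite", "gs")


def find_missing_tools(config_text):
    """Return (key, path) pairs whose configured path is not an existing file.

    Hypothetical helper: scans every section of the given INI text for the
    tool keys and checks each configured path on disk.
    """
    # interpolation=None keeps "%" characters in paths from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    missing = []
    for section in parser.sections():
        for key in TOOL_KEYS:
            if key in parser[section]:
                path = parser[section][key]
                if not os.path.isfile(path):
                    missing.append((key, path))
    return missing
```

It could be called as `find_missing_tools(open("config.ini", encoding="utf-8").read())`; an empty list means every configured tool was found.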

Now you can follow the rest of the installation; just skip this line:

sudo apt-get install poppler-utils mupdf-tools git python3-pip ghostscript

You don't need sudo on Windows, and you might need to use pip instead of pip3, depending on your Python installation.