LJSthu / Python-Remove-Watermark

A simple program to remove the watermark from a PDF file.

Python-Remove-Watermark requires poppler and Unable to get page count #8

Open metapea opened 3 months ago

metapea commented 3 months ago

I tried ben0i0d's version and got two errors:

Traceback (most recent call last):
  File "C:\Users\X\AppData\Local\Programs\Python\Python38\lib\site-packages\pdf2image\pdf2image.py", line 581, in pdfinfo_from_path
    proc = Popen(command, env=env, stdout=PIPE, stderr=PIPE)
  File "C:\Users\X\AppData\Local\Programs\Python\Python38\lib\subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\X\AppData\Local\Programs\Python\Python38\lib\subprocess.py", line 1327, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "watermark.py", line 51, in <module>
    imgs = np.array(convert_from_path(args.source))
  File "C:\Users\X\AppData\Local\Programs\Python\Python38\lib\site-packages\pdf2image\pdf2image.py", line 127, in convert_from_path
    page_count = pdfinfo_from_path(
  File "C:\Users\X\AppData\Local\Programs\Python\Python38\lib\site-packages\pdf2image\pdf2image.py", line 607, in pdfinfo_from_path
    raise PDFInfoNotInstalledError(
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH? 

And:

PS D:\code\Python-Remove-Watermark-master> python watermark.py --source <pdf in the same folder as the .py script>.pdf --target out
Traceback (most recent call last):
  File "C:\Users\X\AppData\Local\Programs\Python\Python38\lib\site-packages\pdf2image\pdf2image.py", line 602, in pdfinfo_from_path
    raise ValueError
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "watermark.py", line 51, in <module>
    imgs = np.array(convert_from_path(args.source))
  File "C:\Users\X\AppData\Local\Programs\Python\Python38\lib\site-packages\pdf2image\pdf2image.py", line 127, in convert_from_path
    page_count = pdfinfo_from_path(
  File "C:\Users\X\AppData\Local\Programs\Python\Python38\lib\site-packages\pdf2image\pdf2image.py", line 611, in pdfinfo_from_path
    raise PDFPageCountError(
pdf2image.exceptions.PDFPageCountError: Unable to get page count.

The last error also occurs with this version of Python-Remove-Watermark.
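
For readers hitting the first traceback: it shows pdf2image failing to launch a Poppler executable (the `Popen` call raises `FileNotFoundError`), which is then re-raised as `PDFInfoNotInstalledError`. A minimal sketch of a check for whether `pdfinfo` is reachable is below. The helper name `find_poppler_tool` and the `extra_dirs` argument are illustrative assumptions, not part of pdf2image or this repository.

```python
import os
import shutil


def find_poppler_tool(tool="pdfinfo", extra_dirs=()):
    """Return the full path to a Poppler tool, or None if it cannot be found.

    Searches PATH first, then any extra directories (hypothetical
    locations, e.g. the "bin" folder of an unzipped Poppler build).
    """
    found = shutil.which(tool)
    if found:
        return found
    for directory in extra_dirs:
        # shutil.which accepts an explicit search path instead of PATH.
        found = shutil.which(tool, path=directory)
        if found:
            return found
    return None


if __name__ == "__main__":
    location = find_poppler_tool()
    if location is None:
        print("pdfinfo not found: add Poppler's bin folder to PATH")
    else:
        print("pdfinfo found at", os.path.dirname(location))
```

If the tool is found only in a directory outside PATH, pdf2image's `convert_from_path` accepts a `poppler_path` argument that can point at that directory instead of changing the environment.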

ben0i0d commented 1 month ago

Sorry, I've been too busy lately. I've already revised it; see #7. I've removed the Poppler dependency. @metapea

ben0i0d commented 1 month ago

This happens because Poppler requires its environment variables to be configured manually on Windows. I reproduced this error in a bookworm-slim container. To make the tool easier to use, I rewrote it with pymupdf; everything seems to work fine and it is efficient enough. If you still have problems, please @ me or CC me by email.
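
As an illustration of the approach described above, here is a rough sketch of rendering PDF pages to NumPy arrays with PyMuPDF instead of pdf2image. It is not the actual code from #7; the function names `samples_to_array` and `render_pages` and the default `dpi=200` are assumptions made for this example, and the `dpi` keyword of `get_pixmap` needs a reasonably recent PyMuPDF release.

```python
import numpy as np


def samples_to_array(samples, height, width, channels):
    """View raw interleaved pixel bytes as a (height, width, channels) uint8 array."""
    return np.frombuffer(samples, dtype=np.uint8).reshape(height, width, channels)


def render_pages(pdf_path, dpi=200):
    """Render every page of a PDF to an RGB array using PyMuPDF (no Poppler needed)."""
    import fitz  # PyMuPDF; imported here so the helper above works without it

    pages = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            # alpha=False gives plain RGB, i.e. three channels per pixel.
            pix = page.get_pixmap(dpi=dpi, alpha=False)
            arr = samples_to_array(pix.samples, pix.height, pix.width, pix.n)
            # Copy so the array is writable for later pixel edits.
            pages.append(arr.copy())
    return pages
```

A list is returned rather than a single stacked array because pages in one PDF can have different sizes. Calling `render_pages("input.pdf")` (a placeholder file name) would give one array per page, which can stand in for the `convert_from_path` result seen in the tracebacks.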