LeoFCardoso / pdf2pdfocr

A free tool to OCR a PDF and add a text "layer" to the original file, making it a searchable PDF. Uses only open source tools. Please tip!
Apache License 2.0

PIL.Image.DecompressionBombError: Image size (235978454 pixels) exceeds limit of 178956970 pixels, could be decompression bomb DOS attack. #41

Closed: yatrik-cloud closed this issue 1 year ago

yatrik-cloud commented 1 year ago

While applying OCR to a PDF with the repo's Docker image "leofcardoso/pdf2pdfocr:latest", I got this error:

[2023-09-05 10:35:58.939733] [LOG] Welcome to pdf2pdfocr version 1.12.0 marapurense - https://github.com/LeoFCardoso/pdf2pdfocr
[2023-09-05 10:35:58.959460] [LOG] Input file /home/docker/Dummy_IS.pdf: type is application/pdf
[2023-09-05 10:35:59.047502] [LOG] Converting input file to images...
[2023-09-05 10:36:38.577186] [LOG] Checking blank pages
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/usr/local/bin/pdf2pdfocr.py", line 249, in do_check_img_colors_size
    im = Image.open(param_image_file)
  File "/usr/local/lib/python3.10/dist-packages/PIL/Image.py", line 3172, in open
    im = _open_core(fp, filename, prefix, formats)
  File "/usr/local/lib/python3.10/dist-packages/PIL/Image.py", line 3159, in _open_core
    _decompression_bomb_check(im.size)
  File "/usr/local/lib/python3.10/dist-packages/PIL/Image.py", line 3068, in _decompression_bomb_check
    raise DecompressionBombError(
PIL.Image.DecompressionBombError: Image size (235978454 pixels) exceeds limit of 178956970 pixels, could be decompression bomb DOS attack.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/pdf2pdfocr.py", line 1530, in <module>
    pdf2ocr.ocr()
  File "/usr/local/bin/pdf2pdfocr.py", line 712, in ocr
    self.check_blank_pages(image_file_list)
  File "/usr/local/bin/pdf2pdfocr.py", line 1010, in check_blank_pages
    blank_map_values = colors_size_pool_map.get()
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
PIL.Image.DecompressionBombError: Image size (235978454 pixels) exceeds limit of 178956970 pixels, could be decompression bomb DOS attack.

LeoFCardoso commented 1 year ago

Hi, thank you for the post. Can you please share your source file? This bug may be avoided by rendering the page images at a lower resolution. Please try the "-r 200" flag and let's see what happens.
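
A rough sketch of why a lower resolution can help: the pixel count of a rendered page grows with the square of the DPI, so a modest DPI reduction shrinks the image a lot. The limit in the error message (178956970) is twice Pillow's default MAX_IMAGE_PIXELS. The 300 DPI starting value below is only an assumption for illustration (the thread does not state the original resolution), and scaled_pixels is a hypothetical helper, not part of pdf2pdfocr or Pillow.

```python
# Pillow's default warning threshold (Image.MAX_IMAGE_PIXELS);
# DecompressionBombError is raised above twice this value.
PILLOW_WARN_LIMIT = int(1024 * 1024 * 1024 // 4 // 3)
PILLOW_ERROR_LIMIT = 2 * PILLOW_WARN_LIMIT  # 178956970, as in the error message


def scaled_pixels(pixels: int, old_dpi: float, new_dpi: float) -> int:
    """Estimate the pixel count of the same page rendered at a different DPI.

    Hypothetical helper: width and height each scale linearly with DPI,
    so the total pixel count scales with the square of the ratio.
    """
    return round(pixels * (new_dpi / old_dpi) ** 2)


reported_pixels = 235_978_454  # from the error message
assumed_dpi = 300              # ASSUMPTION: original render resolution is not stated

estimate_at_200 = scaled_pixels(reported_pixels, assumed_dpi, 200)

print("error limit:        ", PILLOW_ERROR_LIMIT)
print("reported image:     ", reported_pixels, reported_pixels > PILLOW_ERROR_LIMIT)
print("estimate at 200 DPI:", estimate_at_200, estimate_at_200 > PILLOW_ERROR_LIMIT)
```

Under that assumption the estimate at 200 DPI falls below the error limit, though it may still sit above the lower threshold at which Pillow only emits a DecompressionBombWarning.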

yatrik-cloud commented 1 year ago

Yes, great! "-r 200" works. Thank you so much for your quick response.