Problem on Ubuntu 22.04: `'NoneType' object has no attribute 'producer'`

tinloaf commented 1 year ago

I'm on pdfCropMargins version 1.1.12, with these dependency versions:

> pip install -U pdfCropMargins
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: pdfCropMargins in /home/lba/.local/lib/python3.10/site-packages (1.1.12)
Requirement already satisfied: pillow>=9.3.0 in /home/lba/.local/lib/python3.10/site-packages (from pdfCropMargins) (9.4.0)
Requirement already satisfied: wheel in /usr/lib/python3/dist-packages (from pdfCropMargins) (0.37.1)
Requirement already satisfied: PyPDF2<3.0.0,>=2.11.0 in /home/lba/.local/lib/python3.10/site-packages (from pdfCropMargins) (2.12.1)
Requirement already satisfied: PySimpleGUI>=4.40.0 in /home/lba/.local/lib/python3.10/site-packages (from pdfCropMargins) (4.60.4)
Requirement already satisfied: PyMuPDF>=1.20.0 in /home/lba/.local/lib/python3.10/site-packages (from pdfCropMargins) (1.21.1)

When I try to run it, I see this error:

> pdf-crop-margins /tmp/in.pdf -o /tmp/foo.pdf

Caught an unexpected exception in the pdfCropMargins program.
Unexpected error:  <class 'AttributeError'>
Error message   :  'NoneType' object has no attribute 'producer'

  File "/home/lba/.local/lib/python3.10/site-packages/pdfCropMargins/pdfCropMargins.py", line 59, in main
    output_doc_pathname, exit_code, stdout_str, stderr_str = crop()
  File "/home/lba/.local/lib/python3.10/site-packages/pdfCropMargins/pdfCropMargins.py", line 173, in crop
    output_doc_pathname = main_crop(argv_list)
  File "/home/lba/.local/lib/python3.10/site-packages/pdfCropMargins/main_pdfCropMargins.py", line 1574, in main_crop
    bounding_box_list, delta_page_nums = process_pdf_file(input_doc_pathname,
  File "/home/lba/.local/lib/python3.10/site-packages/pdfCropMargins/main_pdfCropMargins.py", line 1336, in process_pdf_file
    metadata_info.producer)

Running pdf-crop-margins --version seems to be about the only thing that does not raise this error.

Thanks for pdfCropMargins and please let me know if there is any more info you need.

abarker commented 1 year ago

It would help to see the output of the command with the -v verbose option enabled. The only code path I see which can lead to this error is when the original document has unreadable metadata. From that I'd assume that this only happens on some files. Is that right? It would also help to have an example document that fails with this exception.

abarker commented 1 year ago

The original file might be corrupt and have unreadable metadata, in which case the --gsFix option could be tried on it. In any case, that exception should not be raised and I have pushed out version 1.1.13 of pdfCropMargins to fix at least that part.
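The traceback ends at `metadata_info.producer`, so the crash happens when the metadata object is `None`. The snippet below is a minimal sketch of the kind of None-safe guard that avoids this. It assumes the metadata value is either `None` (unreadable metadata) or an object with an optional `producer` attribute, as PyPDF2's `PdfReader.metadata` can return `None` or a `DocumentInformation`. The helper name `describe_producer` is hypothetical, and this is not the actual change made in 1.1.13.

```python
from types import SimpleNamespace
from typing import Any, Optional


def describe_producer(metadata_info: Optional[Any]) -> str:
    """Return a printable producer string, tolerating missing metadata.

    Hypothetical helper: metadata_info is assumed to be None (unreadable
    metadata) or an object that may carry a `producer` attribute.
    """
    if metadata_info is None:
        # Unreadable metadata: report it instead of dereferencing None.
        return "No readable metadata in the document."
    # getattr with a default also handles objects without a producer field.
    producer = getattr(metadata_info, "producer", None)
    if not producer:
        return "The document does not record a producer."
    return f"The producer is: {producer}"


# Usage with stand-in objects (no real PDF needed).
print(describe_producer(None))
print(describe_producer(SimpleNamespace(producer="ExampleProducer 1.0")))
```

Checking for `None` before touching any attribute is the essential part; `getattr` with a default is an extra safeguard for metadata objects that lack the field entirely.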

tinloaf commented 1 year ago

Hi @abarker , thanks for getting back to me. This is the output of pdf-crop-margins -v:

> pdf-crop-margins -v /tmp/in.pdf -o /tmp/out.pdf

Processing the PDF with pdfCropMargins (version 1.1.12)...
Python version: ('3', '10', '8')
System type: Linux

The input document's filename is:
    /tmp/in.pdf

The output document's filename will be:
    /tmp/out.pdf

The absolute pre-crops to be applied to each margin, in units of bp, are:
    [0.0, 0.0, 0.0, 0.0]

The percentages of margins to retain are:
    [10.0, 10.0, 10.0, 10.0]

The absolute offsets to be applied to each margin, in units of bp, are:
    [0.0, 0.0, 0.0, 0.0]

The uniform order statistics to apply to each margin, in units of bp, are:
    []

For the full page size, using values from the PDF box
specified by the intersection of these boxes: ['m', 'c']

The input document has 1 pages.

No readable metadata in the document.

Caught an unexpected exception in the pdfCropMargins program.
Unexpected error:  <class 'AttributeError'>
Error message   :  'NoneType' object has no attribute 'producer'

  File "/home/lba/.local/lib/python3.10/site-packages/pdfCropMargins/pdfCropMargins.py", line 59, in main
    output_doc_pathname, exit_code, stdout_str, stderr_str = crop()
  File "/home/lba/.local/lib/python3.10/site-packages/pdfCropMargins/pdfCropMargins.py", line 173, in crop
    output_doc_pathname = main_crop(argv_list)
  File "/home/lba/.local/lib/python3.10/site-packages/pdfCropMargins/main_pdfCropMargins.py", line 1574, in main_crop
    bounding_box_list, delta_page_nums = process_pdf_file(input_doc_pathname,
  File "/home/lba/.local/lib/python3.10/site-packages/pdfCropMargins/main_pdfCropMargins.py", line 1336, in process_pdf_file
    metadata_info.producer)

So it looks like you were right: this is a problem with the metadata. As far as I can tell, it happens with all PDF files created by the Rocketbook app. The --gsFix option does solve the problem, thanks for the pointer!

I'm not sure whether you still consider this an error or whether this is exactly what you intended --gsFix for, so I'll leave this ticket open for now; please just close it if you think this is sufficiently fixed. In case you want to investigate further, I have attached an example file: metadata_problem.pdf. I can open this file in Evince and Acrobat Reader without either of them complaining. I don't know enough about the PDF standard to determine whether this is a valid PDF or whether there really is some corrupted data (which Acrobat and Evince just silently ignore).

abarker commented 1 year ago

The new pdfCropMargins version 1.1.13 works fine on my system to crop the example file now, so I'm closing the issue. Thanks for the bug report.

abarker/pdfCropMargins, issue #46