boazsegev / combine_pdf

A pure Ruby library to merge PDF files, number pages and maybe more...
MIT License
734 stars 155 forks

CombinePDF::ParsingError (Unknown PDF parsing error - malformed PDF file?) #184

Closed (closed by hazelsparrow 3 years ago)

hazelsparrow commented 3 years ago

Hello,

Thanks for creating and maintaining this gem!

I'm running into an issue with CombinePDF loading a PDF.

irb(main):002:0> CombinePDF.load('td1-21e.pdf')
Unknown error parsing string at 77437 for string: Enfocus StatusChe
[binary stream data omitted]
<</Author()/CreationDate(D:20201127132017-05'00')/Creator(Designer 6.3)/GTS_PDFXConformance(PDF/X-1a:2001)/GTS_PDFXVersion(PDF/X-1:2001)/ModDate(D:20201127132038-05'00')/Producer(Acrobat Distiller 15.0 \(Windows\))/Title(2021 Personal Tax Credits Return)!
Traceback (most recent call last):
        1: from (irb):2
CombinePDF::ParsingError (Unknown PDF parsing error - malformed PDF file?)

The PDF in question is https://www.canada.ca/content/dam/cra-arc/formspubs/pbg/td1/td1-21e.pdf

Any idea what could be causing this? I'm running version 1.0.20.

Thanks!

boazsegev commented 3 years ago

Hi @hazelsparrow,

Thank you for opening this issue.

I'm still looking into this and can't decide whether the PDF is actually malformed or not. It appears that the PDF contains an object that resembles a nested PDF file. This confuses the parser, since it usually resolves stream length errors by preferring the PDF data over the published length (but in this case the length is probably correct and the PDF data belongs to the nested PDF file)...
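
As an illustration of the ambiguity described above, here is a minimal sketch in plain Ruby. It is not CombinePDF's actual parser code: the helper names (`stream_by_length`, `stream_by_keyword`) and the toy object layout are assumptions made up for this example. It only shows why cutting a stream at the first `endstream` keyword goes wrong when the stream itself embeds PDF-like data, while trusting the declared `/Length` does not.

```ruby
# Hypothetical helpers for illustration only; not part of CombinePDF.
STREAM_START = "stream\n".b
STREAM_END   = "endstream".b

# Strategy A: trust the declared /Length and read exactly that many bytes.
def stream_by_length(pdf, declared_length)
  start = pdf.index(STREAM_START) + STREAM_START.bytesize
  pdf.byteslice(start, declared_length)
end

# Strategy B: ignore /Length and cut at the first "endstream" keyword.
def stream_by_keyword(pdf)
  start = pdf.index(STREAM_START) + STREAM_START.bytesize
  stop  = pdf.index(STREAM_END, start)
  pdf.byteslice(start, stop - start)
end

# A toy "nested PDF" payload that contains its own stream/endstream pair.
inner = "1 0 obj\n<< /Length 5 >>\nstream\nhello\nendstream\nendobj\n".b

# The outer object declares the correct length of its payload.
outer = "2 0 obj\n<< /Length #{inner.bytesize} >>\nstream\n".b +
        inner + "\nendstream\nendobj\n".b

by_length  = stream_by_length(outer, inner.bytesize)
by_keyword = stream_by_keyword(outer)

puts "by length  matches payload: #{by_length == inner}"
puts "by keyword matches payload: #{by_keyword == inner}"
```

In this toy case the keyword-based strategy stops at the nested `endstream`, so the outer stream is truncated and whatever follows is parsed as if it were top-level PDF syntax. A parser that normally prefers the keyword (to cope with files whose `/Length` is wrong) needs some extra rule to handle files like this one, where the length is right and the keyword belongs to embedded data.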

I don't know, I'll see if I can find something.

Thanks again! Bo.

boazsegev commented 3 years ago

Hi @hazelsparrow,

I just released a patched version (1.0.21). Could you check whether it fixes the issue you're experiencing? It does on my machine.
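
For anyone checking whether their environment already has the fix, a minimal sketch follows. It assumes, based only on this thread, that 1.0.21 is the first release containing the fix; the helper name `includes_parsing_fix?` is hypothetical. It uses the standard RubyGems `Gem::Version` comparison and `Gem.loaded_specs`.

```ruby
require 'rubygems'

# First release reported in this thread to contain the fix (assumption).
FIRST_PATCHED_VERSION = Gem::Version.new('1.0.21')

# Hypothetical helper: true when the given version string is 1.0.21 or newer.
def includes_parsing_fix?(version_string)
  Gem::Version.new(version_string) >= FIRST_PATCHED_VERSION
end

# Gem.loaded_specs only lists gems already activated in this process,
# so nothing is printed when combine_pdf has not been loaded.
spec = Gem.loaded_specs['combine_pdf']
if spec
  status = includes_parsing_fix?(spec.version.to_s) ? 'includes' : 'predates'
  puts "combine_pdf #{spec.version} #{status} the fix"
end
```

In a Bundler project, a constraint such as `gem 'combine_pdf', '>= 1.0.21'` in the Gemfile would have the same effect of ruling out the affected release.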

Kindly, Bo.

hazelsparrow commented 3 years ago

Wow @boazsegev thank you so much for the quick reply AND for the fix! I can confirm that this is fixed in 1.0.21. Thank you!