caradoc-org / caradoc

A PDF parser and validator
GNU General Public License v2.0

Caradoc doesn't accept comments between xref and trailer #8

Open myrkraverk opened 7 years ago

myrkraverk commented 7 years ago

Caradoc fails with an "unexpected character" error if there is a comment between the xref table and the trailer, yet I find nothing in Adobe's spec that prohibits this.

Example:

~/src/pdf$ cat xref.pdf      
%PDF-1.3
1 0 obj
<< /Type /Catalog >>
endobj
xref
0 2
0000000000 00000 f 
0000000010 00000 n 
% this here is a comment
trailer
<< /Root 1 0 R >>
startxref
45
%%EOF
~/src/pdf$ caradoc xref xref.pdf      
PDF error : Lexing error : unexpected character : 0x25 at offset 94 [0x5e] in file !

When the line "% this here is a comment" is removed and a blank line is left in its place, caradoc displays the xref. The PDF is otherwise (probably) invalid.

Please provide chapter and verse if the spec explicitly disallows comments between the xref and trailer.
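To make the reported offset easier to check, here is a minimal Python sketch that rebuilds the example file and locates the byte the lexer complains about. It assumes LF line endings and the single trailing space shown after each xref entry; both are guesses about the original file, and the variable names are illustrative only.

```python
# Rebuild the example file byte for byte.
# Assumptions: LF line endings, one trailing space after each xref entry.
LINES = [
    b"%PDF-1.3",
    b"1 0 obj",
    b"<< /Type /Catalog >>",
    b"endobj",
    b"xref",
    b"0 2",
    b"0000000000 00000 f ",
    b"0000000010 00000 n ",
    b"% this here is a comment",
    b"trailer",
    b"<< /Root 1 0 R >>",
    b"startxref",
    b"45",
    b"%%EOF",
]
DATA = b"\n".join(LINES) + b"\n"

# The first match of "xref" is the table keyword, since "startxref" comes later.
xref_offset = DATA.index(b"xref")
comment_offset = DATA.index(b"% this here")
object_offset = DATA.index(b"1 0 obj")

print("xref keyword at", xref_offset)
print("comment at", comment_offset, "byte", hex(DATA[comment_offset]))
print("object 1 at", object_offset)
```

Under these assumptions the xref keyword lands at the startxref value of 45, and the "%" that opens the comment is the 0x25 byte at offset 94 named in the error. Object 1 comes out one byte before the offset recorded in its xref entry, which may be one reason the file is otherwise invalid.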

gendx commented 7 years ago

The specification does not explicitly forbid this, but it is unclear and ambiguous in many ways and rarely goes into this kind of detail. In Caradoc, we made a strict choice of rejecting ambiguous inputs by default.

We may add an option to allow this behavior in the relaxed parser, as we have already done for common errors found in the wild.

As for the strict parser (triggered with the --strict option), comments are not allowed (because they add complexity and can also lead to further ambiguity), so your example would be rejected anyway.
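For illustration only, the difference between rejecting and tolerating a comment at this position could be sketched as follows in Python. This is not Caradoc's code (Caradoc is not written in Python), and the function `skip_to_trailer` and its `allow_comments` flag are hypothetical names that do not correspond to any existing option.

```python
def skip_to_trailer(data: bytes, pos: int, allow_comments: bool) -> int:
    """Return the offset of the `trailer` keyword at or after `pos`.

    Hypothetical helper: whitespace is always skipped, while a '%' comment
    is skipped up to the end of its line only when allow_comments is True.
    """
    while pos < len(data):
        byte = data[pos]
        if byte in b" \t\r\n\x00\x0c":
            pos += 1
        elif byte == 0x25 and allow_comments:  # '%' starts a comment
            while pos < len(data) and data[pos] not in b"\r\n":
                pos += 1
        elif data.startswith(b"trailer", pos):
            return pos
        else:
            raise ValueError(
                f"unexpected character : 0x{byte:02x} at offset {pos}"
            )
    raise ValueError("trailer keyword not found")


# Usage: the bytes that follow the last xref entry in the reported file.
table_end = b"0000000010 00000 n \n"
sample = table_end + b"% this here is a comment\ntrailer\n<< /Root 1 0 R >>\n"

print(skip_to_trailer(sample, len(table_end), allow_comments=True))
try:
    skip_to_trailer(sample, len(table_end), allow_comments=False)
except ValueError as err:
    print("rejected:", err)
```

With the flag off, the sketch stops at the "%" byte, much like the reported error. A blank line in the same place is plain whitespace and passes in both modes, which matches the behavior described in the report.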