eischaefer closed this issue 1 month ago
@eischaefer we will add this to the list
@eischaefer @jordanpadams
It will take me a while (days) to stand up a Windows platform again. However, I can immediately see what is wrong, and it can be fixed on the command line (I hope). Instead of:
validate --target C:\path\to\example.pdf
use
validate --target file:///C:/path/to/example.pdf
(yes, three slashes after file:)
I know the documentation does not say this, but it is written for *nix, where a path correctly becomes file:///path/to/example.pdf. The Java libraries do not handle Windows paths as well. Beyond updating the documentation for Windows, I am not sure we can make any substantial changes to validate to make it work reliably.
I will continue standing up a Windows platform in case the suggestion does not work.
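For readers who want to script the workaround above, here is a minimal sketch of the path-to-URI rewrite. It is not taken from validate; the class and method names are hypothetical, and it assumes a plain drive-letter path with no spaces or other characters that would need percent-encoding (URI.create rejects those).

```java
import java.net.URI;

final class WindowsFileUriSketch {
    // Hypothetical helper: turns a drive-letter path such as C:\dir\file.pdf
    // into the three-slash file: URI form suggested in the comment above.
    static URI toFileUri(String windowsPath) {
        // Swap backslashes for forward slashes; the drive letter then
        // sits directly after the third slash of "file:///".
        String forward = windowsPath.replace('\\', '/');
        return URI.create("file:///" + forward);
    }

    public static void main(String[] args) {
        // prints file:///C:/path/to/example.pdf
        System.out.println(toFileUri("C:\\path\\to\\example.pdf"));
    }
}
```

The result can then be passed as the value of --target. Paths containing spaces would need encoding first, which this sketch deliberately leaves out.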
@al-niessner , thank you very much for your effort!
I apologize, but I gave the wrong command in the first comment. The correct command is:
validate --target C:\path\to\example.xml
where target points to the .xml, not the .pdf.
Styling that as a URI, as you suggested:
validate --target file:///C:/path/to/example.xml
gives the same error as before.
Note that the error references the .pdf, not the .xml, so it seems to me that Validate correctly interprets the path passed as target. The problem appears later: when Validate combines the <file_name> from the .xml's content with the passed target to resolve the complete path to the .pdf, the result is invariably of the form /C:/path/to/\example.pdf, which VeraPDF does not understand.
That resolved path uses /'s and starts with a / even when a regular Windows path (C:\path\to\example.xml) is passed, so some URI-like conversion must be occurring internally. Thankfully, that suggests that passing a regular path (on Windows or *nix) might be OK as long as the internal path conversion is fixed.
Incidentally, note that this exact same command (without resorting to a URI) works flawlessly in Validate 3.2.0 (as I noted in my original post), in case that's of help when debugging.
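To make the suspected failure mode concrete, here is a small sketch of how a leading / and a stray \ could end up in the resolved path. This is an assumption about the mechanism, not validate's actual code: the class, the method names, and the naive join are all hypothetical, and only java.net.URI from the standard library is used.

```java
import java.net.URI;

final class SiblingResolutionSketch {
    // Resolve a file name taken from the label against the label's own URI.
    static URI resolveSibling(URI label, String fileName) {
        return label.resolve(fileName);
    }

    // Hypothetical naive join: takes the URI path (which keeps its leading
    // slash) and appends a Windows separator before the file name.
    static String naiveJoin(URI label, String fileName) {
        String path = label.getPath();
        String parent = path.substring(0, path.lastIndexOf('/') + 1);
        return parent + "\\" + fileName;
    }

    public static void main(String[] args) {
        URI label = URI.create("file:///C:/path/to/example.xml");
        // prints /C:/path/to/example.xml (URI path form, leading slash included)
        System.out.println(label.getPath());
        // prints /C:/path/to/example.pdf (resolved through the URI)
        System.out.println(resolveSibling(label, "example.pdf").getPath());
        // prints /C:/path/to/\example.pdf (the shape reported in this issue)
        System.out.println(naiveJoin(label, "example.pdf"));
    }
}
```

If something like the naive join is what happens internally, resolving the sibling through the URI and converting to a platform path only at the very end would avoid both the mixed separators and the leading slash.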
@eischaefer
Thanks for the update; it makes more sense now. I am almost done with my Windows platform and will debug it.
That this problem exists is not a surprise. We had several other URI/URL problems on Windows that required more pedantic handling of URLs, and it is likely that one or more code paths were missed during those updates, since I work with limited sets of test data at a time. I cannot download your data to debug this issue. I will need both the XML and PDF in question. The link at the top of the ticket tries to open a validate issue called example.pdf.
@al-niessner , I have updated To Reproduce in the first comment with correct instructions. Please let me know if you need anything else from me.
Thanks. I have them both.
When run on Linux, I get this output (the baseline expectation for the Windows platform):
PDS Validate Tool Report
Configuration:
Version 3.6.0-SNAPSHOT
Date 2024-10-09T21:13:27Z
Parameters:
Targets [file:/home/niessner/Projects/PDS/validate/src/test/resources/github1008/example.xml]
Severity Level WARNING
Recurse Directories true
File Filters Used [*.xml, *.XML]
Data Content Validation on
Product Level Validation on
Max Errors 100000
Registered Contexts File /home/niessner/Projects/PDS/validate/target/classes/util/registered_context_products.json
Product Level Validation Results
FAIL: file:/home/niessner/Projects/PDS/validate/src/test/resources/github1008/example.xml
ERROR [error.pdf.file.not_pdfa_compliant] Validation failed for flavour PDF/A-1b in file example.pdf.
1 product validation(s) completed
Summary:
1 product(s)
1 error(s)
0 warning(s)
Product Validation Summary:
0 product(s) passed
1 product(s) failed
0 product(s) skipped
1 product(s) total
Referential Integrity Check Summary:
0 check(s) passed
0 check(s) failed
0 check(s) skipped
0 check(s) total
Message Types:
1 error.pdf.file.not_pdfa_compliant
End of Report
Completed execution in 11332 ms
The report does not say exactly why, but validation fails because the PDF is not PDF/A-1b compliant rather than because of an internal error. Moving to the Windows platform for more testing. If the full details of the non-compliance are desired, use --pdf-error-dir.
"When run on linux get this as output (base line of expectation for windows platform):"
Yep! As stated in my original comment, this is what I would hope to see on Windows for this file, and it is exactly what Validate 3.2.0 reports on Windows for this file.
Incidentally, my actual command is much more complicated than the example provided (and includes --pdf-error-dir, etc.), but I intentionally provided a minimal reproducible example.
Thanks again for your help!
Checked for duplicates
Yes - I've already checked
Describe the bug
Validation of the attached PDF (see To Reproduce) with Validate 3.5.2 gives the error:
The PDF is indeed not PDS4-compatible, but the online demo VeraPDF reports a very different set of issues, nowhere referring to an illegal character nor an inability to read the content. The character ":" is also plausibly at "index 2" in the path (depending on how one counts), which suggests to me that parsing the path itself is the root cause.
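As a quick check of the "index 2" observation, here is a sketch that only counts characters. The error text itself is not reproduced in this issue, so this says nothing about which library raised it; the class and method names are hypothetical.

```java
final class ColonIndexSketch {
    // Zero-based position of the first ':' in the given path string.
    static int colonIndex(String path) {
        return path.indexOf(':');
    }

    public static void main(String[] args) {
        // prints 2: '/' is index 0, 'C' is index 1, ':' is index 2
        System.out.println(colonIndex("/C:/path/to/\\example.pdf"));
        // prints 1: without the leading slash the colon moves to index 1
        System.out.println(colonIndex("C:\\path\\to\\example.pdf"));
    }
}
```

With zero-based counting, the colon lands at index 2 only in the form with a leading slash, which fits the idea that the malformed resolved path, not the PDF content, is what gets rejected.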
I'm not sure whether the invalid /\ before the filename or the likewise invalid (albeit POSIX-like) leading / are relevant, but they are both absent from the 3.2.0 output, which gives the expected error:
Expected behavior
I expected Validate to read the PDF content and report issues similar to the online demo VeraPDF.
To Reproduce
validate --target C:\path\to\example.xml
with Validate 3.5.2. Note: Typo corrected in edit. (Original target was erroneously .pdf, not .xml.)
Environment Info
Version of Software Used
Test Data / Additional context
No response
Related requirements
#xyz
Engineering Details
No response
Integration & Test
No response