Closed Sarenor closed 2 days ago
I don't understand, we do need setFilename to actually read the file (where else?) and the validator can and should should catch IrrecoverableValidationError and add it to the validation XML (and sometimes even does so, e.g. validator l 330). And I guess whenever a PDF-stream would be validated setFilename() would never be called because there is no filename at all, so changes in this method should not affect stream validation?
It's called here in the ZUGFeRDValidator
:
public String validate(InputStream inputStream, String fileNameOfInputStream) {
boolean xmlValidity;
context.clear();
StringBuffer finalStringResult = new StringBuffer();
SimpleDateFormat isoDF = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
Date date = new Date();
startTime = Calendar.getInstance().getTimeInMillis();
context.setFilename(fileNameOfInputStream);// set filename without path
finalStringResult.append("<validation filename='").append(context.getFilename()).append("' datetime='").append(isoDF.format(date)).append("'>");
boolean isPDF = false;
byte[] content = new byte[0];
try {
if (fileNameOfInputStream == null) {
optionsRecognized = false;
context.addResultItem(new ValidationResultItem(ESeverity.fatal, "Filename not specified").setSection(10)
.setPart(EPart.pdf));
}
PDFValidator pdfv = new PDFValidator(context);
if (inputStream == null) {
context.addResultItem(
new ValidationResultItem(ESeverity.fatal, "File not found").setSection(1).setPart(EPart.pdf));
} else if (inputStream.available() < 32) {
// with less then 32 bytes it can not even be a proper XML file
context.addResultItem(
new ValidationResultItem(ESeverity.fatal, "File too small").setSection(5).setPart(EPart.pdf));
} else {
content = IOUtils.toByteArray(inputStream);
isPDF = ByteArraySearcher.contains(content, new byte[]{'%', 'P', 'D', 'F'});
XMLValidator xv = new XMLValidator(context);
if (isPDF) {
pdfv.setFilename(fileNameOfInputStream);
pdfv.setFileContents(content);
optionsRecognized = true;
finalStringResult.append("<pdf>");
Removing it there would probably work, since the file name is only used for Validation Messages and errors.
Edit & Update: Removing it there wouldn't work, because VeraPDF ItemDetails
expect the name to not be null.
Either passing in the Filecontents via PdfValidator.setFileContents
or a dedicated load file method might be an option?
At least in the validate(InputStream contents, String filename)
method, the only contents of the final Result StringBuffer is going to be
finalStringResult.append("<validation filename='").append(context.getFilename()).append("' datetime='").append(isoDF.format(date)).append("'>");
then the exception from the setFilename
triggers the try-catch and the final results are processed.
can you set this.pdfFilename = filename; directly or can we have a non-loading method e.g. with additional params (defaulting to load)? And can we maybe introdcue have one or two tests of the streaming validator? @phax had also requested more streaming and I would not likt anybody to accidentally destroy this functionality and going unnoticed...
We had some tests for the streaming validator, I'm surprised it didn't catch it actually. Having it with a loadFile boolean sounds like a good compromise, yeah. I'll see what I can do on Monday, including additional test cases.
Inside the #406 PR I took the liberty to harmonize the "validate" methods and add another overload for byte[]
- this also reduces the amount of IO needed, as the file to be validated is read no more then once.
However, the general Validator
interface is no longer 100% accurate, but I hope that doesn't do any harm.
Also I optimized the ByteArraySearcher
so that it is more efficient, especially for large non-matching files.
This commit breaks PDF-Stream validation since the ZugferdValidator calls the
setFilename
method of thePdfValidator
Since the final results depends on string results being written, with no results being written yet when the exception is thrown, any (pdf) file will be considered valid.