Open szanati opened 8 years ago
I ran the PDF through the GUI description service, and it reported that the file was well-formed and valid and that the event outcome was a success. I tried the package on Ripple and it received the same error. On Ripple I also edited the daitss-config.yml file: under transform_service I changed "skip_undefined" from false to true, and the package then went through the PDF steps. It did not archive because of another issue on Ripple involving squid, which will be handled next week.
The package on production is in the stashspace named Github_781, in the directory /var/daitss/data/stash/Github_781/ETAL9VQ5Q_V6OA41. On Ripple it is in the workspace /var/daitss/data/work/ENF28E4YI_X7LTMP. On Ripple the original package is in /var/daitss/ops/stephen/AA00038892_00002
This package fails during PDF to PDF/A conversion with PdfaPilot. We would need to submit an issue ticket to the PdfaPilot vendor.
Alternatively, you can try to get this package ingested by turning off PDF/A normalization.
Here are the instructions: https://github.com/daitss/core/wiki/Turn-off-PDF-to-PDFA-normalization
Email from Carol:
Response from callas. It looks like you can fix those PDFs with PDFaPilot, though I am not sure how you want to pursue it, since it seems to mean the SIPs will be changed.
-Carol
---------- Forwarded message ----------
From: callas software support <3rdlevelsupport@callassoftware.com>
Date: Fri, Apr 21, 2017 at 8:21 AM
Subject: Re: Problems with many PDF files using PDFaPilot
To: cchoufl@gmail.com
Hello Carol,
as David has already mentioned the cases have underlying issues, however, in both cases the PDF structure seems to be corrupt. Acrobat is still able to display the file, however the more thorough analysis with the PDF/A validator/converter fails. We will further investigate to make sure that this assumption is correct.
There is, however, already a known workaround for that problem: Both files can actually be converted when they are first converted to PostScript and back to PDF. You can do so by using ./pdfaPilot --redistill
Would that work for you as a - at least temporary - solution?
Best regards, Dietrich
--------------- Original Message ---------------
From: callas software support team [support@callassoftware.com]
Sent: 19.04.2017 21:15
To: cchoufl@gmail.com; d.seggern@callassoftware.com
Subject: Re: Problems with many PDF files using PDFaPilot
Hi Carol,
I've reproduced the problem for both files. The underlying cause appears to be different for both files, they will be looked at by development to determine what is causing this and whether anything can be done about it.
I'll keep you posted! David.
--------------- Original Message ---------------
From: carol chou [cchoufl@gmail.com]
Sent: 19/04/2017 7:50
To: d.seggern@callassoftware.com
Subject: Re: Problems with many PDF files using PDFaPilot
Hi Dietrich,
Our sys admin has installed the new version of PDFaPilot. Some of the problem files can now be converted, but the following two still give errors during the conversion:
http://www.fcla.edu/daitss-test/files/00004-04-2009.pdf
Progress 100 %
Errors 16660 Device process color used but no PDF/A OutputIntent
Errors 114 Font not embedded (and text rendering mode not 3)
Errors 24 Annotation has no Flags entry
Errors 24 Annotation not set to print
Errors 6280 CharSet missing for Type 1 font
Summary Corrections 72
Summary Errors 23102
Summary Warnings 0
Summary Infos 0
Duration 00:54
Error 1000 Unknown error (unknown exception)
http://www.fcla.edu/daitss-test/files/09-06-2013.pdf
[cchou@ripple GH_781]$ /opt/pdfapilot-6.2.256/pdfaPilot 09-06-2013.pdf --fontfolder=/usr/share/fonts/msttcorefonts/ --onlypdfa --substitute --outputfile=09-06-2013-o.pdf --report=XML,IFNOPDFA,PATH=report.xml
Serialization This pdfaPilot instance is running with a Coldspare or Developer license and may only be used in production as a temporary replacement for a full license on another computer.
Input /home/cchou/pdfaError/GH_781/09-06-2013.pdf
Pages 32
PDFA Regular
Progress 100 %
Summary Corrections 0
Summary Errors 0
Summary Warnings 0
Summary Infos 0
Duration 00:01
Error 1010 The PDF file may be corrupt (unable to open PDF file).
Here is the pdfapilot version the sys admin has installed for us. callas pdfaPilot CLI 6.2.256 (x64)
2000-2016 callas software gmbh
Can you take a look again and provide us some solutions?
Thanks,
-Carol
On Mon, Oct 10, 2016 at 5:09 AM, Dietrich von Seggern <d.seggern@callassoftware.com> wrote:
Hi Carol,
what version of pdfaPilot are you using?
I was not able to reproduce any issues with the current release (callas pdfaPilot CLI 6.0.245 (x64)) on a Mac. The reason may be either the font situation or the version.
Best regards, Dietrich
-- Dietrich von Seggern | Managing Director
callas software GmbH | Schönhauser Allee 6/7 | 10119 Berlin | Germany
Tel +49.30.44390310 | Fax +49.30.4416402 | www.callassoftware.com
Amtsgericht Charlottenburg, HRB 59615 | Geschäftsführung: Olaf Drümmer, Ulrich Frotscher, Dietrich von Seggern
On 9 Oct 2016, at 03:35, carol chou <cchoufl@gmail.com> wrote:
Hi Mr. Seggern,
I am working with Florida Virtual Campus, which has been using PDFaPilot to convert the PDFs in their archive into PDF/A. Recently, we have run into PDFaPilot errors with some of the PDFs in the archive. Can you please see if this is something that PDFaPilot can fix? The PDFs can be downloaded at
http://www.fcla.edu/daitss-test/files/SCV20100314.pdf
http://www.fcla.edu/daitss-test/files/00004-04-2009.pdf
http://www.fcla.edu/daitss-test/files/09-06-2013.pdf
FYI, I am enclosing the pdfaPilot error at the end of my email too.
Thanks,
Carol
Do we still have the original SIPs? We may need to fix the PDFs in the original SIPs (in consultation with their owners) and resubmit and abort the stashed SIPs with corrupt files. We'll need to discuss this.
This is worth emailing UF about, since they seem to have done multiple submissions of 3 different package names. They may need to authorize that we 'abort' some of the duplicates, and then we'll have fewer problem packages to deal with. Determine if we still have the SIPs. If we do, we should experiment with correcting one of the problem PDFs with PDF/A Pilot by converting it to PostScript and back to PDF, as callas support suggested. Based on the results of this investigation, decide how to proceed.
I did some validation of the PDFs remaining in the DAITSS Github_781 stashspace using description.fcla.edu. The results:
/var/daitss/data/stash/Github_781/E3EAYISFT_87ROR0/files/original/1 is a well-formed and valid PDF file. (The 3-Heights PDF online validator tool confirmed this)
/var/daitss/data/stash/Github_781/E5PJXWAWA_MUHKTZ/files/original/1 is not well-formed and the anomaly is "Invalid object definition". What does this mean exactly? I looked this up in the JHOVE error messages (http://wiki.opf-labs.org/display/Documents/JHOVE+issues+and+error+messages#JHOVEissuesanderrormessages-%22Invalidobjectdefinition%22) and it doesn't really give me details. I tried the 3-Heights PDF validator online tool (https://www.pdf-online.com) and got the following errors:
File data
Compliance pdf1.2
Result Document does not conform to PDF/A.
Details
Validating file "data" for conformance level pdf1.2
The 'xref' keyword was not found or the xref table is malformed.
The file trailer dictionary is missing or invalid.
Error in Flate stream: data error.
The operator has an invalid number of operands.
The "Length" key of the stream object is wrong.
The "Length" key of the stream object is wrong.
The operator has an invalid number of operands.
A path start operator was missing.
The content stream contains an invalid operator.
The "Length" key of the stream object is wrong.
The "Length" key of the stream object is wrong.
The operator has an invalid number of operands.
Error in Flate stream: data error.
An end text operator is missing.
The content stream contains an invalid operator.
The "Length" key of the stream object is wrong.
Error in Flate stream: data error.
The operator has an invalid number of operands.
The document does not conform to the requested standard.
The file format (header, trailer, objects, xref, streams) is corrupted.
Done.
/var/daitss/data/stash/Github_781/EAKU060NA_VE67MN/files/original/1: the description service declares it well-formed and valid. The 3-Heights validator, however, gives the following error messages:
File data
Compliance pdf1.5
Result Document does not conform to PDF/A.
Details
Validating file "data" for conformance level pdf1.5
Error in Flate stream: data error.
Error in Flate stream: stream error.
The operator has an invalid number of operands.
An end text operator is missing.
The document does not conform to the requested standard.
The document's meta data is either missing or inconsistent or corrupt.
Done.
/var/daitss/data/stash/Github_781/ER9GZB3KZ_D6YG8G/files/original/1: the description service declares it well-formed and valid. The 3-Heights validator indicates that "Document validated successfully".
So it appears that the valid and well-formed PDFs may archive if the PDF/A Pilot is turned off. UF may need to recreate the other two.
Carol - can you confirm my conclusions?
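To make the reasoning behind that conclusion explicit, here is a minimal sketch that tabulates the four results reported above. The rule it applies (a PDF counts as a candidate for archiving only when both the description service and the 3-Heights validator accept it) is my reading of the comment, not a DAITSS policy, and the names RESULTS and triage are made up for illustration.

```python
# Results transcribed from the validation notes above; keys are the
# Github_781 stash package IDs.
RESULTS = {
    "E3EAYISFT_87ROR0": {"description_valid": True, "three_heights_ok": True},
    "E5PJXWAWA_MUHKTZ": {"description_valid": False, "three_heights_ok": False},
    "EAKU060NA_VE67MN": {"description_valid": True, "three_heights_ok": False},
    "ER9GZB3KZ_D6YG8G": {"description_valid": True, "three_heights_ok": True},
}


def triage(results):
    """Split packages into (may_archive, needs_recreation).

    ASSUMPTION: a package may archive (with PDF/A normalization off)
    only if both validators accept the PDF.
    """
    may_archive, needs_recreation = [], []
    for package, checks in sorted(results.items()):
        if checks["description_valid"] and checks["three_heights_ok"]:
            may_archive.append(package)
        else:
            needs_recreation.append(package)
    return may_archive, needs_recreation


may_archive, needs_recreation = triage(RESULTS)
print("may archive:", may_archive)
print("needs recreation:", needs_recreation)
```

Note that EAKU060NA_VE67MN lands in the second group only because the two validators disagree, so it is the one worth a second look before asking UF to recreate it.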
I attempted to obtain details about the validity of the 4 remaining PDFs from Adobe Acrobat 9's Preflight feature but didn't have much success.
The original packages for this issue: AA00038892_00002, AA00047064_00008, and UF00098620_00421 are in: /var/daitss/ops/exceptions/tickets/GitHub_781 on darchive.
I received the following error on a package with a pdf file:
error while processing 1(sip-files/09-06-2013.pdf): bad status http://transform.fda.fcla.edu/transform/pdf_norm?location=file:/var/daitss/data/work/ETAL9VQ5Q_V6OA41/files/original/1/data: 500
/opt/pdfapilot/pdfaPilot /var/daitss/data/work/ETAL9VQ5Q_V6OA41/files/original/1/data --fontfolder=/usr/share/fonts/msttcorefonts/ --onlypdfa --substitute --outputfile=/var/daitss/tmp/d20160317-22104-1k0gniu/data/transformed.pdf --report=XML,IFNOPDFA,PATH=/var/daitss/tmp/d20160317-22104-1k0gniu/pdfapilot_report.xml failed, output:
Input /var/daitss/data/work/ETAL9VQ5Q_V6OA41/files/original/1/data
Pages 32
PDFA Regular
Progress 100 %
Summary Corrections 0
Summary Errors 0
Summary Warnings 0
Summary Infos 0
Duration 00:05
Error 1010 The PDF file may be corrupt (unable to open PDF file).
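One thing worth noticing in this output is that "Summary Errors 0" appears together with "Error 1010", so the summary counts alone do not indicate success. Below is a minimal sketch of a parser for this text output, assuming the line layout is exactly as pasted above. The function name and the returned dictionary are hypothetical and are not part of pdfaPilot or DAITSS.

```python
def parse_pdfapilot_output(text):
    """Collect the Summary counts and the final Error line from pdfaPilot CLI text.

    ASSUMPTION: lines look like "Summary Errors 0" and
    "Error 1010 <message>", as in the output quoted in this issue.
    """
    result = {"summary": {}, "error_code": None, "error_message": None}
    for line in text.splitlines():
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue
        if fields[0] == "Summary" and fields[2].strip().isdigit():
            result["summary"][fields[1]] = int(fields[2])
        elif fields[0] == "Error" and fields[1].isdigit():
            result["error_code"] = int(fields[1])
            result["error_message"] = fields[2].strip()
    return result


SAMPLE = """\
Pages 32
PDFA Regular
Progress 100 %
Summary Corrections 0
Summary Errors 0
Summary Warnings 0
Summary Infos 0
Duration 00:05
Error 1010 The PDF file may be corrupt (unable to open PDF file).
"""

report = parse_pdfapilot_output(SAMPLE)
# A run should only be treated as clean when there is no Error line at all.
failed = report["error_code"] is not None
print(report["summary"], report["error_code"], failed)
```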