daitss / core

DAITSS: Dark Archive In The Sunshine State
GNU General Public License v3.0

Fatal error with pdf file #769

Open szanati opened 9 years ago

szanati commented 9 years ago

The IEID for the below error is EZYQ0HHD9_97E59T

I received the following error while trying to ingest a package that contained only the PDF, XML, and METS files:

error while processing 1(sip-files/05-10-2015.pdf): Fatal error: EntityRef: expecting ';' at :183.

    /opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/preserve.rb:26:in `rescue in block in preserve'
    /opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/preserve.rb:19:in `block in preserve'
    /opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/preserve.rb:18:in `each'
    /opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/preserve.rb:18:in `preserve'
    /opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/ingest.rb:33:in `ingest'
    /opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/process.rb:82:in `block in spawn'
    /opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/process.rb:66:in `fork'
    /opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/process.rb:66:in `spawn'
    /opt/web-services/sites/core/current/bin/pulse:161:in `block in start_wips'
    /opt/web-services/sites/core/current/bin/pulse:158:in `each'
    /opt/web-services/sites/core/current/bin/pulse:158:in `start_wips'
    /opt/web-services/sites/core/current/bin/pulse:194:in `block in <main>'
    /opt/web-services/sites/core/current/bin/pulse:192:in `loop'
    /opt/web-services/sites/core/current/bin/pulse:192:in

jonpitts commented 9 years ago

Looks like a problem with the XML. I can't see the file, but there may be a character that needs to be escaped. Go to line 183 of the XML file.

szanati commented 9 years ago

Thanks Jon. I'll take a look at line 183.

szanati commented 9 years ago

This is what is on line 183:

szanati commented 9 years ago

I reused this XML with a different PDF from the same source, taken from another package that had already been archived. I only updated the lines in the XML that give the PDF's name and checksum. That package archived on the test machine, which makes me think the problem is the PDF and not the XML.

cchou commented 9 years ago

Yes, the error is in processing the PDF. "error while processing 1(sip-files/05-10-2015.pdf): Fatal error: EntityRef: expecting ';' at :183."

Can you send this PDF to the description service and see if you get any error? I guess there is some bad character embedded in the PDF metadata.

szanati commented 9 years ago

I did run it. Here is what I got:

    Not well-formed
    success
    Expected dictionary for font entry in page resource
    Outlines contain recursive references.
    Annotation object is not a dictionary

Here is what I got on the pdf that did archive:

    Not well-formed
    success
    Unexpected exception java.lang.ClassCastException
    Outlines contain recursive references.
    Annotation object is not a dictionary
    Invalid Font entry in Resources

cchou commented 9 years ago

From the code, https://github.com/daitss/core/blob/master/lib/daitss/proc/wip/preserve.rb#L26, it looks like it's while processing the file so it's either description service, action plan or transformation. Anything in the logs for those services relating to this file?

Basically what core does is:

  1. Send the file to the description service,
  2. Send the xml returned from the description service to action plan to get the processing instruction for PDF.
  3. Send the PDF to the transformation service with the processing instruction to convert the file.

There should be error logs in those services if they encounter any problem. Or, you can call those services individually.
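The three steps above can be sketched roughly as follows. This is not the code in preserve.rb: `preserve_file`, `run_stage`, `StageError`, and the three callables are hypothetical names that stand in for the HTTP calls core makes. The only point is that tagging each stage makes an error like the one reported here easier to attribute to describe, action plan, or transform.

```ruby
# Hypothetical sketch of the describe -> action plan -> transform sequence.
# None of these names come from daitss/core; the callables stand in for
# the requests that core sends to each service.
class StageError < StandardError
  attr_reader :stage

  def initialize(stage, cause)
    @stage = stage
    super("#{stage} failed: #{cause.message}")
  end
end

# Runs the block and re-raises any error tagged with the stage name.
def run_stage(stage)
  yield
rescue StandardError => e
  raise StageError.new(stage, e)
end

def preserve_file(path, describe:, action_plan:, transform:)
  description = run_stage(:describe)    { describe.call(path) }
  instruction = run_stage(:action_plan) { action_plan.call(description) }
  run_stage(:transform) { transform.call(path, instruction) }
end
```

With stubs in place of the services, a failure raised inside the third callable surfaces as a `StageError` whose `stage` is `:transform`.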

szanati commented 9 years ago

Thanks, Carol. I'll see if I can find anything in the logs.

szanati commented 9 years ago

I did find this in the transform log for the package in question:

    2015 Jun 9 09:49:13 fclnx30 Transform[9218]: INFO transform.fda.fcla.edu: location = file:/var/daitss/data/work/EPU5LDS3K_TCXIEF/files/original/1/data
    2015 Jun 9 09:49:50 fclnx30 Transform[9218]: INFO transform.fda.fcla.edu: error code pid 12274 exit 1
    2015 Jun 9 09:49:50 fclnx30 Transform[9218]: INFO transform.fda.fcla.edu: Rack: 192.168.36.60 - - [09/Jun/2015 09:49:50] "GET /transform/pdf_norm?location=file:/var/daitss/data/work/EPU5LDS3K_TCXIEF/files/original/1/data " 200 18288 36.3724
    2015 Jun 9 09:51:39 fclnx30 Transform[9218]: INFO transform.fda.fcla.edu: Rack: 192.168.36.60 - - [09/Jun/2015 09:51:39] "GET /transform/pdf_norm " 400 38 0.0009
    2015 Jun 9 09:51:39 fclnx30 Transform[9218]: INFO transform.fda.fcla.edu: location = file:/var/daitss/data/work/EPU5LDS3K_TCXIEF/files/original/1/data
    2015 Jun 9 09:52:16 fclnx30 Transform[9218]: INFO transform.fda.fcla.edu: error code pid 13533 exit 1
    2015 Jun 9 09:52:16 fclnx30 Transform[9218]: INFO transform.fda.fcla.edu: Rack: 192.168.36.60 - - [09/Jun/2015 09:52:16] "GET /transform/pdf_norm?location=file:/var/daitss/data/work/EPU5LDS3K_TCXIEF/files/original/1/data " 200 18289 37.0649

I had reset it and tried it again, which is why it shows several errors in a short amount of time.
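In the excerpt above, both "error code pid ... exit 1" lines are followed by a pdf_norm request logged with status 200, so the HTTP status alone may not reveal the failure. A minimal sketch for pulling those lines out of a transform log is below. It assumes the line format shown in the excerpt and assumes that the number after "exit" is the exit status of a child process; `failed_exits` is a hypothetical helper, not part of the transform service.

```ruby
# Matches the "error code pid <pid> exit <status>" lines seen in the
# transform log excerpt. The meaning of the fields is an assumption.
EXIT_LINE = /error code pid (\d+) exit (\d+)/

# Returns [pid, status] pairs for every logged process whose status
# is non-zero.
def failed_exits(log_text)
  log_text.scan(EXIT_LINE)
          .map { |pid, status| [pid.to_i, status.to_i] }
          .reject { |_pid, status| status.zero? }
end
```

Calling `failed_exits(File.read(path_to_log))` on a saved copy of the log would list the failing pids so they can be matched against timestamps.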

jonpitts commented 9 years ago

Can you check the description service also? These services generate metadata in XML and the error is XML related. There is probably an un-escaped ampersand in the resulting metadata from one of the services.

szanati commented 9 years ago

This is what I got from the describe log for the package, if that is what you mean by the description service:

    2015 Jun 8 22:03:43 fclnx30 Describe[5300]: INFO describe.fda.fcla.edu: Rack: 192.168.36.60 - - [08/Jun/2015 22:03:43] "GET /describe?location=file:/var/daitss/data/work/EPU5LDS3K_TCXIEF/files/original/2/data&uri=info%3Afda%2FEPU5LDS3K_TCXIEF%2Ffile%2F2&originalName=sip-files%2FUF00028308_02603.mets HTTP/1.0" 200 3515 0.0901
    2015 Jun 8 22:03:43 fclnx30 Describe[5288]: INFO describe.fda.fcla.edu: transforming JHOVE output to DocMD
    2015 Jun 8 22:03:43 fclnx30 Describe[5288]: INFO describe.fda.fcla.edu: Rack: 192.168.36.60 - - [08/Jun/2015 22:03:43] "GET /describe?location=file:/var/daitss/data/work/EPU5LDS3K_TCXIEF/files/original/1/data&uri=info%3Afda%2FEPU5LDS3K_TCXIEF%2Ffile%2F1&originalName=sip-files%2F05-10-2015.pdf HTTP/1.0" 200 213993 0.7045
    2015 Jun 9 09:49:13 fclnx30 Describe[5300]: INFO describe.fda.fcla.edu: Rack: 192.168.36.60 - - [09/Jun/2015 09:49:13] "GET /describe?location=file:/var/daitss/data/work/EPU5LDS3K_TCXIEF/files/original/1/data&uri=info%3Afda%2FEPU5LDS3K_TCXIEF%2Ffile%2F1&originalName=sip-files%2F05-10-2015.pdf HTTP/1.0" 200 213993 0.7857
    2015 Jun 9 09:51:38 fclnx30 Describe[5288]: INFO describe.fda.fcla.edu: transforming JHOVE output to DocMD
    2015 Jun 9 09:51:39 fclnx30 Describe[5288]: INFO describe.fda.fcla.edu: Rack: 192.168.36.60 - - [09/Jun/2015 09:51:39] "GET /describe?location=file:/var/daitss/data/work/EPU5LDS3K_TCXIEF/files/original/1/data&uri=info%3Afda%2FEPU5LDS3K_TCXIEF%2Ffile%2F1&originalName=sip-files%2F05-10-2015.pdf HTTP/1.0" 200 213993 0.7347
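To replay one of these describe requests by hand, the query string can be rebuilt from the three parameters visible in the log (`location`, `uri`, `originalName`). The sketch below only builds the URL and sends nothing. `describe_url` is a hypothetical helper, and the host passed in is a placeholder to be replaced with the actual describe endpoint.

```ruby
require "uri"

# Builds a describe request URL from the three query parameters seen in
# the describe log. The host is supplied by the caller; nothing here is
# a confirmed endpoint.
def describe_url(host:, location:, uri:, original_name:)
  query = URI.encode_www_form(
    "location" => location,
    "uri" => uri,
    "originalName" => original_name
  )
  URI::HTTP.build(host: host, path: "/describe", query: query).to_s
end
```

Note that `URI.encode_www_form` also percent-encodes the `location` value, which the logged requests leave unencoded. Whether the service accepts both forms has not been checked here.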

jonpitts commented 9 years ago

That all looks fine. I'd lean toward the issue being in Transform.

Can you check the contents of file:/var/daitss/data/work/EPU5LDS3K_TCXIEF/files/original/1/data? I think it should be in xml format. If so, look at line 183.

szanati commented 9 years ago

I see these two items in /var/daitss/data/work/EPU5LDS3K_TCXIEF/files/original/1/:

    data
    metadata/

When I try to cat data, it's a bunch of items I cannot read. I believe data is the PDF. In metadata there is:

    aip-path
    describe-agent
    describe-bitstream-objects
    describe-event
    describe-file-object
    sip-path
    virus-check-agent
    virus-check-event
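One quick way to confirm that the `data` file is the PDF rather than XML is to look at its first bytes, since a PDF file normally starts with `%PDF-`. A minimal sketch, with hypothetical helper names:

```ruby
# A PDF file normally begins with the five bytes "%PDF-".
PDF_MAGIC = "%PDF-".b

# True when the given bytes start with the PDF signature.
# nil (for example, from reading an empty file) counts as not a PDF.
def pdf_header?(bytes)
  bytes.to_s.b.start_with?(PDF_MAGIC)
end

# Reads only the first five bytes of the file before checking them.
def pdf_file?(path)
  pdf_header?(File.binread(path, PDF_MAGIC.bytesize))
end
```

For the work directory above, that would be `pdf_file?("/var/daitss/data/work/EPU5LDS3K_TCXIEF/files/original/1/data")`.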

jonpitts commented 9 years ago

You can look at those but my fear is that the XML is not being created/saved. Looking back at the previous issues it could be something similar to a fix Carol put in Describe:

250

This may require some debugging to get the resulting metadata from Transform and put in a fix. Carol might have a better feel for the issue. It seems like there should be better logging for Transform.

szanati commented 9 years ago

Thanks for all your help.

cchou commented 9 years ago

Yes, it's most likely the metadata. You can send the XML returned from the describe service to an XML validator to find the offending metadata. Unfortunately I don't have server access any more, so it would be difficult for me to troubleshoot.
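If no validator is at hand, a rough first pass is to scan the XML for ampersands that do not begin an entity or character reference, since that is the kind of input that "EntityRef: expecting ';'" typically points to. The sketch below is a regex scan, not an XML parser: it will not catch other well-formedness problems, and it also flags ampersands inside CDATA sections or comments. `bare_ampersand_lines` is a hypothetical helper.

```ruby
# Matches "&" that is not the start of a named entity (&amp;), a decimal
# character reference (&#38;), or a hex character reference (&#x26;).
BARE_AMPERSAND = /&(?!(?:[A-Za-z_][\w.-]*|#\d+|#x[0-9A-Fa-f]+);)/

# Returns the 1-based numbers of the lines containing a bare ampersand.
def bare_ampersand_lines(xml)
  xml.each_line.with_index(1)
     .select { |line, _number| line.match?(BARE_AMPERSAND) }
     .map { |_line, number| number }
end
```

The line numbers returned can then be compared with the number in the error message (183 and 67 in the two reports in this thread), on the assumption that the number after the colon is a line number.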

szanati commented 9 years ago

Where is the XML that is returned from the describe service located?

cchou commented 9 years ago

Core breaks up the returned XML into describe-agent, describe-bitstream-objects, describe-event, and describe-file-object.

The easiest way is to send the PDF to description.fcla.edu to get the XML. However, please note that this bug will most likely need a code fix.

szanati commented 9 years ago

Thanks for all your help Carol.

szanati commented 8 years ago

I received another one of these errors:

error while processing 1(sip-files/03-13-2016.pdf): Fatal error: EntityRef: expecting ';' at :67.

trace

    /opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/preserve.rb:26:in `rescue in block in preserve'
    /opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/preserve.rb:19:in `block in preserve'
    /opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/preserve.rb:18:in `each'
    /opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/preserve.rb:18:in `preserve'
    /opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/ingest.rb:33:in `ingest'
    /opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/process.rb:82:in `block in spawn'
    /opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/process.rb:66:in `fork'
    /opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/process.rb:66:in `spawn'
    /opt/web-services/sites/core/current/bin/pulse:161:in `block in start_wips'
    /opt/web-services/sites/core/current/bin/pulse:158:in `each'
    /opt/web-services/sites/core/current/bin/pulse:158:in `start_wips'
    /opt/web-services/sites/core/current/bin/pulse:194:in `block in <main>'
    /opt/web-services/sites/core/current/bin/pulse:192:in `loop'
    /opt/web-services/sites/core/current/bin/pulse:192:in `<main>'

szanati commented 8 years ago

I sent the pdf to the description services and got:

    Not well-formed
    success
    Invalid character in hex string

szanati commented 7 years ago

The IEIDs are as follows: EN49CK9DL_YMA17R, EPU5LDS3K_TCXIEF, EMNT2DRN2_UEZLFW, EXGHFQQND_VIPEE0, and EZYQ0HHD9_97E59T for package UF00028308_02603. EBC275L56_4UR7OW for package UF00098964_03959.

lydiam commented 7 years ago

Do we know what the exact problem is in this issue? Is it a description service bug?

cchou commented 7 years ago

Stephen, can you put the offending PDF on the daitss-test?

szanati commented 7 years ago

Carol, there are 3 different PDFs for 3 packages that have a similar issue to the original one. The PDFs are as follows: 05-10-2015.pdf is the PDF for the original issue. The other 2 are 03-13-2016.pdf and 04-30-2017.pdf. I have copied them all to the daitss-test site.

cchou commented 7 years ago

The description service has no problem processing these PDFs. Actionplan is OK too. I tried the package UF00028308_02603, and it archived with pdfapilot turned off. Once I turned pdfapilot on, core failed with the same error. It looks like the problem is that core cannot process the XML returned from the transform service with the messages generated by pdfapilot, particularly this one, which contains a special character: Fix Repair invalid ToUnicode CMap information in fonts MfYoung&Beautiful

It appears that this PDF has an error that Callas cannot fix in order to convert it to PDF/A. I will report those PDFs to Callas. However, we may want to have UF fix those files, since they include a lot of unembedded special fonts.

Pages 19
PDFA Regular
Progress 6 %
Fix Set values to implementation limits of PDF/A if possible
Progress 15 % Progress 16 %
Fix Force blend color space to sRGB
Fix Prepare annotations for PDF/A-1
Progress 17 % Progress 18 % Progress 19 % Progress 20 % Progress 21 % Progress 22 % Progress 23 % Progress 24 % Progress 25 % Progress 26 % Progress 27 % Progress 28 % Progress 29 % Progress 30 %
Fix Substitute characters using .notdef glyph with space characters
Progress 31 % Progress 32 % Progress 33 % Progress 34 % Progress 35 % Progress 36 % Progress 37 %
Fix Convert SMask to image mask
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Bold
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Black
Fix Repair invalid ToUnicode CMap information in fonts Utopia-Bold
Fix Repair invalid ToUnicode CMap information in fonts CenturyOldStyle-Regular
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Regular
Fix Repair invalid ToUnicode CMap information in fonts Utopia-Semibold
Fix Repair invalid ToUnicode CMap information in fonts CaslonTwoTwentyFour-Black
Fix Repair invalid ToUnicode CMap information in fonts FranklinGothic-BookOblique
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-BoldCond
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-BoldCond2
Fix Repair invalid ToUnicode CMap information in fonts HelveticaNeueLTStd-XBlkCn
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Cond
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Cond2
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts Helvetica
Fix Repair invalid ToUnicode CMap information in fonts Helvetica
Fix Add missing SPACE glyphs FranklinGothic-BookOblique2
Fix Repair invalid ToUnicode CMap information in fonts CenturyOldStyle-Italic
Fix Repair invalid ToUnicode CMap information in fonts Utopia-Regular
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts Helvetica2
Fix Repair invalid ToUnicode CMap information in fonts Helvetica2
Fix Add missing SPACE glyphs MyriadPro-Black2
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts ArialNarrow
Fix Repair invalid ToUnicode CMap information in fonts ArialNarrow
Fix Repair invalid ToUnicode CMap information in fonts GillSans-Bold
Fix Repair invalid ToUnicode CMap information in fonts CenturyGothic
Fix Repair invalid ToUnicode CMap information in fonts AvenirLTStd-Medium
Fix Repair invalid ToUnicode CMap information in fonts Helvetica3
Fix Add missing SPACE glyphs AvenirLTStd-Light
Fix Repair invalid ToUnicode CMap information in fonts AvenirLTStd-Black
Fix Repair invalid ToUnicode CMap information in fonts AvenirLTStd-Light2
Fix Repair invalid ToUnicode CMap information in fonts AvenirLTStd-Heavy
Fix Repair invalid ToUnicode CMap information in fonts BradleyHandITCTTBold
Fix Repair invalid ToUnicode CMap information in fonts CenturyGothic2
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts ArialNarrow-Italic
Fix Repair invalid ToUnicode CMap information in fonts ArialNarrow-Italic
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts ArialNarrow-BoldItalic
Fix Repair invalid ToUnicode CMap information in fonts ArialNarrow-BoldItalic
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts ArialNarrow2
Fix Repair invalid ToUnicode CMap information in fonts ArialNarrow2
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts ArialNarrow-Bold
Fix Repair invalid ToUnicode CMap information in fonts ArialNarrow-Bold
Fix Repair invalid ToUnicode CMap information in fonts ZapfDingbatsITC
Fix Repair invalid ToUnicode CMap information in fonts ZapfDingbatsITC2
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Semibold
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts TimesNewRomanPS-BoldMT
Fix Repair invalid ToUnicode CMap information in fonts TimesNewRomanPS-BoldMT
Fix Repair invalid ToUnicode CMap information in fonts MinionPro-Medium
Fix Add missing SPACE glyphs MinionPro-Bold
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts ArialMT
Fix Repair invalid ToUnicode CMap information in fonts ArialMT
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts TimesNewRomanPSMT
Fix Repair invalid ToUnicode CMap information in fonts TimesNewRomanPSMT
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts Times-Bold
Fix Repair invalid ToUnicode CMap information in fonts Times-Bold
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts Times-Roman
Fix Repair invalid ToUnicode CMap information in fonts Times-Roman
Fix Repair invalid ToUnicode CMap information in fonts Times-Roman2
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts HelveticaNeue-Italic
Fix Repair invalid ToUnicode CMap information in fonts HelveticaNeue-Italic
Fix Repair invalid ToUnicode CMap information in fonts KeplerStd-Bold
Fix Repair invalid ToUnicode CMap information in fonts KeplerStd-Bold2
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Semibold2
Fix Repair invalid ToUnicode CMap information in fonts FranklinGothic-Demi2
Fix Add missing SPACE glyphs GillSans-LightItalic
Fix Repair invalid ToUnicode CMap information in fonts KeplerStd-Regular
Fix Repair invalid ToUnicode CMap information in fonts Impact
Fix Add missing SPACE glyphs GillSans-Light
Fix Repair invalid ToUnicode CMap information in fonts AGaramondPro-Semibold
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts ArialNarrow3
Fix Repair invalid ToUnicode CMap information in fonts ArialNarrow3
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts ArialNarrow-Bold2
Fix Repair invalid ToUnicode CMap information in fonts ArialNarrow-Bold2
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts TimesNewRomanPSMT2
Fix Repair invalid ToUnicode CMap information in fonts TimesNewRomanPSMT2
Fix Add missing SPACE glyphs MyriadPro-Bold3
Fix Add missing SPACE glyphs Helvetica-Bold2
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts Helvetica4
Fix Repair invalid ToUnicode CMap information in fonts Helvetica4
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts TimesNewRomanPSMT3
Fix Repair invalid ToUnicode CMap information in fonts TimesNewRomanPSMT3
Fix Repair invalid ToUnicode CMap information in fonts TimesNewRomanPSMT4
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts ArialNarrow-Bold3
Fix Repair invalid ToUnicode CMap information in fonts ArialNarrow-Bold3
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts ArialNarrow4
Fix Repair invalid ToUnicode CMap information in fonts ArialNarrow4
Fix Add missing SPACE glyphs GillSans-LightItalic2
Fix Repair invalid ToUnicode CMap information in fonts GillSans-LightItalic2
Fix Repair invalid ToUnicode CMap information in fonts GillSans-Light2
Fix Repair invalid ToUnicode CMap information in fonts Utopia-Bold3
Fix Add missing SPACE glyphs Helvetica-Bold3
Fix Repair invalid ToUnicode CMap information in fonts Helvetica-Bold3
Fix Repair invalid ToUnicode CMap information in fonts CenturyOldStyle-Regular3
Fix Repair invalid ToUnicode CMap information in fonts GillSans-Bold3
Fix Repair invalid ToUnicode CMap information in fonts ZapfDingbatsITC3
Fix Repair invalid ToUnicode CMap information in fonts GillSans3
Fix Add missing SPACE glyphs AntiqueOlive-Bold2
Fix Repair invalid ToUnicode CMap information in fonts AntiqueOlive-Bold2
Fix Repair invalid ToUnicode CMap information in fonts ArialNarrow-Bold4
Fix Repair invalid ToUnicode CMap information in fonts ArialNarrow5
Fix Repair invalid ToUnicode CMap information in fonts Helvetica5
Fix Repair invalid ToUnicode CMap information in fonts MfYoung&Beautiful
Fix Repair invalid ToUnicode CMap information in fonts Centaur
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Bold4
Fix Repair invalid ToUnicode CMap information in fonts Interstate-Black
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Regular5
Fix Repair invalid ToUnicode CMap information in fonts Interstate-Light
Fix Repair invalid ToUnicode CMap information in fonts TimesNewRomanPS2
Fix Add missing SPACE glyphs MyriadPro-Regular6
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Cond4
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Semibold4
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Cond5
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-It4
Fix Add missing SPACE glyphs Syntax-UltraBlack
Fix Repair invalid ToUnicode CMap information in fonts Syntax-UltraBlack
Fix Repair invalid ToUnicode CMap information in fonts Gotham-Black
Fix Repair invalid ToUnicode CMap information in fonts CaslonFiveForty-Italic
Fix Repair invalid ToUnicode CMap information in fonts Gotham-Bold
Fix Repair invalid ToUnicode CMap information in fonts CaslonTwoTwentyFour-Book
Fix Repair invalid ToUnicode CMap information in fonts LietzLindauHamburg
Fix Repair invalid ToUnicode CMap information in fonts Syntax-Roman
Fix Add missing SPACE glyphs Gotham-Ultra
Fix Repair invalid ToUnicode CMap information in fonts Gotham-Ultra
Fix Repair invalid ToUnicode CMap information in fonts PTF-NORDIC-Round
Fix Repair invalid ToUnicode CMap information in fonts Impact2
Fix Repair invalid ToUnicode CMap information in fonts Gotham-Book
Fix Repair invalid ToUnicode CMap information in fonts Times-Bold2
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Bold6
Fix Add missing SPACE glyphs AntiqueOlive-Bold3
Fix Repair invalid ToUnicode CMap information in fonts Helvetica-Bold4
Fix Repair invalid ToUnicode CMap information in fonts Times-Roman4
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Bold7
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-BoldCond6
Fix Repair invalid ToUnicode CMap information in fonts Myriad-Bold
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Semibold5
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Cond6
Fix Add missing SPACE glyphs Wingdings
Fix Repair invalid ToUnicode CMap information in fonts CaslonThree-Roman2
Fix Repair invalid ToUnicode CMap information in fonts TimesNewRomanPS-BoldMT2
Fix Repair invalid ToUnicode CMap information in fonts HelveticaNeue-CondensedBold
Fix Repair invalid ToUnicode CMap information in fonts TimesNewRomanPSMT5
Fix Repair invalid ToUnicode CMap information in fonts Helvetica-Bold5
Fix Repair invalid ToUnicode CMap information in fonts HelveticaNeue-CondensedBlack
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Regular7
Fix Repair invalid ToUnicode CMap information in fonts TimesNewRomanPS-ItalicMT
Fix Add missing SPACE glyphs GillSans-LightItalic3
Fix Repair invalid ToUnicode CMap information in fonts GillSans-LightItalic3
Fix Repair invalid ToUnicode CMap information in fonts GillSans-Light3
Fix Repair invalid ToUnicode CMap information in fonts TimesNewRomanPS-BoldItalicMT
Fix Add missing SPACE glyphs Helvetica-Bold7
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts Helvetica8
Fix Repair invalid ToUnicode CMap information in fonts Helvetica8
Fix Repair invalid ToUnicode CMap information in fonts ZapfDingbatsITC4
Fix Repair invalid ToUnicode CMap information in fonts GillSans4
Fix Add missing SPACE glyphs Helvetica-Bold8
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts Helvetica-Bold8
Fix Repair invalid ToUnicode CMap information in fonts Helvetica-Bold8
Fix Add missing SPACE glyphs TMSsymbols
Fix Fix font encoding (CIDToGIDMap)
Fix Fix font encoding (CharSet)
Fix Fix font encoding (CIDSet)
Fix Adjust colors for PDF based ISO standards
Fix Insert missing Type entry in StructElem objects
Fix Flatten transparency (high resolution)
Fix Make document XMP Metadata compliant with PDF/A-1
Fix Remove all xmpMM:Manifest metadata entries
Fix Remove XMP Metadata if not compliant with PDF/A
Fix Repair invalid bookmark hierarchies
Progress 48 % Progress 49 % Progress 50 %
Fix Compress all uncompressed objects using lossless ZIP compression
Fix Optimize the PDF for fast web view
Fix Remove document structure compression
Progress 53 % Progress 54 % Progress 55 % Progress 56 % Progress 57 % Progress 58 % Progress 59 % Progress 60 % Progress 61 % Progress 62 % Progress 64 % Progress 65 % Progress 66 % Progress 67 % Progress 68 % Progress 69 % Progress 70 % Progress 71 % Progress 72 % Progress 73 % Progress 74 % Progress 75 % Progress 77 % Progress 78 % Progress 79 % Progress 80 % Progress 81 % Progress 82 % Progress 83 % Progress 84 % Progress 85 % Progress 86 % Progress 87 % Progress 88 % Progress 90 % Progress 91 % Progress 92 % Progress 93 %
Hit PDFA Syntax problem: Real value out of range (too low)
FixFailure Convert to PDF/A-1b
Progress 100 %
Errors 1 Syntax problem: Real value out of range (too low)
Summary Corrections 832
Summary Errors 1
Summary Warnings 0
Summary Infos 0

If you want to get these packages archived, turn off PDF_NORM and they should archive. They just will not contain a PDF/A file generated by FDA.
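For the code fix itself, the font name MfYoung&Beautiful suggests that free text from the pdfapilot report needs to be escaped before it is placed in XML. A minimal sketch of that escaping in Ruby is below. It uses the `xml: :text` option of `String#encode`, which replaces `&`, `<`, and `>` with entity references. The element name `fixup` and both helper names are hypothetical, since the actual transform output format is not shown in this thread.

```ruby
# Escapes &, < and > so that free text can be placed inside an XML element.
def xml_text(value)
  value.to_s.encode(xml: :text)
end

# Hypothetical element name, used only for illustration.
def fixup_element(description)
  "<fixup>#{xml_text(description)}</fixup>"
end
```

Escaping at the point where the report text is wrapped in XML keeps the rest of the pipeline unchanged; the alternative would be for core to tolerate malformed XML, which seems harder to do safely.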

lydiam commented 7 years ago

Thanks, Carol. Can you recommend other methods for checking problem PDFs that would give us details to help users correct them? Also, do you have any suggestions for ways that users might correct their PDFs?

cchou commented 7 years ago

I redistilled the PDF with pdfapilot and it does correct the error, though the resulting file size is huge. You could use Preflight in Acrobat to see what kind of errors it finds and manually correct them accordingly. 3-Heights has an online tool to fix PDFs, though it requires purchase. I think the pdfaPilot that FDA has also includes a desktop version, though as I remember it's aimed more at PDF/A than at PDF overall.

cchou commented 7 years ago

Also, these PDFs are not well-formed, meaning they are not structurally sound. Here are some anomalies found by JHOVE in these PDFs:

    <anomaly>Expected dictionary for font entry in page resource</anomaly>

    <anomaly>Annotation object is not a dictionary</anomaly>

    <anomaly>Outlines contain recursive references.</anomaly>

szanati commented 7 years ago

The original packages for this issue: UF00028308_03091, UF00098964_03959, and UF00028308_02603 are in: /var/daitss/ops/exceptions/tickets/GitHub_769 on darchive.