Open: szanati opened this issue 9 years ago.
This looks like a problem with the XML. Without being able to see it, my guess is that there is a character that needs to be escaped. Check line 183 of the XML file.
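If the XML is saved to a file, a few lines of Python make it easy to look at the reported line in context. This is a minimal sketch: `show_line` is a hypothetical helper name, and it assumes the file can be read as UTF-8 text.

```python
def show_line(text: str, lineno: int, context: int = 2) -> str:
    """Return line `lineno` (1-based) of `text`, plus `context` lines on either side.

    The target line is prefixed with '>>' so it stands out.
    """
    lines = text.splitlines()
    if not 1 <= lineno <= len(lines):
        raise ValueError(f"line {lineno} is outside 1..{len(lines)}")
    start = max(1, lineno - context)
    end = min(len(lines), lineno + context)
    rows = []
    for n in range(start, end + 1):
        marker = ">>" if n == lineno else "  "
        rows.append(f"{marker} {n:>6}: {lines[n - 1]}")
    return "\n".join(rows)
```

For example, `print(show_line(Path("metadata.xml").read_text(encoding="utf-8"), 183))` (with `from pathlib import Path`), where `metadata.xml` stands in for whichever file the error refers to.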
Thanks Jon. I'll take a look at line 183.
This is what is on line 183:
I took this XML and used it with a different PDF from the same source, but from a different package that had already archived. I just updated the lines in the XML that give the PDF's name and ran a checksum on the other PDF. Packaged this way, it archived on the test machine, which makes me think the problem may be the PDF rather than the XML.
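Recomputing the checksum of a substitute PDF, as described above, can be scripted. A minimal sketch with Python's `hashlib`: the MD5 default is an assumption, so pass whatever algorithm the package's XML declares (for example `"sha1"`).

```python
import hashlib


def file_checksum(path: str, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hex digest of a file, read in chunks so a large PDF never has to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

The digest can then be pasted into the checksum entry for the PDF in the package XML.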
Yes, the error is in processing the PDF. "error while processing 1(sip-files/05-10-2015.pdf): Fatal error: EntityRef: expecting ';' at :183."
Can you send this PDF to the description service and see whether you get an error? I suspect there is some bad character embedded in the PDF metadata.
I did run it. Here is what I got:
From the code, https://github.com/daitss/core/blob/master/lib/daitss/proc/wip/preserve.rb#L26, it looks like the error happens while processing the file, so it comes from the description service, the action plan service, or the transformation service. Is there anything in those services' logs relating to this file?
Basically what core does is:
Those services should write error logs if they encounter any problem. Alternatively, you can call each service individually.
Thanks, Carol. I'll see if I find anything in the logs.
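To call the description service by hand, a request can be shaped like the ones recorded in the Describe log later in this thread (`/describe?location=...&uri=...&originalName=...`). The sketch below is an assumption built from those log lines, not from service documentation. The base URL is a placeholder, and because `location` is a `file:` URL, the file has to be readable from the host running the service.

```python
from urllib.parse import quote
from urllib.request import urlopen


def describe_url(base: str, location: str, uri: str, original_name: str) -> str:
    """Build a Describe request URL shaped like the requests in the Describe log."""
    query = "&".join([
        # In the log, `location` keeps its ':' and '/' characters unencoded...
        "location=" + quote(location, safe=":/"),
        # ...while `uri` and `originalName` are fully percent-encoded.
        "uri=" + quote(uri, safe=""),
        "originalName=" + quote(original_name, safe=""),
    ])
    return f"{base.rstrip('/')}/describe?{query}"


def fetch_description(url: str, timeout: float = 60.0) -> bytes:
    """Hypothetical helper: GET the URL and return the raw XML body (needs network access)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

`fetch_description` is only a thin wrapper around `urllib.request.urlopen`; the body it returns is the XML to inspect.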
I did find this in the transform log for the package in question:
2015 Jun 9 09:49:13 fclnx30 Transform[9218]: INFO transform.fda.fcla.edu: location = file:/var/daitss/data/work/EPU5LDS3K_TCXIEF/files/original/1/data
2015 Jun 9 09:49:50 fclnx30 Transform[9218]: INFO transform.fda.fcla.edu: error code pid 12274 exit 1
2015 Jun 9 09:49:50 fclnx30 Transform[9218]: INFO transform.fda.fcla.edu: Rack: 192.168.36.60 - - [09/Jun/2015 09:49:50] "GET /transform/pdf_norm?location=file:/var/daitss/data/work/EPU5LDS3K_TCXIEF/files/original/1/data " 200 18288 36.3724
2015 Jun 9 09:51:39 fclnx30 Transform[9218]: INFO transform.fda.fcla.edu: Rack: 192.168.36.60 - - [09/Jun/2015 09:51:39] "GET /transform/pdf_norm " 400 38 0.0009
2015 Jun 9 09:51:39 fclnx30 Transform[9218]: INFO transform.fda.fcla.edu: location = file:/var/daitss/data/work/EPU5LDS3K_TCXIEF/files/original/1/data
2015 Jun 9 09:52:16 fclnx30 Transform[9218]: INFO transform.fda.fcla.edu: error code pid 13533 exit 1
2015 Jun 9 09:52:16 fclnx30 Transform[9218]: INFO transform.fda.fcla.edu: Rack: 192.168.36.60 - - [09/Jun/2015 09:52:16] "GET /transform/pdf_norm?location=file:/var/daitss/data/work/EPU5LDS3K_TCXIEF/files/original/1/data " 200 18289 37.0649
I reset the package and tried again, which is why several errors appear within a short time.
Can you check the description service too? These services generate metadata as XML, and the error is XML-related. There is probably an unescaped ampersand in the metadata returned by one of the services.
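In the Transform log, each `pdf_norm` request that follows an `error code pid ... exit 1` line still got HTTP 200 with a body of about 18 KB, which suggests the tool's failure travels in the response body rather than in the status code. A small parser like this one can pull those pairs out of a longer log. It is a sketch: the regexes are fitted to the lines quoted in this thread, not to any documented log format.

```python
import re

# Rack access-log tail as it appears in the Transform and Describe logs, e.g.
#   "GET /transform/pdf_norm?location=... " 200 18288 36.3724
RACK_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>[^"]*?)(?: HTTP/[\d.]+)?\s*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) (?P<seconds>[\d.]+)'
)
# Transform's note that the external tool exited with a non-zero code
EXIT_RE = re.compile(r"error code pid (?P<pid>\d+) exit (?P<code>\d+)")


def parse_log_line(line: str):
    """Return a dict for a Rack request line or a tool-exit line, else None."""
    m = RACK_RE.search(line)
    if m:
        return {
            "kind": "request",
            "method": m["method"],
            "path": m["path"],
            "status": int(m["status"]),
            "bytes": None if m["bytes"] == "-" else int(m["bytes"]),
            "seconds": float(m["seconds"]),
        }
    m = EXIT_RE.search(line)
    if m:
        return {"kind": "exit", "pid": int(m["pid"]), "code": int(m["code"])}
    return None
```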
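One quick way to test the unescaped-ampersand theory is to scan the metadata text for `&` characters that do not begin a valid entity or character reference. This is a regex-based sketch with a hypothetical helper name; it does not skip CDATA sections or comments, where a bare `&` is legal, so treat its hits as candidates rather than proof.

```python
import re

# An '&' NOT followed by a complete reference: &name;  &#123;  &#x1F;
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def find_bare_ampersands(text: str):
    """Return (line, column) pairs, both 1-based, for every '&' that is not escaped."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in BARE_AMP.finditer(line):
            hits.append((lineno, m.start() + 1))
    return hits
```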
This is what I found in the describe log for the package, if that is what you mean by the description service:
2015 Jun 8 22:03:43 fclnx30 Describe[5300]: INFO describe.fda.fcla.edu: Rack: 192.168.36.60 - - [08/Jun/2015 22:03:43] "GET /describe?location=file:/var/daitss/data/work/EPU5LDS3K_TCXIEF/files/original/2/data&uri=info%3Afda%2FEPU5LDS3K_TCXIEF%2Ffile%2F2&originalName=sip-files%2FUF00028308_02603.mets HTTP/1.0" 200 3515 0.0901
2015 Jun 8 22:03:43 fclnx30 Describe[5288]: INFO describe.fda.fcla.edu: transforming JHOVE output to DocMD
2015 Jun 8 22:03:43 fclnx30 Describe[5288]: INFO describe.fda.fcla.edu: Rack: 192.168.36.60 - - [08/Jun/2015 22:03:43] "GET /describe?location=file:/var/daitss/data/work/EPU5LDS3K_TCXIEF/files/original/1/data&uri=info%3Afda%2FEPU5LDS3K_TCXIEF%2Ffile%2F1&originalName=sip-files%2F05-10-2015.pdf HTTP/1.0" 200 213993 0.7045
2015 Jun 9 09:49:13 fclnx30 Describe[5300]: INFO describe.fda.fcla.edu: Rack: 192.168.36.60 - - [09/Jun/2015 09:49:13] "GET /describe?location=file:/var/daitss/data/work/EPU5LDS3K_TCXIEF/files/original/1/data&uri=info%3Afda%2FEPU5LDS3K_TCXIEF%2Ffile%2F1&originalName=sip-files%2F05-10-2015.pdf HTTP/1.0" 200 213993 0.7857
2015 Jun 9 09:51:38 fclnx30 Describe[5288]: INFO describe.fda.fcla.edu: transforming JHOVE output to DocMD
2015 Jun 9 09:51:39 fclnx30 Describe[5288]: INFO describe.fda.fcla.edu: Rack: 192.168.36.60 - - [09/Jun/2015 09:51:39] "GET /describe?location=file:/var/daitss/data/work/EPU5LDS3K_TCXIEF/files/original/1/data&uri=info%3Afda%2FEPU5LDS3K_TCXIEF%2Ffile%2F1&originalName=sip-files%2F05-10-2015.pdf HTTP/1.0" 200 213993 0.7347
That all looks fine. I'd lean toward the issue being in Transform.
Can you check the contents of file:/var/daitss/data/work/EPU5LDS3K_TCXIEF/files/original/1/data? I think it should be in xml format. If so, look at line 183.
I see two items in /var/daitss/data/work/EPU5LDS3K_TCXIEF/files/original/1/: data and metadata/. When I cat data, the output is unreadable; I believe data is the PDF. The metadata/ directory contains: aip-path, describe-agent, describe-bitstream-objects, describe-event, describe-file-object, sip-path, virus-check-agent, and virus-check-event.
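Rather than using `cat`, checking the first bytes confirms whether `data` is the PDF: a PDF file starts with a `%PDF-` header followed by its version number. A minimal sketch with a hypothetical helper name; the Unix `file data` command gives a similar answer.

```python
def pdf_header_version(path: str):
    """Return the version from a leading '%PDF-x.y' header (e.g. '1.4'), or None if absent."""
    with open(path, "rb") as f:
        head = f.read(16)
    if not head.startswith(b"%PDF-"):
        return None
    # The three bytes after the marker hold the version, e.g. b"1.4"
    return head[5:8].decode("ascii", errors="replace")
```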
You can look at those, but my fear is that the XML isn't being created or saved. Looking back at previous issues, this could be similar to a fix Carol put into Describe:
This may require some debugging to capture the metadata Transform returns and then put in a fix. Carol might have a better feel for the issue. It seems like Transform should have better logging.
Thanks for all your help.
Yes, it's most likely the metadata. You can run the XML returned by the description service through an XML validator to find the offending metadata. Unfortunately, I don't have server access anymore, so it would be difficult for me to troubleshoot.
Where is the XML returned by the description service located?
Core breaks the returned XML up into describe-agent, describe-bitstream-objects, describe-event, and describe-file-object.
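As a stand-in for an XML validator, a well-formedness check reports where parsing first fails. This sketch uses Python's built-in `xml.etree.ElementTree` (expat), so its message wording differs from the libxml2-style `EntityRef: expecting ';'` in the DAITSS error, although the line number should usually point at the same spot. Running `xmllint --noout file.xml` from libxml2 is another option.

```python
import xml.etree.ElementTree as ET


def first_xml_error(xml_text: str):
    """Return None if xml_text is well-formed, else (line, column, message).

    The line is 1-based; the column is the offset expat reports.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None
```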
The easiest way is to send the PDF to description.fcla.edu to get the XML. However, please note that this bug will most likely need a code fix.
Thanks for all your help Carol.
I received another one of these errors:
error while processing 1(sip-files/03-13-2016.pdf): Fatal error: EntityRef: expecting ';' at :67.
trace
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/preserve.rb:26:in `rescue in block in preserve'
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/preserve.rb:19:in `block in preserve'
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/preserve.rb:18:in `each'
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/preserve.rb:18:in `preserve'
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/ingest.rb:33:in `ingest'
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/process.rb:82:in `block in spawn'
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/process.rb:66:in `fork'
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/process.rb:66:in `spawn'
/opt/web-services/sites/core/current/bin/pulse:161:in `block in start_wips'
/opt/web-services/sites/core/current/bin/pulse:158:in `each'
/opt/web-services/sites/core/current/bin/pulse:158:in `start_wips'
/opt/web-services/sites/core/current/bin/pulse:194:in `block in <main>'
/opt/web-services/sites/core/current/bin/pulse:192:in `loop'
/opt/web-services/sites/core/current/bin/pulse:192:in `<main>'
I sent the PDF to the description service and got:
The IEIDs are as follows: EN49CK9DL_YMA17R, EPU5LDS3K_TCXIEF, EMNT2DRN2_UEZLFW, EXGHFQQND_VIPEE0, and EZYQ0HHD9_97E59T for package UF00028308_02603, and EBC275L56_4UR7OW for package UF00098964_03959.
Do we know what the exact problem is in this issue? Is it a description service bug?
Stephen, can you put the offending PDF on the daitss-test?
Carol, there are 3 different PDFs from 3 packages that have an issue similar to the original one: 05-10-2015.pdf is the PDF from the original issue, and the other two are 03-13-2016.pdf and 04-30-2017.pdf. I have copied all of them to the daitss-test site.
The description service has no problem processing these PDFs, and the action plan service is OK too. I tried package UF00028308_02603: it archived with pdfaPilot turned off, but once I turned pdfaPilot on, core failed with the same error. The problem appears to be that core cannot process the XML returned by the transform service when it contains errors generated by pdfaPilot, in particular this one, which contains a special character (the ampersand): Fix Repair invalid ToUnicode CMap information in fonts MfYoung&Beautiful
It appears this PDF has an error that Callas cannot fix when converting it to PDF/A. I will report these PDFs to Callas. However, we may want UF to fix these files, since they include a lot of unembedded special fonts.
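The general fix for this failure mode is to build the XML with a library that escapes text (or to escape it explicitly) instead of pasting raw tool output between tags. Core and its services are Ruby and the thread does not show where the transform service assembles its response, so this Python sketch only illustrates the idea; the element names `report` and `line` are hypothetical.

```python
import xml.etree.ElementTree as ET


def report_to_xml(tool_output: str) -> str:
    """Wrap each line of a tool's text report in XML; ElementTree escapes '&', '<', '>'."""
    root = ET.Element("report")
    for line in tool_output.splitlines():
        ET.SubElement(root, "line").text = line
    return ET.tostring(root, encoding="unicode")


def naive_report_to_xml(tool_output: str) -> str:
    """The failure mode: interpolating raw text leaves a bare '&' in the document."""
    body = "".join(f"<line>{line}</line>" for line in tool_output.splitlines())
    return f"<report>{body}</report>"
```

Escaping does not cover characters that XML 1.0 forbids outright (most C0 control characters); those would still need to be stripped or replaced before serialization.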
Pages 19
PDFA Regular
Progress 6 %
Fix Set values to implementation limits of PDF/A if possible
Progress 15 %
Progress 16 %
Fix Force blend color space to sRGB
Fix Prepare annotations for PDF/A-1
Progress 17 %
Progress 18 %
Progress 19 %
Progress 20 %
Progress 21 %
Progress 22 %
Progress 23 %
Progress 24 %
Progress 25 %
Progress 26 %
Progress 27 %
Progress 28 %
Progress 29 %
Progress 30 %
Fix Substitute characters using .notdef glyph with space characters
Progress 31 %
Progress 32 %
Progress 33 %
Progress 34 %
Progress 35 %
Progress 36 %
Progress 37 %
Fix Convert SMask to image mask
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Bold
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Black
Fix Repair invalid ToUnicode CMap information in fonts Utopia-Bold
Fix Repair invalid ToUnicode CMap information in fonts CenturyOldStyle-Regular
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Regular
Fix Repair invalid ToUnicode CMap information in fonts Utopia-Semibold
Fix Repair invalid ToUnicode CMap information in fonts CaslonTwoTwentyFour-Black
Fix Repair invalid ToUnicode CMap information in fonts FranklinGothic-BookOblique
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-BoldCond
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-BoldCond2
Fix Repair invalid ToUnicode CMap information in fonts HelveticaNeueLTStd-XBlkCn
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Cond
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Cond2
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts Helvetica
Fix Repair invalid ToUnicode CMap information in fonts Helvetica
Fix Add missing SPACE glyphs FranklinGothic-BookOblique2
Fix Repair invalid ToUnicode CMap information in fonts CenturyOldStyle-Italic
Fix Repair invalid ToUnicode CMap information in fonts Utopia-Regular
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts Helvetica2
Fix Repair invalid ToUnicode CMap information in fonts Helvetica2
Fix Add missing SPACE glyphs MyriadPro-Black2
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts ArialNarrow
Fix Repair invalid ToUnicode CMap information in fonts ArialNarrow
Fix Repair invalid ToUnicode CMap information in fonts GillSans-Bold
Fix Repair invalid ToUnicode CMap information in fonts CenturyGothic
Fix Repair invalid ToUnicode CMap information in fonts AvenirLTStd-Medium
Fix Repair invalid ToUnicode CMap information in fonts Helvetica3
Fix Add missing SPACE glyphs AvenirLTStd-Light
Fix Repair invalid ToUnicode CMap information in fonts AvenirLTStd-Black
Fix Repair invalid ToUnicode CMap information in fonts AvenirLTStd-Light2
Fix Repair invalid ToUnicode CMap information in fonts AvenirLTStd-Heavy
Fix Repair invalid ToUnicode CMap information in fonts BradleyHandITCTTBold
Fix Repair invalid ToUnicode CMap information in fonts CenturyGothic2
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts ArialNarrow-Italic
Fix Repair invalid ToUnicode CMap information in fonts ArialNarrow-Italic
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts ArialNarrow-BoldItalic
Fix Repair invalid ToUnicode CMap information in fonts ArialNarrow-BoldItalic
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts ArialNarrow2
Fix Repair invalid ToUnicode CMap information in fonts ArialNarrow2
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts ArialNarrow-Bold
Fix Repair invalid ToUnicode CMap information in fonts ArialNarrow-Bold
Fix Repair invalid ToUnicode CMap information in fonts ZapfDingbatsITC
Fix Repair invalid ToUnicode CMap information in fonts ZapfDingbatsITC2
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Semibold
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts TimesNewRomanPS-BoldMT
Fix Repair invalid ToUnicode CMap information in fonts TimesNewRomanPS-BoldMT
Fix Repair invalid ToUnicode CMap information in fonts MinionPro-Medium
Fix Add missing SPACE glyphs MinionPro-Bold
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts ArialMT
Fix Repair invalid ToUnicode CMap information in fonts ArialMT
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts TimesNewRomanPSMT
Fix Repair invalid ToUnicode CMap information in fonts TimesNewRomanPSMT
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts Times-Bold
Fix Repair invalid ToUnicode CMap information in fonts Times-Bold
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts Times-Roman
Fix Repair invalid ToUnicode CMap information in fonts Times-Roman
Fix Repair invalid ToUnicode CMap information in fonts Times-Roman2
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts HelveticaNeue-Italic
Fix Repair invalid ToUnicode CMap information in fonts HelveticaNeue-Italic
Fix Repair invalid ToUnicode CMap information in fonts KeplerStd-Bold
Fix Repair invalid ToUnicode CMap information in fonts KeplerStd-Bold2
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Semibold2
Fix Repair invalid ToUnicode CMap information in fonts FranklinGothic-Demi2
Fix Add missing SPACE glyphs GillSans-LightItalic
Fix Repair invalid ToUnicode CMap information in fonts KeplerStd-Regular
Fix Repair invalid ToUnicode CMap information in fonts Impact
Fix Add missing SPACE glyphs GillSans-Light
Fix Repair invalid ToUnicode CMap information in fonts AGaramondPro-Semibold
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts ArialNarrow3
Fix Repair invalid ToUnicode CMap information in fonts ArialNarrow3
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts ArialNarrow-Bold2
Fix Repair invalid ToUnicode CMap information in fonts ArialNarrow-Bold2
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts TimesNewRomanPSMT2
Fix Repair invalid ToUnicode CMap information in fonts TimesNewRomanPSMT2
Fix Add missing SPACE glyphs MyriadPro-Bold3
Fix Add missing SPACE glyphs Helvetica-Bold2
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts Helvetica4
Fix Repair invalid ToUnicode CMap information in fonts Helvetica4
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts TimesNewRomanPSMT3
Fix Repair invalid ToUnicode CMap information in fonts TimesNewRomanPSMT3
Fix Repair invalid ToUnicode CMap information in fonts TimesNewRomanPSMT4
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts ArialNarrow-Bold3
Fix Repair invalid ToUnicode CMap information in fonts ArialNarrow-Bold3
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts ArialNarrow4
Fix Repair invalid ToUnicode CMap information in fonts ArialNarrow4
Fix Add missing SPACE glyphs GillSans-LightItalic2
Fix Repair invalid ToUnicode CMap information in fonts GillSans-LightItalic2
Fix Repair invalid ToUnicode CMap information in fonts GillSans-Light2
Fix Repair invalid ToUnicode CMap information in fonts Utopia-Bold3
Fix Add missing SPACE glyphs Helvetica-Bold3
Fix Repair invalid ToUnicode CMap information in fonts Helvetica-Bold3
Fix Repair invalid ToUnicode CMap information in fonts CenturyOldStyle-Regular3
Fix Repair invalid ToUnicode CMap information in fonts GillSans-Bold3
Fix Repair invalid ToUnicode CMap information in fonts ZapfDingbatsITC3
Fix Repair invalid ToUnicode CMap information in fonts GillSans3
Fix Add missing SPACE glyphs AntiqueOlive-Bold2
Fix Repair invalid ToUnicode CMap information in fonts AntiqueOlive-Bold2
Fix Repair invalid ToUnicode CMap information in fonts ArialNarrow-Bold4
Fix Repair invalid ToUnicode CMap information in fonts ArialNarrow5
Fix Repair invalid ToUnicode CMap information in fonts Helvetica5
Fix Repair invalid ToUnicode CMap information in fonts MfYoung&Beautiful
Fix Repair invalid ToUnicode CMap information in fonts Centaur
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Bold4
Fix Repair invalid ToUnicode CMap information in fonts Interstate-Black
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Regular5
Fix Repair invalid ToUnicode CMap information in fonts Interstate-Light
Fix Repair invalid ToUnicode CMap information in fonts TimesNewRomanPS2
Fix Add missing SPACE glyphs MyriadPro-Regular6
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Cond4
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Semibold4
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Cond5
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-It4
Fix Add missing SPACE glyphs Syntax-UltraBlack
Fix Repair invalid ToUnicode CMap information in fonts Syntax-UltraBlack
Fix Repair invalid ToUnicode CMap information in fonts Gotham-Black
Fix Repair invalid ToUnicode CMap information in fonts CaslonFiveForty-Italic
Fix Repair invalid ToUnicode CMap information in fonts Gotham-Bold
Fix Repair invalid ToUnicode CMap information in fonts CaslonTwoTwentyFour-Book
Fix Repair invalid ToUnicode CMap information in fonts LietzLindauHamburg
Fix Repair invalid ToUnicode CMap information in fonts Syntax-Roman
Fix Add missing SPACE glyphs Gotham-Ultra
Fix Repair invalid ToUnicode CMap information in fonts Gotham-Ultra
Fix Repair invalid ToUnicode CMap information in fonts PTF-NORDIC-Round
Fix Repair invalid ToUnicode CMap information in fonts Impact2
Fix Repair invalid ToUnicode CMap information in fonts Gotham-Book
Fix Repair invalid ToUnicode CMap information in fonts Times-Bold2
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Bold6
Fix Add missing SPACE glyphs AntiqueOlive-Bold3
Fix Repair invalid ToUnicode CMap information in fonts Helvetica-Bold4
Fix Repair invalid ToUnicode CMap information in fonts Times-Roman4
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Bold7
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-BoldCond6
Fix Repair invalid ToUnicode CMap information in fonts Myriad-Bold
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Semibold5
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Cond6
Fix Add missing SPACE glyphs Wingdings
Fix Repair invalid ToUnicode CMap information in fonts CaslonThree-Roman2
Fix Repair invalid ToUnicode CMap information in fonts TimesNewRomanPS-BoldMT2
Fix Repair invalid ToUnicode CMap information in fonts HelveticaNeue-CondensedBold
Fix Repair invalid ToUnicode CMap information in fonts TimesNewRomanPSMT5
Fix Repair invalid ToUnicode CMap information in fonts Helvetica-Bold5
Fix Repair invalid ToUnicode CMap information in fonts HelveticaNeue-CondensedBlack
Fix Repair invalid ToUnicode CMap information in fonts MyriadPro-Regular7
Fix Repair invalid ToUnicode CMap information in fonts TimesNewRomanPS-ItalicMT
Fix Add missing SPACE glyphs GillSans-LightItalic3
Fix Repair invalid ToUnicode CMap information in fonts GillSans-LightItalic3
Fix Repair invalid ToUnicode CMap information in fonts GillSans-Light3
Fix Repair invalid ToUnicode CMap information in fonts TimesNewRomanPS-BoldItalicMT
Fix Add missing SPACE glyphs Helvetica-Bold7
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts Helvetica8
Fix Repair invalid ToUnicode CMap information in fonts Helvetica8
Fix Repair invalid ToUnicode CMap information in fonts ZapfDingbatsITC4
Fix Repair invalid ToUnicode CMap information in fonts GillSans4
Fix Add missing SPACE glyphs Helvetica-Bold8
Fix Remove additional encoding entries in cmap of symbolic TrueType fonts Helvetica-Bold8
Fix Repair invalid ToUnicode CMap information in fonts Helvetica-Bold8
Fix Add missing SPACE glyphs TMSsymbols
Fix Fix font encoding (CIDToGIDMap)
Fix Fix font encoding (CharSet)
Fix Fix font encoding (CIDSet)
Fix Adjust colors for PDF based ISO standards
Fix Insert missing Type entry in StructElem objects
Fix Flatten transparency (high resolution)
Fix Make document XMP Metadata compliant with PDF/A-1
Fix Remove all xmpMM:Manifest metadata entries
Fix Remove XMP Metadata if not compliant with PDF/A
Fix Repair invalid bookmark hierarchies
Progress 48 %
Progress 49 %
Progress 50 %
Fix Compress all uncompressed objects using lossless ZIP compression
Fix Optimize the PDF for fast web view
Fix Remove document structure compression
Progress 53 %
Progress 54 %
Progress 55 %
Progress 56 %
Progress 57 %
Progress 58 %
Progress 59 %
Progress 60 %
Progress 61 %
Progress 62 %
Progress 64 %
Progress 65 %
Progress 66 %
Progress 67 %
Progress 68 %
Progress 69 %
Progress 70 %
Progress 71 %
Progress 72 %
Progress 73 %
Progress 74 %
Progress 75 %
Progress 77 %
Progress 78 %
Progress 79 %
Progress 80 %
Progress 81 %
Progress 82 %
Progress 83 %
Progress 84 %
Progress 85 %
Progress 86 %
Progress 87 %
Progress 88 %
Progress 90 %
Progress 91 %
Progress 92 %
Progress 93 %
Hit PDFA Syntax problem: Real value out of range (too low)
FixFailure Convert to PDF/A-1b
Progress 100 %
Errors 1 Syntax problem: Real value out of range (too low)
Summary Corrections 832
Summary Errors 1
Summary Warnings 0
Summary Infos 0
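A report this long is easier to review in summary form. The sketch below counts the Fix lines, collects the Hit, FixFailure, and Summary entries, and flags any line containing XML-special characters, which is how the `MfYoung&Beautiful` line would stand out. The record types it recognizes are read off this one log, not taken from pdfaPilot documentation.

```python
def summarize_pdfapilot_report(text: str) -> dict:
    """Tally a pdfaPilot-style text report like the one above.

    Progress and other lines are skipped, except for the escaping check.
    """
    summary = {"fixes": 0, "hits": [], "failures": [], "summary": {}, "needs_escaping": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        kind, _, rest = line.partition(" ")
        if kind == "Fix":
            summary["fixes"] += 1
        elif kind == "Hit":
            summary["hits"].append(rest)
        elif kind == "FixFailure":
            summary["failures"].append(rest)
        elif kind == "Summary":
            key, _, value = rest.partition(" ")
            summary["summary"][key] = value
        # Any line with '&', '<' or '>' must be escaped before it is embedded in XML
        if any(ch in line for ch in "&<>"):
            summary["needs_escaping"].append(line)
    return summary
```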
If you want to get these packages archived, turn off PDF_NORM and they should archive. They just won't contain a PDF/A file generated by FDA.
Thanks, Carol. Can you recommend other methods for checking problem PDFs that might give us details to help users correct their PDFs? Also, do you have any suggestions for ways that users might correct their PDFs?
I redistilled the PDF with pdfaPilot, and it does correct the error, though the resulting file is huge. You could use Preflight in Acrobat to see what kind of errors it finds and correct them manually. 3-Heights has an online tool to fix PDFs, though it requires a purchase. I think FDA's pdfaPilot also includes a desktop version; as I remember, it is geared toward PDF/A rather than PDF in general.
Also, these PDFs are not well-formed, meaning they are not structurally sound. Here are some anomalies JHOVE found in these PDFs:
<anomaly>Expected dictionary for font entry in page resource</anomaly>
<anomaly>Annotation object is not a dictionary</anomaly>
<anomaly>Outlines contain recursive references.</anomaly>
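When checking many files, the anomalies can be collected automatically. This sketch assumes the describe output contains `<anomaly>` elements like the ones quoted above, possibly inside a namespace, so it compares local tag names only; the helper name and the namespace in the example are hypothetical.

```python
import xml.etree.ElementTree as ET


def anomalies(xml_text: str) -> list:
    """Collect the text of every <anomaly> element, with or without a namespace."""
    root = ET.fromstring(xml_text)
    found = []
    for el in root.iter():
        # Namespaced tags look like '{uri}anomaly'; compare the local name only
        if isinstance(el.tag, str) and el.tag.rsplit("}", 1)[-1] == "anomaly":
            found.append((el.text or "").strip())
    return found
```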
The original packages for this issue: UF00028308_03091, UF00098964_03959, and UF00028308_02603 are in: /var/daitss/ops/exceptions/tickets/GitHub_769 on darchive.
The IEID for the below error is EZYQ0HHD9_97E59T
I received the following error while trying to ingest a package that contained only the PDF, XML, and METS files:
error while processing 1(sip-files/05-10-2015.pdf): Fatal error: EntityRef: expecting ';' at :183.
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/preserve.rb:26:in `rescue in block in preserve'
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/preserve.rb:19:in `block in preserve'
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/preserve.rb:18:in `each'
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/preserve.rb:18:in `preserve'
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/ingest.rb:33:in `ingest'
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/process.rb:82:in `block in spawn'
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/process.rb:66:in `fork'
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/process.rb:66:in `spawn'
/opt/web-services/sites/core/current/bin/pulse:161:in `block in start_wips'
/opt/web-services/sites/core/current/bin/pulse:158:in `each'
/opt/web-services/sites/core/current/bin/pulse:158:in `start_wips'
/opt/web-services/sites/core/current/bin/pulse:194:in `block in <main>'
/opt/web-services/sites/core/current/bin/pulse:192:in `loop'
/opt/web-services/sites/core/current/bin/pulse:192:in