szanati closed this issue 7 years ago.
When I submit the above PDF, fsu_168965.pdf, to the description service, I get the following error:
running into exception running into exception StackOverflowError 'unknown exception' while processing 24 bytes of input
/var/www/description/describe/lib/format/formatbase.rb:34:in `method_missing'
/var/www/description/describe/lib/format/formatbase.rb:34:in `extract'
/var/www/description/describe/lib/formatpool.rb:79:in `send'
/var/www/description/describe/lib/formatpool.rb:79:in `extractAll'
/var/www/description/describe/lib/formatpool.rb:71:in `each'
/var/www/description/describe/lib/formatpool.rb:71:in `extractAll'
/var/www/description/describe/lib/formatpool.rb:34:in `describe'
./app.rb:205:in `POST /describe'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1540:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1540:in `compile!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:950:in `[]'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:950:in `route!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:966:in `route_eval'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:950:in `route!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:987:in `process_route'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:985:in `catch'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:985:in `process_route'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:948:in `route!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:947:in `each'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:947:in `route!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1059:in `dispatch!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `invoke'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `catch'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `invoke'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1056:in `dispatch!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:882:in `call!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `invoke'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `catch'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `invoke'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:882:in `call!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:870:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-1.5.2/lib/rack/commonlogger.rb:33:in `call_without_check'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:212:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-protection-1.5.0/lib/rack/protection/xss_header.rb:18:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-protection-1.5.0/lib/rack/protection/path_traversal.rb:16:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-protection-1.5.0/lib/rack/protection/json_csrf.rb:18:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-protection-1.5.0/lib/rack/protection/base.rb:49:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-protection-1.5.0/lib/rack/protection/frame_options.rb:31:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-1.5.2/lib/rack/nulllogger.rb:9:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-1.5.2/lib/rack/head.rb:11:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-1.5.2/lib/rack/methodoverride.rb:21:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:175:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1949:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1449:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1726:in `synchronize'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1449:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/connection.rb:81:in `pre_process'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/connection.rb:79:in `catch'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/connection.rb:79:in `pre_process'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/connection.rb:54:in `process'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/connection.rb:39:in `receive_data'
/var/www/description/describe/bundle/ruby/1.8/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run_machine'
/var/www/description/describe/bundle/ruby/1.8/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/backends/base.rb:63:in `start'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/server.rb:159:in `start'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/controllers/controller.rb:86:in `start'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/runner.rb:187:in `send'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/runner.rb:187:in `run_command'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/runner.rb:152:in `run!'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/bin/thin:6
/local/mongrel/.gem/ruby/1.8/gems/bin/thin:19:in `load'
/local/mongrel/.gem/ruby/1.8/gems/bin/thin:19 while processing fsu_168965.pdf
/var/www/description/describe/lib/format/formatbase.rb:56:in `extract'
/var/www/description/describe/lib/formatpool.rb:79:in `send'
/var/www/description/describe/lib/formatpool.rb:79:in `extractAll'
/var/www/description/describe/lib/formatpool.rb:71:in `each'
/var/www/description/describe/lib/formatpool.rb:71:in `extractAll'
/var/www/description/describe/lib/formatpool.rb:34:in `describe'
./app.rb:205:in `POST /describe'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1540:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1540:in `compile!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:950:in `[]'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:950:in `route!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:966:in `route_eval'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:950:in `route!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:987:in `process_route'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:985:in `catch'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:985:in `process_route'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:948:in `route!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:947:in `each'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:947:in `route!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1059:in `dispatch!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `invoke'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `catch'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `invoke'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1056:in `dispatch!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:882:in `call!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `invoke'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `catch'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `invoke'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:882:in `call!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:870:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-1.5.2/lib/rack/commonlogger.rb:33:in `call_without_check'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:212:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-protection-1.5.0/lib/rack/protection/xss_header.rb:18:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-protection-1.5.0/lib/rack/protection/path_traversal.rb:16:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-protection-1.5.0/lib/rack/protection/json_csrf.rb:18:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-protection-1.5.0/lib/rack/protection/base.rb:49:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-protection-1.5.0/lib/rack/protection/frame_options.rb:31:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-1.5.2/lib/rack/nulllogger.rb:9:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-1.5.2/lib/rack/head.rb:11:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-1.5.2/lib/rack/methodoverride.rb:21:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:175:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1949:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1449:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1726:in `synchronize'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1449:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/connection.rb:81:in `pre_process'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/connection.rb:79:in `catch'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/connection.rb:79:in `pre_process'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/connection.rb:54:in `process'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/connection.rb:39:in `receive_data'
/var/www/description/describe/bundle/ruby/1.8/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run_machine'
/var/www/description/describe/bundle/ruby/1.8/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/backends/base.rb:63:in `start'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/server.rb:159:in `start'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/controllers/controller.rb:86:in `start'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/runner.rb:187:in `send'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/runner.rb:187:in `run_command'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/runner.rb:152:in `run!'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/bin/thin:6
/local/mongrel/.gem/ruby/1.8/gems/bin/thin:19:in `load'
/local/mongrel/.gem/ruby/1.8/gems/bin/thin:19.
This PDF crashes JHOVE, the format validation tool (developed by Harvard) that DAITSS uses to validate and extract metadata from PDFs. Here is the JHOVE output:
MAC-CChou:jhove cchou$ ./jhove -c conf/jhove.conf -m PDF-hul ~/Downloads/fsu_168965.pdf
Exception in thread "main" java.lang.StackOverflowError
	at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:304)
	at java.lang.StringCoding.encode(StringCoding.java:344)
	at java.lang.StringCoding.encode(StringCoding.java:387)
	at java.lang.String.getBytes(String.java:958)
	at edu.harvard.hul.ois.jhove.module.pdf.Name.isPdfACompliant(Name.java:22)
	at edu.harvard.hul.ois.jhove.module.pdf.Tokenizer.getNext(Tokenizer.java:472)
	at edu.harvard.hul.ois.jhove.module.pdf.Parser.getNext(Parser.java:93)
	at edu.harvard.hul.ois.jhove.module.pdf.Parser.getNext(Parser.java:82)
	at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObject(Parser.java:278)
	at edu.harvard.hul.ois.jhove.module.pdf.Parser.readArray(Parser.java:310)
	at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObject(Parser.java:280)
	at edu.harvard.hul.ois.jhove.module.pdf.Parser.readDictionary(Parser.java:346)
	at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObject(Parser.java:283)
	at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObjectDef(Parser.java:234)
	at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObjectDef(Parser.java:210)
	at edu.harvard.hul.ois.jhove.module.PdfModule.getObject(PdfModule.java:2454)
	at edu.harvard.hul.ois.jhove.module.PdfModule.resolveIndirectObject(PdfModule.java:2383)
	at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:878)
	at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
	at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
	at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
	at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
	at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
	at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
	at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
	at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
	at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
	at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
	at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
	at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
	at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
Two more packages, each containing one PDF, failed with the same error: fsu_175829.pdf and fsu_182240.pdf. FSU has more packages coming that will probably contain PDFs with the same error. Can you look into this further?
Carol,
What’s the next step? Notify Harvard? Do we know if the problem is in the PDF itself or in JHOVE? (I would imagine that no file should crash JHOVE). We need to let FSU know what to expect.
Thanks,
Lydia
From: Carol Chou [mailto:notifications@github.com] Sent: Monday, April 03, 2017 10:01 PM To: daitss/core Cc: Subscribed Subject: Re: [daitss/core] PDF error while processing 51(sip-files/fsu_168965.pdf): bad status (#789)
Harvard no longer supports JHOVE; it is now maintained by the Open Preservation Foundation. However, they don't appear to respond to outside requests. We ran into another JHOVE bug several months ago (https://github.com/daitss/core/issues/785), and I filed a ticket with them but received no response, so I ended up fixing that JHOVE bug myself.
This is a new JHOVE bug; I am still working on it.
These are bad PDFs, and they choke JHOVE. The goal now is to fix JHOVE so that it doesn't choke on them.
Is there another validation tool that can tell FSU exactly what is wrong with the PDFs? In Acrobat Pro I see an “examine document” function that looks like it might do some sort of validation, but I don’t see much there after I download the PDF from the stashed WIP. I tried the validator at https://www.pdf-online.com/osa/validate.aspx, which uses the 3-Heights PDF Validator online tool, and got the following result, which indicates a PDF 1.5 file that validated successfully:
[Screenshot: 3-Heights PDF Validator result]
So far I’m not seeing anything that indicates a problem with the PDF itself.
Lydia
The problem lies in how JHOVE validates the PDF, so we just need to fix JHOVE. I am still working on that, as it is a tricky fix.
If preferred, affiliates can validate their PDFs with describe.fcla.edu before sending them to us.
I have put a fix into JHOVE and integrated it with the description service. The fix has been rolled out to ripple. Please test.
Was the description service description.fcla.edu updated as well? I tried submitting one of the problem PDFs to description.fcla.edu and it crashed. The package that contained that file was successfully archived, https://fda.fcla.edu/package/ERY9BF116_U7MW3P, but I'm confused because the describe event for the problem file (file/156) shows it as well-formed and valid, with only the following anomaly:
Carol - could you respond to my questions? I'm afraid that this issue is getting conflated with #781. I need to understand if the FSU packages referenced in this issue have archived correctly or perhaps are corrupt PDFs. It would really be helpful to understand how you modified JHOVE.
description.fcla.edu is hosted on the OSS server, which is still on Ruby 1.8.7, so we cannot update the description service on that server until OSS is upgraded to Ruby 1.9.3.
JHOVE processes a PDF in stages. First it checks for well-formedness, then for validity. Afterward, it checks whether the PDF satisfies any profiles such as PDF/X and PDF/A (though not to the extent required by the PDF/A specification). The problem was in the third stage, the PDF/A profile check: several metadata entries in the outline dictionary could not be read, probably because of foreign characters. So yes, the PDF is well-formed and valid, but JHOVE was choking on it, and that is what the fix addresses.
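The JHOVE trace earlier in this issue shows `AProfile.checkOutlineItem` calling itself over and over (AProfile.java:880) until the Java call stack overflows. The thread does not show the actual patch, so the following is only a hypothetical sketch of one common way to make an outline walk robust: replace recursion with an explicit stack and remember which object numbers have already been visited, so that a very deep or self-referencing `/First` and `/Next` chain cannot exhaust the stack or loop forever. `OutlineWalker`, `OutlineItem`, and `walk` are illustrative names, not JHOVE's API; outline items are reduced to their object number and the object numbers of their `/First` child and `/Next` sibling.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: these names are illustrative, not JHOVE's
// AProfile API, and this is not the patch described in this issue.
public class OutlineWalker {

    // Stand-in for an outline item dictionary: its object number plus the
    // object numbers of its /First child and /Next sibling (-1 when absent).
    public static final class OutlineItem {
        final int objNum;
        final int first;
        final int next;

        public OutlineItem(int objNum, int first, int next) {
            this.objNum = objNum;
            this.first = first;
            this.next = next;
        }
    }

    // Visits each reachable outline item once, depth-first, using an explicit
    // stack instead of recursion. Object numbers already seen are skipped, so a
    // /First or /Next reference pointing back into the tree cannot loop.
    // Returns the object numbers in visit order.
    public static List<Integer> walk(Map<Integer, OutlineItem> objects, int firstItem) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> pending = new ArrayDeque<>();
        if (firstItem >= 0) {
            pending.push(firstItem);
        }
        while (!pending.isEmpty()) {
            int num = pending.pop();
            if (!seen.add(num)) {
                continue; // already visited: cycle or shared reference
            }
            OutlineItem item = objects.get(num);
            if (item == null) {
                continue; // unresolvable reference: skip instead of failing
            }
            order.add(num); // a real profile check would inspect the item here
            // Push the sibling first so the child subtree is visited before it.
            if (item.next >= 0) {
                pending.push(item.next);
            }
            if (item.first >= 0) {
                pending.push(item.first);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Map<Integer, OutlineItem> objs = new HashMap<>();
        // Item 10 has child 11 and sibling 12; 12's /Next points back to 10.
        objs.put(10, new OutlineItem(10, 11, 12));
        objs.put(11, new OutlineItem(11, -1, -1));
        objs.put(12, new OutlineItem(12, -1, 10));
        System.out.println(walk(objs, 10)); // prints [10, 11, 12]
    }
}
```

With an explicit stack, outline depth is bounded by heap memory rather than the thread's call stack, and the `seen` set guarantees termination even on a malformed outline. Whether the actual JHOVE fix took this approach, or instead changed how the unreadable outline metadata is decoded, is not stated in this issue.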
Stephen said some packages with this JHOVE issue have now been archived. If so, please close this ticket.
Stephen,
Can you confirm whether the new production code rollout fixes this issue? If so, please close this ticket.
Carol,
Yes, the 3 FSU packages with this issue all archived with the new code rollout. I will now close this ticket.
While trying to ingest a package, I get the following error:
error while processing 51(sip-files/fsu_168965.pdf): bad status http://describe.fda.fcla.edu/describe?location=file:/var/daitss/data/work/EP67W88Y2_ATCHCM/files/original/51/data&uri=info%3Afda%2FEP67W88Y2_ATCHCM%2Ffile%2F51&originalName=sip-files%2Ffsu_168965.pdf: 500 running into exception running into exception StackOverflowError 'unknown exception' while processing 35 bytes of input
/opt/web-services/sites/describe/releases/20160607000001/lib/format/formatbase.rb:33:in `method_missing'
/opt/web-services/sites/describe/releases/20160607000001/lib/format/formatbase.rb:33:in `extract'
/opt/web-services/sites/describe/releases/20160607000001/lib/formatpool.rb:79:in `block in extractAll'
/opt/web-services/sites/describe/releases/20160607000001/lib/formatpool.rb:71:in `each'
/opt/web-services/sites/describe/releases/20160607000001/lib/formatpool.rb:71:in `extractAll'
/opt/web-services/sites/describe/releases/20160607000001/lib/formatpool.rb:34:in `describe'
/opt/web-services/sites/describe/releases/20160607000001/app.rb:151:in `block in '
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1603:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1603:in `block in compile!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:966:in `[]'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:966:in `block (3 levels) in route!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:985:in `route_eval'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:966:in `block (2 levels) in route!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1006:in `block in process_route'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1004:in `catch'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1004:in `process_route'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:964:in `block in route!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:963:in `each'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:963:in `route!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1076:in `block in dispatch!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `block in invoke'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `catch'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `invoke'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1073:in `dispatch!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:898:in `block in call!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `block in invoke'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `catch'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `invoke'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:898:in `call!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:886:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/commonlogger.rb:33:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:217:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/xss_header.rb:18:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/path_traversal.rb:16:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/json_csrf.rb:18:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/frame_options.rb:31:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/nulllogger.rb:9:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/head.rb:11:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/methodoverride.rb:21:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:180:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:2014:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1478:in `block in call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1788:in `synchronize'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1478:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:86:in `block in pre_process'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:84:in `catch'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:84:in `pre_process'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:53:in `process'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:39:in `receive_data'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run_machine'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/backends/base.rb:73:in `start'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/server.rb:162:in `start'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/controllers/controller.rb:87:in `start'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/runner.rb:199:in `run_command'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/runner.rb:155:in `run!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/bin/thin:6:in `'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/bin/thin:23:in `load'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/bin/thin:23:in `' while processing sip-files/fsu_168965.pdf
/opt/web-services/sites/describe/releases/20160607000001/lib/format/formatbase.rb:55:in `rescue in extract'
/opt/web-services/sites/describe/releases/20160607000001/lib/format/formatbase.rb:32:in `extract'
/opt/web-services/sites/describe/releases/20160607000001/lib/formatpool.rb:79:in `block in extractAll'
/opt/web-services/sites/describe/releases/20160607000001/lib/formatpool.rb:71:in `each'
/opt/web-services/sites/describe/releases/20160607000001/lib/formatpool.rb:71:in `extractAll'
/opt/web-services/sites/describe/releases/20160607000001/lib/formatpool.rb:34:in `describe'
/opt/web-services/sites/describe/releases/20160607000001/app.rb:151:in `block in '
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1603:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1603:in `block in compile!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:966:in `[]'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:966:in `block (3 levels) in route!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:985:in `route_eval'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:966:in `block (2 levels) in route!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1006:in `block in process_route'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1004:in `catch'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1004:in `process_route'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:964:in `block in route!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:963:in `each'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:963:in `route!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1076:in `block in dispatch!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `block in invoke'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `catch'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `invoke'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1073:in `dispatch!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:898:in `block in call!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `block in invoke'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `catch'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `invoke'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:898:in `call!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:886:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/commonlogger.rb:33:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:217:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/xss_header.rb:18:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/path_traversal.rb:16:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/json_csrf.rb:18:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/frame_options.rb:31:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/nulllogger.rb:9:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/head.rb:11:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/methodoverride.rb:21:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:180:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:2014:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1478:in `block in call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1788:in
synchronize'\n/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1478:incall'\n/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:86:in
block in pre_process'\n/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:84:incatch'\n/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:84:in
pre_process'\n/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:53:inprocess'\n/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:39:in
receive_data'\n/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:inrun_machine'\n/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in
run'\n/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/backends/base.rb:73:instart'\n/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/server.rb:162:in
start'\n/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/controllers/controller.rb:87:instart'\n/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/runner.rb:199:in
run_command'\n/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/runner.rb:155:inrun!'\n/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/bin/thin:6:in
'\n/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/bin/thin:23:inload'\n/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/bin/thin:23:in
'I get the same error while trying it on ripple.