daitss / core

DAITSS: Dark Archive In The Sunshine State
GNU General Public License v3.0

PDF error while processing 51(sip-files/fsu_168965.pdf): bad status #789

Closed szanati closed 7 years ago

szanati commented 7 years ago

While trying to ingest a package, I get the following error:

error while processing 51(sip-files/fsu_168965.pdf): bad status http://describe.fda.fcla.edu/describe?location=file:/var/daitss/data/work/EP67W88Y2_ATCHCM/files/original/51/data&uri=info%3Afda%2FEP67W88Y2_ATCHCM%2Ffile%2F51&originalName=sip-files%2Ffsu_168965.pdf: 500
running into exception running into exception StackOverflowError 'unknown exception' while processing 35 bytes of input
/opt/web-services/sites/describe/releases/20160607000001/lib/format/formatbase.rb:33:in `method_missing'
/opt/web-services/sites/describe/releases/20160607000001/lib/format/formatbase.rb:33:in `extract'
/opt/web-services/sites/describe/releases/20160607000001/lib/formatpool.rb:79:in `block in extractAll'
/opt/web-services/sites/describe/releases/20160607000001/lib/formatpool.rb:71:in `each'
/opt/web-services/sites/describe/releases/20160607000001/lib/formatpool.rb:71:in `extractAll'
/opt/web-services/sites/describe/releases/20160607000001/lib/formatpool.rb:34:in `describe'
/opt/web-services/sites/describe/releases/20160607000001/app.rb:151:in `block in '
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1603:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1603:in `block in compile!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:966:in `[]'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:966:in `block (3 levels) in route!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:985:in `route_eval'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:966:in `block (2 levels) in route!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1006:in `block in process_route'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1004:in `catch'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1004:in `process_route'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:964:in `block in route!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:963:in `each'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:963:in `route!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1076:in `block in dispatch!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `block in invoke'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `catch'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `invoke'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1073:in `dispatch!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:898:in `block in call!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `block in invoke'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `catch'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `invoke'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:898:in `call!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:886:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/commonlogger.rb:33:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:217:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/xss_header.rb:18:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/path_traversal.rb:16:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/json_csrf.rb:18:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/frame_options.rb:31:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/nulllogger.rb:9:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/head.rb:11:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/methodoverride.rb:21:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:180:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:2014:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1478:in `block in call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1788:in `synchronize'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1478:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:86:in `block in pre_process'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:84:in `catch'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:84:in `pre_process'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:53:in `process'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:39:in `receive_data'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run_machine'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/backends/base.rb:73:in `start'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/server.rb:162:in `start'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/controllers/controller.rb:87:in `start'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/runner.rb:199:in `run_command'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/runner.rb:155:in `run!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/bin/thin:6:in `'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/bin/thin:23:in `load'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/bin/thin:23:in `'
while processing sip-files/fsu_168965.pdf
/opt/web-services/sites/describe/releases/20160607000001/lib/format/formatbase.rb:55:in `rescue in extract'
/opt/web-services/sites/describe/releases/20160607000001/lib/format/formatbase.rb:32:in `extract'
/opt/web-services/sites/describe/releases/20160607000001/lib/formatpool.rb:79:in `block in extractAll'
/opt/web-services/sites/describe/releases/20160607000001/lib/formatpool.rb:71:in `each'
/opt/web-services/sites/describe/releases/20160607000001/lib/formatpool.rb:71:in `extractAll'
/opt/web-services/sites/describe/releases/20160607000001/lib/formatpool.rb:34:in `describe'
/opt/web-services/sites/describe/releases/20160607000001/app.rb:151:in `block in '
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1603:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1603:in `block in compile!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:966:in `[]'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:966:in `block (3 levels) in route!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:985:in `route_eval'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:966:in `block (2 levels) in route!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1006:in `block in process_route'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1004:in `catch'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1004:in `process_route'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:964:in `block in route!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:963:in `each'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:963:in `route!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1076:in `block in dispatch!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `block in invoke'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `catch'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `invoke'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1073:in `dispatch!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:898:in `block in call!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `block in invoke'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `catch'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in `invoke'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:898:in `call!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:886:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/commonlogger.rb:33:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:217:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/xss_header.rb:18:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/path_traversal.rb:16:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/json_csrf.rb:18:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/frame_options.rb:31:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/nulllogger.rb:9:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/head.rb:11:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/methodoverride.rb:21:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:180:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:2014:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1478:in `block in call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1788:in `synchronize'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1478:in `call'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:86:in `block in pre_process'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:84:in `catch'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:84:in `pre_process'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:53:in `process'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:39:in `receive_data'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run_machine'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/backends/base.rb:73:in `start'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/server.rb:162:in `start'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/controllers/controller.rb:87:in `start'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/runner.rb:199:in `run_command'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/runner.rb:155:in `run!'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/gems/thin-1.6.2/bin/thin:6:in `'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/bin/thin:23:in `load'
/opt/web-services/sites/describe/shared/bundle/ruby/1.9.1/bin/thin:23:in `'

I get the same error while trying it on ripple.

szanati commented 7 years ago

When I submit the above PDF, fsu_168965.pdf, to the description service, I get the following error:

running into exception running into exception StackOverflowError 'unknown exception' while processing 24 bytes of input
/var/www/description/describe/lib/format/formatbase.rb:34:in `method_missing'
/var/www/description/describe/lib/format/formatbase.rb:34:in `extract'
/var/www/description/describe/lib/formatpool.rb:79:in `send'
/var/www/description/describe/lib/formatpool.rb:79:in `extractAll'
/var/www/description/describe/lib/formatpool.rb:71:in `each'
/var/www/description/describe/lib/formatpool.rb:71:in `extractAll'
/var/www/description/describe/lib/formatpool.rb:34:in `describe'
./app.rb:205:in `POST /describe'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1540:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1540:in `compile!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:950:in `[]'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:950:in `route!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:966:in `route_eval'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:950:in `route!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:987:in `process_route'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:985:in `catch'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:985:in `process_route'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:948:in `route!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:947:in `each'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:947:in `route!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1059:in `dispatch!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `invoke'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `catch'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `invoke'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1056:in `dispatch!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:882:in `call!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `invoke'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `catch'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `invoke'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:882:in `call!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:870:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-1.5.2/lib/rack/commonlogger.rb:33:in `call_without_check'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:212:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-protection-1.5.0/lib/rack/protection/xss_header.rb:18:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-protection-1.5.0/lib/rack/protection/path_traversal.rb:16:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-protection-1.5.0/lib/rack/protection/json_csrf.rb:18:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-protection-1.5.0/lib/rack/protection/base.rb:49:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-protection-1.5.0/lib/rack/protection/frame_options.rb:31:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-1.5.2/lib/rack/nulllogger.rb:9:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-1.5.2/lib/rack/head.rb:11:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-1.5.2/lib/rack/methodoverride.rb:21:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:175:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1949:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1449:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1726:in `synchronize'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1449:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/connection.rb:81:in `pre_process'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/connection.rb:79:in `catch'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/connection.rb:79:in `pre_process'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/connection.rb:54:in `process'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/connection.rb:39:in `receive_data'
/var/www/description/describe/bundle/ruby/1.8/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run_machine'
/var/www/description/describe/bundle/ruby/1.8/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/backends/base.rb:63:in `start'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/server.rb:159:in `start'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/controllers/controller.rb:86:in `start'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/runner.rb:187:in `send'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/runner.rb:187:in `run_command'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/runner.rb:152:in `run!'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/bin/thin:6
/local/mongrel/.gem/ruby/1.8/gems/bin/thin:19:in `load'
/local/mongrel/.gem/ruby/1.8/gems/bin/thin:19
while processing fsu_168965.pdf
/var/www/description/describe/lib/format/formatbase.rb:56:in `extract'
/var/www/description/describe/lib/formatpool.rb:79:in `send'
/var/www/description/describe/lib/formatpool.rb:79:in `extractAll'
/var/www/description/describe/lib/formatpool.rb:71:in `each'
/var/www/description/describe/lib/formatpool.rb:71:in `extractAll'
/var/www/description/describe/lib/formatpool.rb:34:in `describe'
./app.rb:205:in `POST /describe'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1540:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1540:in `compile!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:950:in `[]'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:950:in `route!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:966:in `route_eval'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:950:in `route!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:987:in `process_route'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:985:in `catch'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:985:in `process_route'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:948:in `route!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:947:in `each'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:947:in `route!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1059:in `dispatch!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `invoke'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `catch'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `invoke'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1056:in `dispatch!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:882:in `call!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `invoke'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `catch'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `invoke'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:882:in `call!'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:870:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-1.5.2/lib/rack/commonlogger.rb:33:in `call_without_check'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:212:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-protection-1.5.0/lib/rack/protection/xss_header.rb:18:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-protection-1.5.0/lib/rack/protection/path_traversal.rb:16:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-protection-1.5.0/lib/rack/protection/json_csrf.rb:18:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-protection-1.5.0/lib/rack/protection/base.rb:49:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-protection-1.5.0/lib/rack/protection/frame_options.rb:31:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-1.5.2/lib/rack/nulllogger.rb:9:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-1.5.2/lib/rack/head.rb:11:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/rack-1.5.2/lib/rack/methodoverride.rb:21:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:175:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1949:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1449:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1726:in `synchronize'
/var/www/description/describe/bundle/ruby/1.8/gems/sinatra-1.4.3/lib/sinatra/base.rb:1449:in `call'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/connection.rb:81:in `pre_process'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/connection.rb:79:in `catch'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/connection.rb:79:in `pre_process'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/connection.rb:54:in `process'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/connection.rb:39:in `receive_data'
/var/www/description/describe/bundle/ruby/1.8/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run_machine'
/var/www/description/describe/bundle/ruby/1.8/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/backends/base.rb:63:in `start'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/server.rb:159:in `start'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/controllers/controller.rb:86:in `start'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/runner.rb:187:in `send'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/runner.rb:187:in `run_command'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/lib/thin/runner.rb:152:in `run!'
/var/www/description/describe/bundle/ruby/1.8/gems/thin-1.5.1/bin/thin:6
/local/mongrel/.gem/ruby/1.8/gems/bin/thin:19:in `load'
/local/mongrel/.gem/ruby/1.8/gems/bin/thin:19.

cchou commented 7 years ago

This PDF crashes JHOVE, the format validation tool (developed by Harvard) that DAITSS uses to validate PDFs and extract metadata from them. Here is the JHOVE output:

MAC-CChou:jhove cchou$ ./jhove -c conf/jhove.conf -m PDF-hul ~/Downloads/fsu_168965.pdf
Exception in thread "main" java.lang.StackOverflowError
    at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:304)
    at java.lang.StringCoding.encode(StringCoding.java:344)
    at java.lang.StringCoding.encode(StringCoding.java:387)
    at java.lang.String.getBytes(String.java:958)
    at edu.harvard.hul.ois.jhove.module.pdf.Name.isPdfACompliant(Name.java:22)
    at edu.harvard.hul.ois.jhove.module.pdf.Tokenizer.getNext(Tokenizer.java:472)
    at edu.harvard.hul.ois.jhove.module.pdf.Parser.getNext(Parser.java:93)
    at edu.harvard.hul.ois.jhove.module.pdf.Parser.getNext(Parser.java:82)
    at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObject(Parser.java:278)
    at edu.harvard.hul.ois.jhove.module.pdf.Parser.readArray(Parser.java:310)
    at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObject(Parser.java:280)
    at edu.harvard.hul.ois.jhove.module.pdf.Parser.readDictionary(Parser.java:346)
    at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObject(Parser.java:283)
    at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObjectDef(Parser.java:234)
    at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObjectDef(Parser.java:210)
    at edu.harvard.hul.ois.jhove.module.PdfModule.getObject(PdfModule.java:2454)
    at edu.harvard.hul.ois.jhove.module.PdfModule.resolveIndirectObject(PdfModule.java:2383)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:878)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)

szanati commented 7 years ago

Two more packages, each containing one PDF, failed with the same error as above. The files are fsu_175829.pdf and fsu_182240.pdf. FSU has more packages to come that will probably contain PDFs with the same error. Can you look into this further?
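To find out ahead of ingest which incoming PDFs trigger the same crash, a batch could be screened with something like the following. This is only a sketch: the JHOVE command line is copied from the run shown earlier in this thread, the function names are made up for illustration, and it assumes the crash always shows up as `java.lang.StackOverflowError` in JHOVE's output.

```python
import subprocess

# Command line copied from the JHOVE run in this thread; adjust paths locally.
JHOVE_CMD = ["./jhove", "-c", "conf/jhove.conf", "-m", "PDF-hul"]


def run_jhove(pdf_path):
    """Run JHOVE on one file and return its combined stdout and stderr."""
    result = subprocess.run(JHOVE_CMD + [pdf_path], capture_output=True, text=True)
    return result.stdout + result.stderr


def find_overflowing(pdf_paths, run=run_jhove):
    """Return the paths whose JHOVE output mentions a StackOverflowError.

    `run` is injectable so the check can be exercised without JHOVE installed.
    """
    return [p for p in pdf_paths if "java.lang.StackOverflowError" in run(p)]
```

Run from the JHOVE directory, `find_overflowing(["fsu_168965.pdf", "fsu_175829.pdf", "fsu_182240.pdf"])` would list the files that still crash the validator.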

lydiam commented 7 years ago

Carol,

What’s the next step? Notify Harvard? Do we know if the problem is in the PDF itself or in JHOVE? (I would imagine that no file should crash JHOVE). We need to let FSU know what to expect.

Thanks,

Lydia

From: Carol Chou [mailto:notifications@github.com] Sent: Monday, April 03, 2017 10:01 PM To: daitss/core Cc: Subscribed Subject: Re: [daitss/core] PDF error while processing 51(sip-files/fsu_168965.pdf): bad status (#789)

This PDF crash JHOVE, the format validation tool (developed by Harvard) DAITSS use to validate and extract metadata from PDF. Here is the JHOVE output,

MAC-CChou:jhove cchou$ ./jhove -c conf/jhove.conf -m PDF-hul ~/Downloads/fsu_168965.pdf
Exception in thread "main" java.lang.StackOverflowError
    at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:304)
    at java.lang.StringCoding.encode(StringCoding.java:344)
    at java.lang.StringCoding.encode(StringCoding.java:387)
    at java.lang.String.getBytes(String.java:958)
    at edu.harvard.hul.ois.jhove.module.pdf.Name.isPdfACompliant(Name.java:22)
    at edu.harvard.hul.ois.jhove.module.pdf.Tokenizer.getNext(Tokenizer.java:472)
    at edu.harvard.hul.ois.jhove.module.pdf.Parser.getNext(Parser.java:93)
    at edu.harvard.hul.ois.jhove.module.pdf.Parser.getNext(Parser.java:82)
    at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObject(Parser.java:278)
    at edu.harvard.hul.ois.jhove.module.pdf.Parser.readArray(Parser.java:310)
    at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObject(Parser.java:280)
    at edu.harvard.hul.ois.jhove.module.pdf.Parser.readDictionary(Parser.java:346)
    at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObject(Parser.java:283)
    at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObjectDef(Parser.java:234)
    at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObjectDef(Parser.java:210)
    at edu.harvard.hul.ois.jhove.module.PdfModule.getObject(PdfModule.java:2454)
    at edu.harvard.hul.ois.jhove.module.PdfModule.resolveIndirectObject(PdfModule.java:2383)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:878)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
    at edu.harvard.hul.ois.jhove.module.pdf.AProfile.checkOutlineItem(AProfile.java:880)
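
The repeated checkOutlineItem frames in the trace above suggest recursion that never terminates, which can happen when outline items refer back to one another. The following is only a rough Python sketch of a cycle-safe outline walk, not JHOVE's code and not the actual fix. The `objects` dictionary, the `walk_outline` name, and the use of "First" and "Next" keys as the only links are assumptions made for illustration.

```python
def walk_outline(objects, root_id):
    """Visit outline items reachable from root_id without revisiting any.

    `objects` is a hypothetical mapping: object id -> {"First": id, "Next": id}.
    Returns (ids in visit order, True if an already-visited item was referenced).
    """
    visited = []
    seen = set()
    recursive = False
    stack = [root_id]  # explicit stack, so depth is not limited by the call stack
    while stack:
        obj_id = stack.pop()
        if obj_id is None or obj_id not in objects:
            continue  # missing link or unresolved reference
        if obj_id in seen:
            recursive = True  # record an anomaly instead of looping forever
            continue
        seen.add(obj_id)
        visited.append(obj_id)
        item = objects[obj_id]
        # Push the sibling first so the child is handled first (depth-first).
        stack.append(item.get("Next"))
        stack.append(item.get("First"))
    return visited, recursive


# Item 3 points back to item 1 through "Next", forming a cycle.
cyclic = {1: {"First": 2}, 2: {"Next": 3}, 3: {"Next": 1}}
order, recursive = walk_outline(cyclic, 1)
```

With a guard like this, a cyclic outline can be reported as an anomaly rather than exhausting the stack.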

cchou commented 7 years ago

Harvard no longer supports JHOVE; it is now maintained by the Open Preservation Foundation. However, it doesn't look like they respond to outside requests. We ran into another JHOVE bug several months ago (https://github.com/daitss/core/issues/785) and I filed a ticket with them but received no response, so I ended up fixing that JHOVE bug myself.

This is a new JHOVE bug; I am still working on it.

cchou commented 7 years ago

These PDFs are bad PDFs and they choke JHOVE. The goal now is to fix JHOVE so that it doesn't choke on them.

lydiam commented 7 years ago

Is there another validation tool that can inform FSU about the exact problems with the PDFs? In Acrobat Pro I see an “examine document” function that looks like it might do some sort of validation, but I don’t see anything much there after I download the PDF from the stashed WIP. I tried the https://www.pdf-online.com/osa/validate.aspx validator that uses the 3-Heights PDF validator online tool and got the following result, which indicates a PDF 1.5 that validated successfully:

[Image: 3-Heights PDF validator result]

So far I’m not seeing anything that indicates a problem with the PDF itself.

Lydia

cchou commented 7 years ago

The problem lies in how JHOVE validates the PDF, so we just need to fix JHOVE. I am still working on that, as it is somewhat tricky to fix.

If preferred, the affiliates can validate their PDFs with describe.fcla.edu before sending them to us.

cchou commented 7 years ago

I have put a fix in JHOVE and integrated it with the description service. The fix has been rolled out to ripple. Please test.

lydiam commented 7 years ago

Was the description service description.fcla.edu updated as well? I tried submitting one of the problem PDFs to description.fcla.edu and it crashed. The package that contained that file was successfully archived, https://fda.fcla.edu/package/ERY9BF116_U7MW3P, but I'm confused because the describe event for the problem file (file/156) shows it as well-formed and valid, with only the following anomaly: "Outlines contain recursive references." Our assumption seemed to be that the file had anomalies that caused JHOVE to crash. Was this the expected result from your update? What exactly did you change in the JHOVE fix?

lydiam commented 7 years ago

Carol - could you respond to my questions? I'm afraid that this issue is getting conflated with #781. I need to understand if the FSU packages referenced in this issue have archived correctly or perhaps are corrupt PDFs. It would really be helpful to understand how you modified JHOVE.

cchou commented 7 years ago

description.fcla.edu is hosted on the OSS server, which is still on Ruby 1.8.7, so we cannot update description on that server until OSS is upgraded to Ruby 1.9.3.

JHOVE processes a PDF in stages. First it checks for well-formedness, then it checks for validity. Afterward, it checks whether the PDF satisfies any profiles such as PDF/X and PDF/A (but not to the extent required by the PDF/A specification). The problem was in the third stage, where it checks the PDF/A profile: several of the metadata entries in the outline dictionary could not be read, probably due to foreign characters. So yes, the PDF is well-formed and valid, but JHOVE was choking on it, and that is what the fixes addressed.
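
The staged flow described above can be illustrated roughly as follows. This is a hypothetical Python sketch, not JHOVE's implementation: the `describe` function, the `parses` and `structure_ok` keys, and the exception types are all assumptions. It only shows how a failure in a profile check can be recorded as an anomaly without changing the well-formed and valid results from the earlier stages.

```python
def describe(pdf, profile_checks):
    """Run three stages in order and return a report dictionary.

    `pdf` is a hypothetical dict standing in for a parsed file;
    `profile_checks` maps a profile name to a function taking `pdf`.
    """
    report = {"well_formed": False, "valid": False, "profiles": [], "anomalies": []}

    # Stage 1: well-formedness.
    report["well_formed"] = bool(pdf.get("parses", False))
    if not report["well_formed"]:
        return report

    # Stage 2: validity.
    report["valid"] = bool(pdf.get("structure_ok", False))
    if not report["valid"]:
        return report

    # Stage 3: profile checks. A failing check becomes an anomaly
    # and does not undo the results of stages 1 and 2.
    for name, check in profile_checks.items():
        try:
            if check(pdf):
                report["profiles"].append(name)
        except (RecursionError, ValueError) as exc:
            report["anomalies"].append(f"{name}: {exc}")
    return report


def pdfa_check(pdf):
    # Stand-in for a profile check that cannot read outline metadata.
    raise ValueError("outline metadata could not be read")


result = describe(
    {"parses": True, "structure_ok": True},
    {"PDF/A": pdfa_check, "PDF/X": lambda pdf: False},
)
```

Here `result` reports the file as well-formed and valid, with the PDF/A problem listed under anomalies.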

#781 is a totally separate issue; that one fails at PDFaPilot. Besides, the two issues involve different problem packages.

Stephen said some packages with this JHOVE issue have now been archived. If so, please close this ticket.

cchou commented 7 years ago

Stephen,

Can you confirm whether the new production code rollout fixes this issue? If so, please close this ticket.

szanati commented 7 years ago

Carol,

Yes, the 3 FSU packages with this issue all archived with the new code rollout. I will now close this ticket.