bodleian / ora_data_model

Documentation and crosswalks relating to the ORA data model

Certain potentially corrupted files cannot be deposited #160

Status: Closed. tomwrobel closed this issue 4 years ago.

tomwrobel commented 4 years ago

When the file attached to the linked Trello ticket is deposited in Oxris, SWORD raises the following error:

```
NoMethodError (undefined method `bytesize' for nil:NilClass):
F, [2019-12-20T12:17:37.149903 #32208] FATAL -- :
F, [2019-12-20T12:17:37.149975 #32208] FATAL -- : vendor-cache-mirror/vendor/cache/ruby/2.5.0/gems/rubyzip-2.0.0/lib/zip/entry.rb:356:in `check_c_dir_entry_static_header_length'
vendor-cache-mirror/vendor/cache/ruby/2.5.0/gems/rubyzip-2.0.0/lib/zip/entry.rb:380:in `read_c_dir_entry'
vendor-cache-mirror/vendor/cache/ruby/2.5.0/gems/rubyzip-2.0.0/lib/zip/entry.rb:206:in `read_c_dir_entry'
vendor-cache-mirror/vendor/cache/ruby/2.5.0/gems/rubyzip-2.0.0/lib/zip/central_directory.rb:127:in `block in read_central_directory_entries'
vendor-cache-mirror/vendor/cache/ruby/2.5.0/gems/rubyzip-2.0.0/lib/zip/central_directory.rb:126:in `times'
vendor-cache-mirror/vendor/cache/ruby/2.5.0/gems/rubyzip-2.0.0/lib/zip/central_directory.rb:126:in `read_central_directory_entries'
vendor-cache-mirror/vendor/cache/ruby/2.5.0/gems/rubyzip-2.0.0/lib/zip/central_directory.rb:138:in `read_from_stream'
vendor-cache-mirror/vendor/cache/ruby/2.5.0/gems/rubyzip-2.0.0/lib/zip/file.rb:82:in `block in initialize'
vendor-cache-mirror/vendor/cache/ruby/2.5.0/gems/rubyzip-2.0.0/lib/zip/file.rb:81:in `open'
vendor-cache-mirror/vendor/cache/ruby/2.5.0/gems/rubyzip-2.0.0/lib/zip/file.rb:81:in `initialize'
vendor-cache-mirror/vendor/cache/ruby/2.5.0/gems/rubyzip-2.0.0/lib/zip/file.rb:111:in `new'
vendor-cache-mirror/vendor/cache/ruby/2.5.0/gems/rubyzip-2.0.0/lib/zip/file.rb:111:in `open'
vendor-cache-mirror/vendor/cache/ruby/2.5.0/bundler/gems/willow_sword-76fa29068f9a/lib/willow_sword/zip_package.rb:16:in `unzip_file'
vendor-cache-mirror/vendor/cache/ruby/2.5.0/bundler/gems/willow_sword-76fa29068f9a/app/controllers/concerns/willow_sword/save_data.rb:99:in `organize_data'
vendor-cache-mirror/vendor/cache/ruby/2.5.0/bundler/gems/willow_sword-76fa29068f9a/app/controllers/concerns/willow_sword/save_data.rb:16:in `save_binary_data'
vendor-cache-mirror/vendor/cache/ruby/2.5.0/bundler/gems/willow_sword-76fa29068f9a/app/controllers/concerns/willow_sword/process_request.rb:25:in `validate_and_save_request'
vendor-cache-mirror/vendor/cache/ruby/2.5.0/bundler/gems/willow_sword-76fa29068f9a/app/controllers/willow_sword/file_sets_controller.rb:57:in `perform_create'
vendor-cache-mirror/vendor/cache/ruby/2.5.0/bundler/gems/willow_sword-76fa29068f9a/app/controllers/willow_sword/file_sets_controller.rb:26:in `create'
```

The file presents as an XML file, but it may actually be a zip file or simply a corrupted one: it is not a plain-text XML file and cannot be opened.
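One way to check what the file actually is, without trusting its extension or the reported MIME type, is to look at its first few bytes. The Ruby sketch below is a hypothetical helper (`sniff_format` is not part of willow_sword); it assumes that a zip archive starts with the local file header signature `PK\x03\x04` and that an XML file starts with `<`, optionally after a UTF-8 byte-order mark.

```ruby
# Hypothetical helper: guess a file's format from its leading bytes rather
# than from its extension or the MIME type reported by characterisation.
ZIP_LOCAL_HEADER = "PK\x03\x04".b
UTF8_BOM = "\xEF\xBB\xBF".b

def sniff_format(path)
  # File.binread may return nil for an empty file when a length is given.
  head = File.binread(path, 8) || "".b
  # Ignore a UTF-8 byte-order mark before looking for markup.
  text = head.start_with?(UTF8_BOM) ? head.byteslice(3, head.bytesize - 3) : head

  if head.start_with?(ZIP_LOCAL_HEADER)
    :zip      # zip local file header signature
  elsif text.start_with?("<".b)
    :xml      # looks like XML (or other markup)
  else
    :unknown  # neither: possibly corrupt, possibly another binary format
  end
end
```

A file that sniffs as `:zip` can still be corrupt, so this only indicates which code path the file would take, not whether extraction will succeed.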

Looking at the /tmp directory in ORA, we can see that the file is being received, but no bag is being created.

For a 'healthy' binary file, we get the following:

-- tmp/{hash}/
-- tmp/{hash}/{file}
-- tmp/{hash}/contents
-- tmp/{hash}/contents/{file}
-- tmp/{hash}/bag
-- tmp/{hash}/bag/bag-info.txt
-- tmp/{hash}/bag/bagit.txt
-- tmp/{hash}/bag/manifest-md5.txt
-- tmp/{hash}/bag/manifest-sha1.txt
-- tmp/{hash}/bag/tagmanifest-md5.txt
-- tmp/{hash}/bag/tagmanifest-sha1.txt
-- tmp/{hash}/bag/data
-- tmp/{hash}/bag/data/{file}

For the broken file, we get only the following:

-- tmp/{hash}/
-- tmp/{hash}/{file}
-- tmp/{hash}/contents
-- tmp/{hash}/contents/{file}
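Comparing the two listings, the broken deposit stops before anything under `bag/` is written. A small check like the following could make that distinction scriptable when inspecting /tmp. `missing_bag_entries` and `bag_created?` are hypothetical helpers, and the expected entries are taken from the 'healthy' listing above rather than from the BagIt specification (which, for instance, does not require `bag-info.txt`).

```ruby
# Hypothetical diagnostic for a tmp/{hash} working directory: which of the
# bag entries seen for a 'healthy' deposit are missing?
EXPECTED_BAG_ENTRIES = %w[
  bagit.txt bag-info.txt
  manifest-md5.txt manifest-sha1.txt
  tagmanifest-md5.txt tagmanifest-sha1.txt
  data
].freeze

def missing_bag_entries(work_dir)
  bag_dir = File.join(work_dir, "bag")
  EXPECTED_BAG_ENTRIES.reject { |entry| File.exist?(File.join(bag_dir, entry)) }
end

def bag_created?(work_dir)
  missing_bag_entries(work_dir).empty?
end
```

For the broken deposit, `missing_bag_entries` would return every expected entry, since `bag/` is never created.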

I suspect the problem is that the code tries to unzip the binary file and fails. This is related to https://github.com/tomwrobel/ora_data_model/issues/161.
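If that suspicion is right, one mitigation is to treat extraction as best-effort: try it, and if the extractor raises, deposit the upload as a single binary file instead of failing the whole request. The sketch below assumes an injected `extractor` callable standing in for the real `unzip_file` call in willow_sword's zip_package.rb, so it runs without rubyzip; `files_to_bag` and its keyword arguments are hypothetical.

```ruby
# Hypothetical sketch: attempt extraction only when enabled, and fall back to
# treating the upload as one opaque binary if the extractor raises.
# `extractor` stands in for the real unzip step and must return an array of
# extracted file paths.
def files_to_bag(path, extract: true, extractor: nil, logger: $stderr)
  return [path] unless extract && extractor

  extracted = extractor.call(path)
  extracted.empty? ? [path] : extracted
rescue StandardError => e
  # A NoMethodError like the one in the trace is a StandardError, so it is
  # caught here instead of aborting the deposit.
  logger.puts("extraction failed for #{File.basename(path)}: #{e.class}: #{e.message}")
  [path]
end
```

Rescuing `StandardError` is deliberately broad because the failure in the trace surfaced as a `NoMethodError` rather than one of rubyzip's own error classes; a real implementation would probably narrow this and log through the application's logger.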

Marked as post-release, as it is not urgent.

Linked Trello ticket: https://trello.com/c/zjQIwALz/781-broken-file-in-oxris-deposit

tomwrobel commented 4 years ago

Closing along with https://github.com/tomwrobel/ora_data_model/issues/161

The issue occurred because file characterisation reports the corrupted file as a zip file (it isn't, but Red Hat/CentOS identifies it as one), so SWORD tried to extract it (see #161). Disabling the zip extraction behaviour fixed this issue as well.
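Disabling extraction removes the problem entirely. If extraction were ever re-enabled, a stricter pre-check than file characterisation might help. Reading the trace, rubyzip seems to have found what it took to be an end-of-central-directory (EOCD) record, then received `nil` while reading the central directory entries in `read_c_dir_entry`. The hypothetical `zip_central_directory_consistent?` below parses the EOCD record and checks that the central directory it points to lies inside the file and starts with the `PK\x01\x02` signature. It assumes a plain (non-ZIP64) archive and is a heuristic, not a full validator.

```ruby
# Hypothetical pre-check: does the zip's end-of-central-directory (EOCD)
# record point at a central directory that actually exists in the file?
# Assumes a plain (non-ZIP64) archive; a heuristic, not a validator.
EOCD_SIGNATURE = "PK\x05\x06".b
CDIR_SIGNATURE = "PK\x01\x02".b
EOCD_MIN_SIZE = 22        # fixed-size part of the EOCD record
MAX_COMMENT_SIZE = 0xFFFF # the archive comment length is a 16-bit field

def zip_central_directory_consistent?(path)
  size = File.size(path)
  return false if size < EOCD_MIN_SIZE

  # The EOCD record sits within the last 22 + 65535 bytes of the file.
  tail_len = [size, EOCD_MIN_SIZE + MAX_COMMENT_SIZE].min
  tail_start = size - tail_len
  tail = File.binread(path, tail_len, tail_start)
  eocd_index = tail.rindex(EOCD_SIGNATURE)
  return false if eocd_index.nil? || tail.bytesize - eocd_index < EOCD_MIN_SIZE

  # Little-endian fields after the signature: disk numbers, entry counts,
  # central directory size and offset, comment length.
  _disk, _cd_disk, _entries_here, entries_total, cd_size, cd_offset, _comment_len =
    tail.byteslice(eocd_index + 4, 18).unpack("vvvvVVv")

  eocd_pos = tail_start + eocd_index
  # A central directory that would extend past the EOCD record cannot be
  # genuine; reading it is the kind of step that failed in the trace above.
  return false if cd_offset + cd_size > eocd_pos
  return true if entries_total.zero?

  File.binread(path, 4, cd_offset) == CDIR_SIGNATURE
end
```

Unlike a MIME-type check, this looks at the structures rubyzip actually reads, so a file that merely starts with `PK` but whose central directory points beyond the file would be rejected before extraction is attempted.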