digital-preservation / droid

DROID (Digital Record and Object Identification)
BSD 3-Clause "New" or "Revised" License

ZIP Container error #232

Open thorsted opened 4 years ago

thorsted commented 4 years ago

Attempting to make a signature for a file format in a ZIP container. I created the signature and tested it in DROID, but the file is only identified by the ZIP signature. Checking the log, I can see an error: "WARN Could not process the potential container format (ZIP): file:/Users/thorsted/Documents/file.olm ZIP file spanning/splitting is not supported!"

I checked multiple samples of the file format and got the same error. Is this a bug?

anjackson commented 4 years ago

I found this potentially related issue: https://github.com/digital-preservation/droid/issues/71

Are you running the latest version of DROID?

Dclipsham commented 4 years ago

Hi Tyler,

Are you able to share an example file that encounters this issue, either here or privately?

David

thorsted commented 4 years ago

Running version 6.4. It does seem connected to issue #71. Samples sent to David.

Dclipsham commented 4 years ago

Thanks Tyler, easy to reproduce here, and confirming as bug. My instinct is that it is similar to #71 and probably has a similar resolution (updating/changing zip handler libraries) - I've revisited the problematic files from #100 and they work with DROID 6.4 but should also be tested as part of any fix for this issue.

Dclipsham commented 4 years ago

To note also, these zips unpack happily in 7-Zip and Windows Explorer; however, when browsing the contents before unpacking, both tools have trouble gathering properties of the contents, as shown in the images below:

[image: incompleteProperties7Zip]

[image: incompleteProperties]

thorsted commented 4 years ago

David, I was also able to confirm that after recompressing with zip and naming the result as OLM, my container signature identifies the files correctly.

-Tyler

On Fri, Aug 23, 2019 at 3:38 AM David Clipsham notifications@github.com wrote:

> To note also, These zips unpack happily in 7zip and Windows Explorer, however when browsing the contents before unpacking, both tools have issues gathering properties of the contents, as per the images below [image: incompleteProperties7Zip] https://user-images.githubusercontent.com/2189778/63583041-1d5c5c00-c592-11e9-88e9-c3b3d7d527eb.png [image: incompleteProperties] https://user-images.githubusercontent.com/2189778/63583043-1d5c5c00-c592-11e9-85e4-d2c46171df83.png


Dclipsham commented 4 years ago

There's been a conversation around this issue on Google Groups (https://groups.google.com/forum/#!topic/droid-list/N1tE3ZuEDbo). It seems that the zip handler needs either updating or replacing, because the error handling that throws the warning appears to reject what should be a valid Zip64 structure (https://groups.google.com/d/msg/droid-list/N1tE3ZuEDbo/wWDP_FjYAwAJ).
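
The "spanning/splitting" wording suggests the reader is checking disk-number fields in the archive trailer. Below is a minimal Python sketch of where those fields live in a Zip64 file, assuming the standard layout from the PKWARE APPNOTE (a Zip64 End of Central Directory locator immediately before the End of Central Directory record). Which rule TrueZip/TrueVFS actually applies is not confirmed in this thread; `strict_single_disk` and `lenient_single_disk` are hypothetical labels for two possible checks.

```python
import struct

EOCD_SIG = b"PK\x05\x06"
ZIP64_LOCATOR_SIG = b"PK\x06\x07"
# Zip64 EOCD locator (20 bytes): signature, disk holding the Zip64 EOCD,
# offset of the Zip64 EOCD, total number of disks.
ZIP64_LOCATOR = struct.Struct("<4sIQI")


def zip64_disk_fields(data: bytes):
    """Return the Zip64 locator's disk fields, or None if there is no locator.

    Sketch only: assumes the last EOCD signature in the file is the real
    record (a signature inside an archive comment would confuse it).
    """
    eocd_pos = data.rfind(EOCD_SIG)
    if eocd_pos < 0:
        raise ValueError("no End of Central Directory record found")
    loc_pos = eocd_pos - ZIP64_LOCATOR.size
    if loc_pos < 0 or data[loc_pos:loc_pos + 4] != ZIP64_LOCATOR_SIG:
        return None  # not a Zip64 archive
    _sig, eocd64_disk, eocd64_offset, total_disks = ZIP64_LOCATOR.unpack_from(data, loc_pos)
    return {
        "zip64_eocd_disk": eocd64_disk,
        "zip64_eocd_offset": eocd64_offset,
        "total_disks": total_disks,
        # Hypothetical strict rule: anything other than exactly one disk counts as spanned.
        "strict_single_disk": eocd64_disk == 0 and total_disks == 1,
        # Hypothetical lenient rule: only more than one disk counts as spanned.
        "lenient_single_disk": eocd64_disk == 0 and total_disks <= 1,
    }
```

A file whose locator records `total_disks == 0` would pass the lenient rule but fail the strict one, so running the OLM samples through a check like this could show whether that is the pattern behind the warning.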

jcharlet commented 4 years ago

Hi @thorsted, my solution is actually not satisfactory: I forced DROID to identify OLM files as archives and scan their contents, whereas they should really just be identified as a container format.

Would you have a signature file to submit for OLM files?

thorsted commented 4 years ago

Here is my initial signature attempt. Yes, I agree, these should be identified as containers. OLM-Sig.zip

jcharlet commented 4 years ago

@thorsted, does it work on your side when you run DROID? All my OLM samples are still identified as ZIP (on DROID 6.4 and from the master branch).

[image: Screenshot from 2019-12-06 14-03-15] (attached sample: olm-sample.zip)

thorsted commented 1 year ago

I have another ZIP container error, with a different message: "Could not process the archival format(ZIP): file:///Flash5.5-S01v5.fla Expected 25 more entries in the Central Directory!"

7-Zip shows a header error when testing (sample attached: FLA-error.zip).
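
An error like "Expected 25 more entries in the Central Directory!" reads as a mismatch between the entry count declared in the End of Central Directory (EOCD) record and the central directory headers the reader actually finds. Below is a minimal Python sketch of that comparison, following the record layouts in the PKWARE APPNOTE. It illustrates the kind of check involved and is not DROID's or TrueVFS's code.

```python
import struct

EOCD_SIG = b"PK\x05\x06"
CDH_SIG = b"PK\x01\x02"
EOCD = struct.Struct("<4sHHHHIIH")  # End of Central Directory record, 22 bytes
CDH_FIXED_SIZE = 46  # fixed part of a central directory file header


def central_directory_counts(data: bytes) -> dict:
    """Compare the entry count declared in the EOCD with the headers actually present.

    Sketch only: ignores Zip64 (where 32-bit fields hold 0xFFFFFFFF placeholders)
    and assumes the last EOCD signature in the file is the real record.
    """
    eocd_pos = data.rfind(EOCD_SIG)
    if eocd_pos < 0:
        raise ValueError("no End of Central Directory record found")
    (_sig, _disk, _cd_disk, _entries_on_disk, total_entries,
     _cd_size, cd_offset, _comment_len) = EOCD.unpack_from(data, eocd_pos)
    found = 0
    pos = cd_offset
    # Walk consecutive central directory headers starting at the declared offset.
    while data.startswith(CDH_SIG, pos):
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", data, pos + 28)
        pos += CDH_FIXED_SIZE + name_len + extra_len + comment_len
        found += 1
    return {"declared": total_entries, "found": found}
```

A strict reader would stop with an error whenever `declared` and `found` differ; a tolerant one might fall back to other structures in the file.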

ross-spencer commented 1 year ago

@thorsted, isn't that new one a problem with the file itself (rather than something DROID should compensate for)?

I was interested to have a look. lsar and unar work well on Linux, or seem to, but 7z and zip report the following:

$ zip -T Flash5.5-S01v5.fla
error [Flash5.5-S01v5.fla]:  missing 54 bytes in zipfile
  (attempting to process anyway)
error [Flash5.5-S01v5.fla]:  reported length of central directory is
  54 bytes too long (Atari STZip zipfile?  J.H.Holm ZIPSPLIT 1.1
  zipfile?).  Compensating...
error: invalid zip file with overlapped components (possible zip bomb)
test of Flash5.5-S01v5.fla FAILED
$ 7z t Flash5.5-S01v5.fla

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,2 CPUs Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz (50650),ASM,AES-NI)

Scanning the drive for archives:
1 file, 216581 bytes (212 KiB)

Testing archive: Flash5.5-S01v5.fla

ERRORS:
Headers Error

--
Path = Flash5.5-S01v5.fla
Type = zip
ERRORS:
Headers Error
Physical Size = 216581
Embedded Stub Size = 63

Archives with Errors: 1

Open Errors: 1

Dclipsham commented 1 year ago

Just an aside: does this format need a container signature if it's creating non-standard zips? In this case there's an apparent binary ID hook at offset 0x1E: 'mimetypeapplication/vnd.adobe.xfl'. I haven't got a pool of samples myself to check consistency; this is just an observation from this one file...
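
The offset works because a ZIP local file header has a fixed 30-byte (0x1E) part, so the first entry's filename starts at 0x1E; if that entry is named `mimetype`, stored uncompressed and without an extra field, its content follows immediately (the same convention EPUB uses). Below is a minimal Python sketch of that byte check, assuming exactly this layout, which is only observed in one sample here.

```python
ZIP_LOCAL_HEADER_SIG = b"PK\x03\x04"
XFL_HOOK = b"mimetypeapplication/vnd.adobe.xfl"
HOOK_OFFSET = 0x1E  # 30: end of the fixed-size part of the first local file header


def has_xfl_mimetype_hook(data: bytes) -> bool:
    """True if the file starts with a local header for a 'mimetype' entry
    whose stored content is 'application/vnd.adobe.xfl'.

    Only matches when that entry is first, uncompressed, and has no extra field.
    """
    return (data.startswith(ZIP_LOCAL_HEADER_SIG)
            and data.startswith(XFL_HOOK, HOOK_OFFSET))
```

Because it needs no ZIP parsing, a check like this is unaffected by a damaged central directory, but it cannot identify files that lack the `mimetype` entry.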

thorsted commented 1 year ago

I should have mentioned that this happens with hundreds of my samples, many taken directly from the software installation CDs, across multiple versions. None of them produce errors when opened in the software.

Dclipsham commented 1 year ago

For reference, the specific issue with .fla is also described here: https://sourceforge.net/p/sevenzip/discussion/45798/thread/9e936d87/, with an effective won't-fix from the 7-Zip maintainer.

thorsted commented 1 year ago

I could try a simple binary identification method, but not all FLA files have the mimetype file within the structure. I am basing my identification on DOMDocument.xml, as it has an xflVersion string, which will allow me to identify each version correctly. Adobe Flash was retired, and Adobe Animate continues to use the FLA format. All the files I have tested, even from the most recent version, have this central directory error. But when I create FLA files with this tool, I don't see the issue.
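
In DROID this would be expressed as a container signature against DOMDocument.xml. The Python sketch below only illustrates the matching logic once that file's bytes are available. It assumes the version appears as an XML attribute of the form `xflVersion="..."`; the sample values in the checks are made up.

```python
import re
from typing import Optional

# Assumption: the version is written as an attribute, xflVersion="..." (or '...'),
# somewhere in DOMDocument.xml.
XFL_VERSION_RE = re.compile(rb'\bxflVersion\s*=\s*["\']([^"\']+)["\']')


def xfl_version(dom_document_xml: bytes) -> Optional[str]:
    """Return the xflVersion attribute value from DOMDocument.xml bytes, or None."""
    match = XFL_VERSION_RE.search(dom_document_xml)
    return match.group(1).decode("ascii", errors="replace") if match else None
```

A byte-level search like this does not need a full XML parse, which keeps it close to how a byte-sequence signature would behave.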

You can see some Animate samples here.

So how much should DROID do to validate a file, versus doing everything it can to identify a file even if it has to ignore some errors along the way? All the FLA files I have looked at unzip with the right content, although some have a duplicate filename. Shouldn't DROID attempt to do the same?
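
Tools that unzip such files with the right content despite a damaged central directory often do so by walking the local file headers from the start of the file instead of trusting the central directory. Below is a minimal Python sketch of that fallback. It assumes every entry records its sizes in the local header (general-purpose flag bit 3 unset) and is either stored or deflated; whether DROID's ZIP library could be made to behave this way is not established in this thread.

```python
import struct
import zlib

LOCAL_HEADER_SIG = b"PK\x03\x04"
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")  # 30-byte fixed part of a local file header


def scan_local_entries(data: bytes):
    """Yield (name, payload) for each consecutive local file header,
    ignoring the central directory entirely."""
    offset = 0
    while data.startswith(LOCAL_HEADER_SIG, offset):
        (_sig, _ver, flags, method, _time, _date, _crc,
         csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, offset)
        if flags & 0x08:
            raise ValueError("entries with a data descriptor are not handled by this sketch")
        name_start = offset + LOCAL_HEADER.size
        name = data[name_start:name_start + name_len].decode("utf-8", errors="replace")
        body_start = name_start + name_len + extra_len
        raw = data[body_start:body_start + csize]
        if method == 0:    # stored
            payload = raw
        elif method == 8:  # deflated: raw deflate stream, no zlib header
            payload = zlib.decompress(raw, -15)
        else:
            raise ValueError(f"unsupported compression method {method}")
        yield name, payload
        offset = body_start + csize
```

Duplicate filenames simply come out as repeated names, so the caller decides whether to keep the first, the last, or both.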

Dclipsham commented 1 year ago

Well, as I understand it, DROID just uses the zip handling library TrueZip for its zip-related tasks, so if there's a compatible zip library that handles this gracefully without causing regressions elsewhere, I would hope it would be relatively straightforward to switch. Of course, I'm no longer in a position to directly influence DROID's development roadmap...

Since TrueZip's latest version is 7.7.10 (https://mvnrepository.com/artifact/de.schlichtherle.truezip/truezip/7.7.10), which is 6 years old and has various vulnerabilities, I would hope that a prioritisation case could be made. CC @sparkhi @OliverHannan

Dclipsham commented 1 year ago

TrueVFS is the successor project to TrueZip. see https://mvnrepository.com/artifact/net.java.truevfs/truevfs-driver-zip/0.14.0 and http://truevfs.net/ (although the latter link refers to version 0.12, which is behind the latest maven version). I'm not currently in a position to build something to test whether this would overcome either the OLM or FLA issue, but might be the path of least resistance if it does work...

OliverHannan commented 1 year ago

Sorry I missed this notification. Thanks for the original comment @thorsted and updates @ross-spencer, @Dclipsham. We'll take this into account once we have dedicated developer time. I'll get it into the backlog ASAP though.

sparkhi commented 11 months ago

> TrueVFS is the successor project to TrueZip. see https://mvnrepository.com/artifact/net.java.truevfs/truevfs-driver-zip/0.14.0 and http://truevfs.net/ (although the latter link refers to version 0.12, which is behind the latest maven version). I'm not currently in a position to build something to test whether this would overcome either the OLM or FLA issue, but might be the path of least resistance if it does work...

We have updated to use TrueVFS and were hoping it would work but it has not solved the original issue mentioned in this ticket. As a result, I'm going to leave this ticket open.

sparkhi commented 11 months ago

I've done a few more trials and am noting what I've found here. There is no solution to this yet.

The Animate files that error do not appear to be Zip64, so the error is unrelated to Zip64. I tried creating a Zip64 file locally and it worked fine in DROID.

When I tried to dig deeper into the FLA files that produce the error using 7-Zip, it gave a cryptic error (but did not stop): Errors: FIXME-MyLoadStringW-

One interesting thing I noticed when using 7-Zip to browse the file is that the same file appears twice at the same location:

[image]
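
The same file appearing twice in 7-Zip could reflect either a duplicated name in the central directory or two central directory entries pointing at the same local header (which would also fit zip's "overlapped components" complaint). Below is a minimal Python sketch that reports both, reading the central directory layout from the PKWARE APPNOTE. It does not attempt extraction and does not handle Zip64.

```python
import struct
from collections import Counter

EOCD_SIG = b"PK\x05\x06"
CDH_SIG = b"PK\x01\x02"


def central_directory_duplicates(data: bytes) -> dict:
    """List names that occur more than once in the central directory, and
    local-header offsets referenced by more than one central directory entry.

    Sketch only: assumes the last EOCD signature in the file is the real record.
    """
    eocd_pos = data.rfind(EOCD_SIG)
    if eocd_pos < 0:
        raise ValueError("no End of Central Directory record found")
    cd_offset = struct.unpack_from("<I", data, eocd_pos + 16)[0]
    names, offsets = Counter(), Counter()
    pos = cd_offset
    while data.startswith(CDH_SIG, pos):
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", data, pos + 28)
        local_offset = struct.unpack_from("<I", data, pos + 42)[0]
        name = data[pos + 46:pos + 46 + name_len].decode("utf-8", errors="replace")
        names[name] += 1
        offsets[local_offset] += 1
        pos += 46 + name_len + extra_len + comment_len
    return {
        "duplicate_names": sorted(n for n, c in names.items() if c > 1),
        "shared_offsets": sorted(o for o, c in offsets.items() if c > 1),
    }
```

Running this over a batch of failing FLA samples would show which of the two patterns, if either, is common to them.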

steve-daly commented 11 months ago

From a brief Google on this point I don't think FLA files are truly valid ZIP files of any variant. They are close, and some applications will have a try at opening them, but it looks as if the creators of the FLA format have deviated from, or extended, the ZIP format to include other data or relax constraints (e.g. the multiple mimetype files, and corrupt Central Directory section). As @sparkhi says, we've updated DROID to use a modern ZIP library but it still won't allow FLA files to be used with container signatures.

Dclipsham commented 11 months ago

Not a dissimilar conclusion to the one the 7-Zip folk came to: https://sourceforge.net/p/sevenzip/discussion/45798/thread/9e936d87/

Out of interest does OLM at least ID now with the TrueVFS update?

thorsted commented 11 months ago

> Not a dissimilar conclusion to that which the 7zip folk came to: https://sourceforge.net/p/sevenzip/discussion/45798/thread/9e936d87/
>
> Out of interest does OLM at least ID now with the TrueVFS update?

No, the OLM format also does not ID.

Could not process the potential container format (ZIP): file:///Volumes/File%20Formats/OLM/OLM-samples/Outlook%20for%20Mac%202011%20Archive2.olm ZIP file spanning/splitting is not supported!

thorsted commented 11 months ago

> From a brief Google on this point I don't think FLA files are truly valid ZIP files of any variant. They are close, and some applications will have a try at opening them, but it looks as if the creators of the FLA format have deviated from, or extended, the ZIP format to include other data or relax constraints (e.g. the multiple mimetype files, and corrupt Central Directory section). As @sparkhi says, we've updated DROID to use a modern ZIP library but it still won't allow FLA files to be used with container signatures.

Should DROID only process valid ZIP files? Validity of a file format should come after identification in my opinion. Can the ZIP library be configured to ignore many of these errors and provide some access to the contents?

steve-daly commented 11 months ago

@thorsted do you have a simple OLM file sample you could share (or do you know if we already have one of these from you?)

steve-daly commented 11 months ago

> Should DROID only process valid ZIP files? Validity of a file format should come after identification in my opinion. Can the ZIP library be configured to ignore many of these errors and provide some access to the contents?

Sadly these aren't just warnings from the ZIP library which could be inhibited, but these files are missing some essential elements of the ZIP specification which this library (and others) need to extract it fully. Effectively, although the compressed FLA format has some elements in common with the ZIP format specification, it's not actually a ZIP file.

thorsted commented 11 months ago

> Effectively, although the compressed FLA format has some elements in common with the ZIP format specification, it's not actually a ZIP file.

7-Zip identifies all my FLA and OLM samples as ZIP-compressed and uncompresses them all successfully, albeit with an error.