Grandt / PHPZip

PHP Class to create archives of compressed files in ZIP format.
http://www.phpclasses.org/package/6110

ZipStream: Zip file invalid when trying to view the Zip file with Windows Explorer #7

Open xdc0 opened 11 years ago

xdc0 commented 11 years ago

I'm creating a zip file doing something like this:

    $zip = new ZipStream("test.zip");
    $zip->addDirectory("test");
    $zip->addDirectoryContent("/home/user/temp/","test");
    return $zip->finalize();

A zip file gets downloaded. With the Linux unzip console tool the contents are extracted correctly, and under Windows 7-Zip also shows and extracts the file just fine, but Windows Explorer's native zip viewer says that the zip file is invalid.

I cannot provide the zip file that gets downloaded, as it contains some sensitive data. I'll try to create one that is safe to share.

Grandt commented 11 years ago

I've tested this and have not been able to recreate the error. I'll try on another PC tomorrow.

https://dl.dropbox.com/u/10561661/Issue%20%237%20-%20test.zip

linvald commented 11 years ago

FYI: I was able to open the test.zip natively in Windows Explorer with no errors...

Grandt commented 11 years ago

Does the directory you are compressing contain symlinks, or very large files? Does WinZip or WinRar's test archive functions yield any warnings or errors on the problematic archive?

 I'll run a few more tests tomorrow, see if you can add 'safe' files with no secret information that will trigger the bug.

Cheers A. Grandt


xdc0 commented 11 years ago

Thanks for your prompt answer.

  1. The directory being compressed contains neither symlinks nor large files; the whole directory is less than 150 KB.
  2. I can't tell whether WinZip or WinRar yield any warnings, but Linux unzip does give the warning "3 extra bytes at beginning or within zipfile". I don't have access to a Windows machine right now. The one I had access to earlier only had 7-Zip, and I didn't see any warnings there; however, Windows Explorer refuses to open the file with the error "Invalid zip file". I will update this as soon as I can get hold of a Windows machine.

Here's a sample zip file where this is happening: https://dl.dropbox.com/s/sf77xjm5tjtd44d/test.zip

Let me know if you have trouble downloading the file

As a side note, just in case, I'm using the ZipStream file in master, with no changes at all other than changing the class name to match autoloader rules.

Edit:

Just got access to another Windows machine with WinRar. I used WinRar's test feature; it said that the file did not contain any errors, and it happily opened and extracted it. I then tried to open the file with Windows Explorer and the same error was shown: "test.zip is an invalid zip file".

Grandt commented 11 years ago

Hi

The problem is that the zip file starts with 3 extra bytes, as Linux unzip told you: a space and a (Windows) newline (0x20 0x0d 0x0a). This can happen if, for instance, the generating PHP script contains a comment block with a new line before the first PHP start tag. It is not the first time I have seen something like this, and I will probably look into whether I can force a buffer clear before starting the zip file.

Cheers A.Grandt
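
For anyone who wants to check a downloaded archive for the same symptom, here is a minimal sketch. It is not part of PHPZip, and `leadingJunk()` is a hypothetical helper name. It assumes a non-empty, ordinary zip file, which should begin with the local file header signature `PK\x03\x04`, and reports whatever bytes come before that signature.

```php
<?php
// Sketch: report the bytes that precede the first local file header in zip data.
// leadingJunk() is a hypothetical helper, not part of PHPZip.
function leadingJunk(string $data): ?string
{
    // A non-empty zip normally starts with the signature "PK\x03\x04".
    $pos = strpos($data, "PK\x03\x04");
    if ($pos === false) {
        return null; // no local file header found at all
    }
    // Hex dump of everything before the signature ("" if the file is clean).
    return bin2hex(substr($data, 0, $pos));
}
```

Called as `leadingJunk(file_get_contents('test.zip'))`, a result of `200d0a` would correspond to the space plus Windows newline described above.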


xdc0 commented 11 years ago

Hmm, I couldn't find any of my PHP files starting with a newline instead of the PHP tag. I'm currently using PHPZip to provide a way to compress the contents of a page created by a CMS that is part of a system, all under ZF1. Hopefully that is relevant to the issue at hand.

I'll take a look at the library and see if I can find a way to do the buffer cleanup and hopefully get rid of this problem. I'll send in a patch if it works.

Thanks for your help

Grandt commented 11 years ago

I recreated the issue, if not the cause, in ZipStream.Example2.php, which was added to the PHPZip repository last night.

Had your code emitted those bytes before calling new ZipStream(...), it would have thrown an error. The usual cause, as I see it, is a PHP script with multiple PHP segments in separate <?php ?> blocks.


xdc0 commented 11 years ago

A CMS entry is a ZF model, and I created a method in the model to convert that CMS entry to a zip file. CMS entries may or may not have a dedicated directory with additional contents; some entries are just a page with nothing else. As you can see, there are two paths: one that adds the directory to the zip, and one that just adds the HTML for simple entries. This is what my function looks like:

    public function convertToZip() {

        $name = $this->get('name');
        if(!$name) {
            return false;
        } else {
            $cms_dir = Util_Conf::get('cms_assets_dir');
            $dest_dir = $cms_dir.$name;
            $zip = new Ctrl_Util_Helper_ZipStream("$name.zip");
            if(is_dir($dest_dir)) {
                $zip->addDirectory("$name");
                $zip->addDirectoryContent($dest_dir,"$name");
            } else {
                $html = $this->get('html');
                $zip->openStream("$name.html");
                $zip->addStreamData($html);
                $zip->closeStream();
            }
            return $zip->finalize();
        }
    }

I just noticed something that I had overlooked. In Chrome, you can see the request made to retrieve the zip file in the Network tab of the developer tools. For me, however, it's shown in red, the status says "canceled", and the size does not seem right; in the following example the zip file should be around 124 KB: (screenshot: zipfile_error)

Since both the framework and the library toy around with the HTTP headers, and the ZipStream::finalize() method apparently handles sending the data to the client, I think something may be conflicting there. Is that a possible cause? Maybe it has nothing to do with it. I'll keep looking over the next few days and see if I can come up with something.

Thanks for your follow up.

Grandt commented 11 years ago

To understand the last bit first, you have to realize how ZipStream (ZS) works compared to the regular Zip class. ZS builds the zip file on the fly and sends it to the user as it is built, one zip entry at a time, so ZS has no way of knowing how large the end result is going to be when it sends the initial HTTP header.

A rough explanation of the structure of a zip file is that it is a series of zip entries, with an entry header followed by its compressed data, and at the very end of the file is the CentralDirectory (CD) record, which again lists the zip entries and their offsets/start address.

When you build a zip, each file or directory is an entry, and ZipStream will send these entries to the user as they are made. Finalize writes the CD record.
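
To make that layout concrete, here is a rough sketch that reads the End of Central Directory (EOCD) record from the end of a zip file. `readEocd()` is a hypothetical helper, not something PHPZip provides, and it assumes a plain single-disk archive without ZIP64 extensions and without an archive comment that happens to contain the signature.

```php
<?php
// Sketch: read the End of Central Directory (EOCD) record from zip bytes.
// readEocd() is a hypothetical helper; ZIP64 and multi-disk archives are ignored.
function readEocd(string $data): ?array
{
    // The EOCD record starts with "PK\x05\x06" and sits at the end of the file,
    // so look for the last occurrence of that signature.
    $pos = strrpos($data, "PK\x05\x06");
    if ($pos === false || strlen($data) - $pos < 22) {
        return null; // no complete EOCD record (its fixed part is 22 bytes)
    }
    // Skip signature (4), two disk numbers (2 + 2) and entries on this disk (2),
    // then read: total entries (16-bit LE), CD size and CD offset (32-bit LE).
    $fields = unpack('ventries/Vcd_size/Vcd_offset', substr($data, $pos + 10, 10));
    $fields['eocd_pos'] = $pos; // where the record was actually found
    return $fields;
}
```

In a simple archive the central directory ends right where the EOCD record begins, so `eocd_pos - (cd_offset + cd_size)` should normally be 0. A positive difference suggests that many unexpected bytes were added somewhere before the central directory, which is presumably how a tool arrives at a message like "3 extra bytes at beginning or within zipfile".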

The problem of the stream being cancelled is baffling me, but may be because I send the header in the ZipStream constructor, followed by a flush() command. It shouldn't have been an issue though.

So for the problematic 3 bytes to be where they are, they would have had to be added between the constructor (which will die with an error message if the output buffer contains any data when it is called) and the first zip entry being added. I'd look at Ctrl_Util_Helper_ZipStream if I were you; there may be something in there causing a space followed by a new line to be added. In the meantime I'll test the code in Chrome to see what it is doing.
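
A similar check can be run from the calling code to narrow down where stray output comes from. The sketch below is only an illustration of the idea and is not how ZipStream implements its own check; `describeEarlyOutput()` is a hypothetical helper. It relies on PHP's `headers_sent()`, which can report the file and line where output first started, and on the output buffering functions.

```php
<?php
// Sketch: describe any output produced before a binary download starts.
// describeEarlyOutput() is a hypothetical helper, not part of PHPZip.
function describeEarlyOutput(): ?string
{
    // headers_sent() fills $file and $line with the place where output began.
    if (headers_sent($file, $line)) {
        return "output already sent from $file:$line";
    }
    // Output may also be waiting in an output buffer, not yet sent to the client.
    if (ob_get_level() > 0 && ob_get_length() > 0) {
        return 'buffered output: ' . bin2hex(ob_get_contents());
    }
    return null; // nothing has been written so far
}
```

Calling it right before `new ZipStream(...)` and again before the first entry is added would show on which side of the constructor the extra bytes appear.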

Grandt commented 11 years ago

Sorry, I ran the wrong test. The one where I added the bytes in question. I'm caffeine deprived.

xdc0 commented 11 years ago

I found the problem. It's basically what you said, a new line at the beginning of the file, but it's subtler than a stray PHP tag. I was working on something else related to JSON responses, and I noticed in the Chrome dev tools that the response actually had a new line at the beginning! I had no clue why (it didn't affect the functionality, as the response was being parsed correctly, but that empty line puzzled me), so I took a look at other sections of the page and, to my surprise, every single response contained a new line at the beginning, be it HTML, JSON, XML, or even ZipStream.

I don't have a clue why, where, or what is causing that new line, but based on your comments I believe that it's the cause of the problem.

With that said, should I close this issue or do you plan on implementing a buffer clear and would like to keep track of this?

Grandt commented 11 years ago

I tried to implement a buffer clear, and it didn't work. The problem is that the only time I can actually do it is when the constructor is called; after that the buffer may contain data belonging to ZipStream, so I can't risk clearing it before starting a new zip entry. The weird bit is that the problem isn't in the initial PHP file calling ZipStream, as that would have resulted in an error from ZipStream.

I can think of a potential solution, but it would require a variable to keep track, and it would be a hack that may not even solve the problem at hand.

My best guess is a utility function, PHP class, or file, for instance the database initialization, and the culprit may not be the first line but the last. An extra new line at the end of a PHP file is easy to miss.

I'll keep it open for a few more days to see if I can come up with something. But finding your cause is important as well, as it may affect any binary files generated.

Cheers A.Grandt
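
One way to hunt for such a file is to scan the project's PHP sources for whitespace outside the PHP tags. The following is a minimal sketch under a few assumptions: `strayOutput()` is a hypothetical helper, the files are code-only (no intentional inline HTML), and they open with the full `<?php` tag. It takes into account that PHP swallows a single newline directly after a closing tag.

```php
<?php
// Sketch: flag whitespace in PHP source that would be sent to the client.
// strayOutput() is a hypothetical helper and assumes a code-only file.
function strayOutput(string $source): array
{
    $problems = [];
    // Anything before the opening tag (spaces, newlines, a BOM) is output verbatim.
    if (strncmp($source, '<?php', 5) !== 0) {
        $problems[] = 'bytes before the opening tag';
    }
    // PHP swallows one newline right after a closing tag; more whitespace is output.
    // The possessive "?+" keeps the regex from re-using that swallowed newline.
    if (preg_match('/\?>(?:\r\n|\n|\r)?+\s+(?:<\?php|\z)/', $source)) {
        $problems[] = 'whitespace after a closing tag';
    }
    return $problems;
}
```

Running it over each file, for example with `strayOutput(file_get_contents($path))` inside a loop, should list the candidates worth opening in an editor. Leaving out the closing tag at the end of code-only files avoids the trailing case entirely.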

louishenri13 commented 10 years ago

I am getting the same problem. Here is the error it is giving me:

    Line 1:<br />
    Line 2:<b>Fatal error</b>: Class 'ZipStream' not found in <b>C:\xampp\htdocs\zip\test\index.php</b> on line <b>52</b><br />
    Line 3:

I am unable to open it using Windows Explorer. I can only open it using WinRar. How come?

Grandt commented 10 years ago

Sorry I haven't gotten back to you. I forgot, my bad.

You get an error and can still open the file in WinRar because WinRar is actually a pretty awesome piece of software: it can, to a point, figure out a zip file even if it contains errors.

I need to know more about how your code is structured, as I obviously don't have a copy of your index.php.

That it can't find the ZipStream class is, I guess, because the include is pointing to the wrong directory.

Cheers A.Grandt


datasheetarchive commented 2 years ago

For anyone trying to battle through this: certain apps and text editors save your code with a byte order mark (BOM). If you load the file into something like Notepad++, open Encoding in the top menu, and choose the option without BOM, it will resolve this issue.
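
If there are too many files to check by hand, a BOM can also be detected in code. This is a small sketch, assuming the files are UTF-8, where the BOM is the three bytes EF BB BF; `hasBom()` and `stripBom()` are hypothetical helper names, not part of PHPZip.

```php
<?php
// Sketch: detect and strip a UTF-8 byte order mark (bytes EF BB BF).
// hasBom() and stripBom() are hypothetical helpers, not part of PHPZip.
function hasBom(string $source): bool
{
    // Compare only the first three bytes against the UTF-8 BOM.
    return strncmp($source, "\xEF\xBB\xBF", 3) === 0;
}

function stripBom(string $source): string
{
    // Drop the three BOM bytes if present, otherwise return the input unchanged.
    return hasBom($source) ? substr($source, 3) : $source;
}
```

For example, `hasBom(file_get_contents($path))` could be run over every PHP file that is loaded before the zip is streamed, since a BOM in any of them ends up in front of the binary output.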