brimworks / lua-zlib

Simple streaming interface to zlib for Lua.

gzip format lost #4

Closed mingliu closed 11 years ago

mingliu commented 12 years ago

I used it as follows:

        local zlib = require("zlib")
        local inflate_stream = zlib.inflate()
        local inflated_body = inflate_stream(body)

        local reverted_rbody, _ = string.gsub(inflated_body, "AAA", "BBB")
        local deflate_stream = zlib.deflate()
        body = deflate_stream(reverted_rbody, 'full')

I want to decompress it, replace some text, then compress it again. But the re-compressed content comes out in a different format; it is no longer gzip.

brimworks commented 12 years ago

Hi mingliu,

There are some obscure options to zlib to pick between the various formats. Specifically, it sounds like you are interested in using the deflateInit2() windowBits parameter as documented:

The windowBits parameter is the base two logarithm of the window size
(the size of the history buffer). It should be in the range 8..15 for this
version of the library. Larger values of this parameter result in better
compression at the expense of memory usage. The default value is 15 if
deflateInit is used instead.

windowBits can also be -8..-15 for raw deflate. In this case, -windowBits determines the window size. deflate() will then generate raw deflate data
with no zlib header or trailer, and will not compute an adler32 check value.

windowBits can also be greater than 15 for optional gzip encoding. Add
16 to windowBits to write a simple gzip header and trailer around the
compressed data instead of a zlib wrapper. The gzip header will have no
file name, no extra data, no comment, no modification time (set to zero), no
header crc, and the operating system will be set to 255 (unknown). If a
gzip stream is being written, strm->adler is a crc32 instead of an adler32.

Note that you still have information loss, since the gzip header specifies file name, "extra data", comment, modification time, header CRC, and operating system. If you want to add support for extracting this meta-data and inserting it into a deflate stream, that sounds great (but isn't necessary for my purposes).

...or an easier change is to make it so zlib.deflate() takes an optional windowBits parameter that is forwarded in to deflateInit2().

Thanks, -Brian

mingliu commented 12 years ago

Great, thanks! gzip header is enough for me

bhargavtrivedi commented 10 years ago

Hi,

Can you paste the final code that worked for you? I am trying to unzip the content, replace something, and then zip it again, but I get a content encoding error.

Thanks

brimworks commented 10 years ago

zip is different than gzip. Maybe you want this library instead? https://github.com/brimworks/lua-zip

bhargavtrivedi commented 10 years ago

Thanks for your quick response. That was my mistake: I am trying to decompress gzip content.

So I have a gzip response which I want to decompress, change, and then compress (gzip) again.

bhargavtrivedi commented 10 years ago

Hello,

Can you please provide example code for gzip compression? I can gunzip the content and modify it, but I cannot compress it back to gzip.

Thanks,

mingliu commented 10 years ago

Hi,

This is the replacement I used in lua_zlib.c to generate a gzip header:

        lz_assert(L, deflateInit2(stream, level, Z_DEFLATED, (15+16), 8, Z_DEFAULT_STRATEGY), stream, __FILE__, __LINE__);

bhargavtrivedi commented 10 years ago

Hi mingliu,

Thanks for your reply on this.

I have just replaced the line below in lz_deflate_new(lua_State *L):

        int result = deflateInit2(stream, level, Z_DEFLATED, window_size, DEF_MEM_LEVEL, Z_DEFAULT_STRATEGY);

with:

        int result = deflateInit2(stream, level, Z_DEFLATED, (15+16), 8, Z_DEFAULT_STRATEGY);

It worked and generated gzip headers (tested with a gzip file: I decompressed it, modified it, and compressed it back with gzip headers).

But I get the error below when I use lua-zlib with the Nginx lua module:

        InvalidInput: input string does not conform to zlib format or checksum failed at lua_zlib.c line 170

And the browser cannot display the complete page; some part of it (the footer) is missing.

Thanks,

mingliu commented 10 years ago

Hi bhargavtrivedi,

I'm not sure how you are using lua-zlib or what your input and output are. I used it exactly as in my first comment. That was a year ago, and it worked with my browser, so check that you did not forget to set ngx.header.content_encoding = "gzip" before output.

As Brian's long comment explains, the simple generated gzip header loses meta-data, so that may be the cause of your problem.

From the repository's commit history, it seems a custom window_size can now be passed in:

        stream = zlib.deflate([ int compression_level ], [ int window_size ])

You may want to give it a try.

brimworks commented 10 years ago

Ya, per mingliu's comment you can now pass in the window_size (or windowBits) to zlib.deflate()... and as mentioned by mingliu, you just need to add 16 to the base 2 window size in order to get gzip headers... alternatively, if the number is negative then it indicates raw deflate with no header (per the comment I pasted from the zlib.h header file). I know this is all rather obscure :(. I'm not sure why nginx's lua-zlib won't work... are you sure it has the latest lua-zlib code?
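To make the window_size rule concrete, here is a minimal sketch of the three output formats. It assumes a lua-zlib build recent enough to accept the second argument of zlib.deflate(); `compress` is a hypothetical helper written for this example, not part of the library.

```lua
-- Sketch: choose the output format through the window_size argument.
local zlib = require("zlib")

-- Hypothetical helper: compress `data` in one shot with the given window_size.
local function compress(data, window_bits)
  local stream = zlib.deflate(zlib.BEST_COMPRESSION, window_bits)
  -- "finish" flushes all pending output and writes the trailer, if any.
  local out = stream(data, "finish")
  return out
end

local text = string.rep("hello gzip ", 50)

local zlib_body = compress(text, 15)       -- zlib wrapper (the default)
local gzip_body = compress(text, 15 + 16)  -- gzip header and trailer
local raw_body  = compress(text, -15)      -- raw deflate, no wrapper

-- A gzip stream always begins with the magic bytes 0x1f 0x8b.
print(gzip_body:byte(1) == 0x1f and gzip_body:byte(2) == 0x8b)
-- A zlib stream with a 32 KiB window begins with 0x78.
print(zlib_body:byte(1) == 0x78)
```

Checking the first two bytes like this is a quick way to confirm which wrapper a compressed body carries before sending it with Content-Encoding: gzip.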

brimworks commented 10 years ago

Note that the windowBits passed into zlib.inflate() (as opposed to zlib.deflate()) are also used to determine if the input is expected to be in zlib, gzip, or "raw" format. Are you passing windowBits to that function and, if so, what value are you passing in? The default of "detect" gzip headers is probably best IMHO.
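The inflate side can be sketched the same way. This assumes the default zlib.inflate() detects both gzip and zlib wrappers, as described above, and that raw deflate data needs an explicit negative windowBits; `deflate_all` and `inflate_all` are hypothetical helpers for illustration only.

```lua
-- Sketch: the default inflate stream reads gzip and zlib input alike,
-- while raw deflate input needs a negative windowBits.
local zlib = require("zlib")

local text = "windowBits decides which wrapper inflate expects"

-- Hypothetical helper: compress in one shot with the given window_size.
local function deflate_all(data, window_bits)
  local out = zlib.deflate(zlib.BEST_COMPRESSION, window_bits)(data, "finish")
  return out
end

-- Hypothetical helper: decompress a complete body in one call.
local function inflate_all(data, window_bits)
  -- With no argument, lua-zlib detects gzip and zlib headers by itself.
  local stream = window_bits and zlib.inflate(window_bits) or zlib.inflate()
  local out, eof = stream(data)
  return out, eof
end

local from_gzip = inflate_all(deflate_all(text, 15 + 16))
local from_zlib = inflate_all(deflate_all(text, 15))
local from_raw  = inflate_all(deflate_all(text, -15), -15)

print(from_gzip == text, from_zlib == text, from_raw == text)
```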

bhargavtrivedi commented 10 years ago

Hi,

Here is my nginx code which I have used to modify content using lua.

        local zlib = require("zlib")
        local stream = zlib.inflate()
        ngx.arg[1] = stream(ngx.arg[1])

        ngx.arg[1] = string.gsub(ngx.arg[1], "SEARCH", "REPLACE")

        local deflate_stream = zlib.deflate()
        ngx.arg[1] = deflate_stream(ngx.arg[1], 'full')

So I am not using any argument in zlib.inflate() or zlib.deflate().

As suggested by mingliu, I have made one change in lz_deflate_new(lua_State *L) code in lua_zlib.c file which I have mentioned in previous post.

With the above change in lua_zlib.c I get the Content-Encoding: gzip header and the page loads in the browser too, but somehow only half of the HTML page is displayed.

Thanks,

brimworks commented 10 years ago

I would not recommend making changes to lua_zlib.c, but rather passing in the window bits as an argument to deflate(). I suspect the issue you're having has to do with you not calling deflate_stream() with the "finish" argument... so code like this should work (without having to modify lua_zlib.c):

        local zlib = require("zlib")
        local stream = zlib.inflate()
        ngx.arg[1] = stream(ngx.arg[1], "finish")

        ngx.arg[1] = string.gsub(ngx.arg[1], "SEARCH", "REPLACE")

        local deflate_stream = zlib.deflate(zlib.BEST_COMPRESSION, 15+16)
        ngx.arg[1] = deflate_stream(ngx.arg[1], "finish")

bhargavtrivedi commented 10 years ago

Hi,

I tried the "finish" argument in both the inflate and deflate streams as you suggested, but the result is the same (the browser displays only half of the page content).

I checked the log and found that the issue seems to be with the inflate stream, not with the deflate. The error below is logged for each request:

        InvalidInput: input string does not conform to zlib format or checksum failed at lua_zlib.c line 170
        stack traceback:
                [C]: in function 'stream'

I can see the Content-Encoding: gzip header, and this time I have not changed lua_zlib.c; instead I passed the argument to deflate() as you suggested above.

Thanks,

brimworks commented 10 years ago

Have you tried removing this line just to make sure that this is indeed a problem with lua-zlib?

ngx.arg[1] = string.gsub(ngx.arg[1], "SEARCH", "REPLACE")

...some other things I would try is using a packet sniffer (such as tcpflow) to verify that there isn't something else wrong. For example, HTTP can either be "chunk" encoded or specify an explicit Content-Length... if the Content-Length was not updated after the gzip then the browser could be doing a "short read"... if "chunk"ed then the special "0" sized chunk that indicates the end could be prematurely inserted.

So, basically I'm suggesting ways to make sure that the blame is really with lua-zlib :).

Thanks, -Brian

bhargavtrivedi commented 10 years ago

Hi,

Yes, I tried removing that string.gsub line too. I also printed the inflated stream to the error log and noticed that only half of the HTML content is printed there as well.

Yes, Content-Length could be the problem, but I set it to nil, so it should not be the issue.

I have tried to process (decompress then modify then compress again) the same compressed HTML file and it works.

But when I use the same code (lua-zlib) with Nginx it doesn't work. If I don't use gzip compression then it works just fine.

Yes, the HTTP response can arrive in multiple chunks. How does inflate/deflate work in that case?

I also tried initializing the inflate stream separately (only once for multiple chunks), but in that case I got the same result with the error below:

        IllegalState: calling inflate function when stream was previously closed
        stack traceback:
                [C]: in function 'stream'

Thanks, Bhargav

brimworks commented 10 years ago

Can you give me some sample input to deflate that causes this partial compression? Can you put it in a "gist"? https://gist.github.com/

Thanks, -Brian

bhargavtrivedi commented 10 years ago

Hi,

In the end the issue was not with lua-zlib; it was caused by the chunked data.

Thanks for your help on this.
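For readers who hit the same symptom, the chunk problem can be reproduced outside Nginx. The sketch below is an illustration under assumptions, not the fix that was actually applied in this thread: it splits a gzip body into arbitrary pieces (the 8-byte chunk size is made up) and feeds all of them to a single inflate stream, which is what a body filter that sees one chunk per call would need to do.

```lua
-- Sketch: one inflate stream must see every chunk of the same gzip body.
local zlib = require("zlib")

local html = string.rep("<p>SEARCH me</p>\n", 200)
local gzipped = zlib.deflate(zlib.BEST_COMPRESSION, 15 + 16)(html, "finish")

-- Split the compressed body into small pieces, as a proxy might receive it.
local chunks = {}
local chunk_size = 8  -- arbitrary, chosen only to force several chunks
for i = 1, #gzipped, chunk_size do
  chunks[#chunks + 1] = gzipped:sub(i, i + chunk_size - 1)
end

-- Create the stream once, then feed it chunk by chunk.
local inflate = zlib.inflate()
local parts = {}
for _, chunk in ipairs(chunks) do
  local out, eof = inflate(chunk)
  parts[#parts + 1] = out
  if eof then break end  -- the stream is closed once eof is reported
end
local body = table.concat(parts)

-- Edit only after the whole body is reassembled, then gzip it once.
local edited = body:gsub("SEARCH", "REPLACE")
local recompressed = zlib.deflate(zlib.BEST_COMPRESSION, 15 + 16)(edited, "finish")

print(#chunks > 1, body == html)
```

Creating a fresh zlib.inflate() for every chunk fails because only the first chunk carries the gzip header, and calling a stream again after it has reported eof raises the IllegalState error quoted earlier.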