This is just how compression with gzip works: to get small sizes, gzip needs to accumulate the payload so it can do substring search and replacement. If you want to stream, you can use the res.flush() call between writes, but the compression will be much less efficient.
An example can be found at the bottom of the readme.
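For reference, that pattern looks roughly like the following (a sketch in the spirit of the readme's server-sent events example, not a verbatim copy; the route and ping payload are placeholders):

var compression = require('compression');
var express = require('express');

var app = express();
app.use(compression());

app.get('/events', function (req, res) {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');

  // send a ping roughly every 2 seconds
  var timer = setInterval(function () {
    res.write('data: ping\n\n');
    // without this call, gzip keeps buffering and nothing reaches the client yet
    res.flush();
  }, 2000);

  res.on('close', function () {
    clearInterval(timer);
  });
});

app.listen(3000);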
Basically you are just describing to me exactly how gzip compression and the Node.js core zlib stream work, not an actual bug. Let me know if my answer above helps at all. If you think there is a legitimate bug, let me know the specifics and even perhaps a PR :)
Thanks. I added the call to flush the response after each write, but it is still behaving the same: everything seems to come through only when res.end() is called. I am calling res.write() [and now res.flush()] in batches. After about 10 calls, I see this:
(node) warning: possible EventEmitter memory leak detected. 11 drain listeners added. Use emitter.setMaxListeners() to increase limit
Trace
at Gzip.addListener (events.js:179:15)
at Gzip.Readable.on (_stream_readable.js:671:33)
at Gzip.once (events.js:204:8)
at Gzip.Zlib.flush (zlib.js:451:10)
at ServerResponse.flush (/Users/bijoythomas/Zulu/SchedulingMVP/app_node/node_modules/compression/index.js:70:16)
Hi! The warning you are getting is a known and fixed Node.js core bug; if you upgrade Node.js to the latest, it should be fixed (or go back to 0.10, which I know was before that regression).
As for your issue, I'm not able to replicate it and all the tests running in Travis CI cannot, either.
If you can provide a PR for the fix or at least provide full code and instructions on how I can reproduce it, I can probably find some time in the next week to debug it. A lot of times people can have a proxy between them and their app, which can cause issues, so be sure you are running your server on localhost to narrow it down to this module. Another common issue is app.use of another module that is buffering your response, so you can try removing all other middleware to narrow it down.
Thanks for the suggestions. I am running on localhost with no proxies in between. So far, I can confirm that the difference comes down to one line: app.use(compression()).
When it is commented out, I can see the data streaming, and the network tab in Chrome dev tools shows me the streaming information (sizes and times).
When it is enabled, I see a long pause (Pending) and then finally the data coming in.
I will try to narrow it down to an example.
Hi @bijoythomas, you cannot use the Chrome network tab to make measurements for gzipped responses, as far as I know; we get lots of issues in this repo from Chrome reporting misleading things. I thought you said you were using Wireshark to verify this? Did you use Wireshark to verify your most recent update, or just Chrome?
Both. Wireshark and Chrome.
Cool. Yeah, I'm not able to reproduce it for whatever reason; all the tests in Travis CI on all their Node.js versions get streaming compressed responses. So I'm not sure how I can proceed without your help in getting me a reproducible case or a PR to address the issue.
It's possible there is some kind of Express bug (I see you are using a version from June 2, 2014, well over a year old) or even some kind of bug in the Node.js you are using (from February). Have you also tried upgrading either of those two? I've only been testing with the latest versions from those lines (Express 4.13.3 and Node.js 0.12.7) to rule out any bugs that may be in them.
In addition, I know you said it was the addition of app.use(compression()) that caused it, but there have been several issues where using this module together with another module triggered a bug in that other module. So if you haven't already, you should still try removing all other middleware to rule out the issue being a combination of this module plus another one. I don't have your application, so if there is an interaction like this going on, it could very well be another reason why I'm not able to reproduce the issue myself.
Without being able to see the issue, I'm not able to debug it and determine the cause or a fix :(
Sure. I will try to narrow down the middleware and see if things change.
I did a little bit of debugging by adding console.log statements at several points in the compression/index.js file in my node_modules directory.
I see that in the res.flush() function we do have a 'gzip' stream and it is getting called, but nothing gets onto the wire.
I added another log statement in the res.end function in there, and only once that gets called does the actual res.write (on the underlying HTTP response object) happen. I had also added logging in the stream.on('data') handler.
I know it's hard to follow without seeing where I put the logs, but essentially what I'm seeing is a bunch of flush calls with no data on the wire, followed by the res.end() call, and AFTER that I see the 'data' events from the stream, with the underlying res.write() call happening, returning true/false, and draining/resuming the stream. But like I said, all this activity seems to happen only after the res.end() call. It's like the flush has no effect.
Gotcha. From what you are describing, it sounds like a Node.js bug (perhaps just in 0.12.0).
Essentially, whenever the Node.js core zlib stream emits a data event, we call res.write with that chunk. Every time you call res.write, we call stream.write on that Node.js core stream. When you call res.flush, we just call stream.flush on that stream as well. If that doesn't cause the stream to emit a data event, then we don't have anything to call res.write with, of course.
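In simplified form, that wiring looks roughly like the following (a stripped-down sketch of the idea, not the module's actual source; tinyGzip and the variable names are illustrative):

var zlib = require('zlib');

function tinyGzip(req, res, next) {
  var stream = zlib.createGzip();
  var _write = res.write.bind(res); // original response methods
  var _end = res.end.bind(res);

  res.setHeader('Content-Encoding', 'gzip');
  res.removeHeader('Content-Length');

  res.write = function (chunk, encoding) {
    // application writes feed the compressor, not the socket
    return stream.write(chunk, encoding);
  };

  res.end = function (chunk, encoding) {
    if (chunk) stream.write(chunk, encoding);
    stream.end();
    return this;
  };

  res.flush = function () {
    // ask zlib to emit whatever it is currently holding back
    stream.flush();
  };

  stream.on('data', function (chunk) {
    // compressed output goes to the real response; if the socket is
    // backed up, pause zlib until the response drains again
    if (_write(chunk) === false) {
      stream.pause();
    }
  });

  res.on('drain', function () {
    stream.resume();
  });

  stream.on('end', _end);

  next();
}

Registering it with app.use(tinyGzip) behaves like a much-simplified compression(): nothing reaches the wire except what the gzip stream decides to emit.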
So I figured out what is happening. The same issue exists in the latest version of Node too. The way I am sending data is something like this:
var arr = [];
for (var i = 0; i < some_large_number; i++) {
  populate_data_in_array(arr);
  if (arr.length === 500) {
    res.write(JSON.stringify(arr));
    res.flush();
    arr = [];
  }
}
// Finally, if there is anything left in the array, send it across too
A look at the zlib.js source (https://github.com/nodejs/node/blob/master/lib/zlib.js, line 448) shows that the flush() call is not synchronous. Instead, it sets up a 'drain' listener to execute later. But with my for-loop consuming the thread, that listener has no chance to run. In fact, the loop keeps adding 'drain' listeners, which explains why I was getting the warning shown earlier: (node) warning: possible EventEmitter memory leak detected. 11 drain listeners added
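That starvation can be seen with the core zlib stream alone (a standalone illustration, independent of the Express code above; the payload size and iteration count are arbitrary):

var zlib = require('zlib');

var gzip = zlib.createGzip();
var payload = new Array(5000).join('some repetitive data ');

gzip.on('data', function (chunk) {
  console.log('compressed chunk of', chunk.length, 'bytes');
});

for (var i = 0; i < 20; i++) {
  gzip.write(payload);
  // flush() is asynchronous: once zlib's write buffer needs draining it
  // only registers a 'drain' listener, and that listener cannot fire
  // while this synchronous loop is still running (run it long enough
  // and the MaxListeners warning quoted earlier shows up as well).
  gzip.flush(function () {
    console.log('flush completed');
  });
}

console.log('loop done');
// 'loop done' prints first; the data/flush logs only appear after the
// loop yields back to the event loop.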
I changed my code to allow the listener to execute by making use of the callback that Zlib.flush() takes and trampolining between data generation and sending. It looks something like this:
function populateArray() {
  if (iterCount < maxCount) {
    arr.push(generate_some_data());
  }
  send();
}

function send() {
  if (iterCount === maxCount) {
    if (arr.length > 0) {
      res.write(JSON.stringify(arr));
    }
    res.end();
    return;
  }
  if (arr.length === 500) {
    res.write(JSON.stringify(arr));
    res.flush(function () { // <--------- callback to flush, which on invocation resumes the array population
      arr = [];
      populateArray(++iterCount);
    });
  } else {
    populateArray(++iterCount);
  }
}
In order to get this to work, I had to change the flush implementation to take a callback:

res.flush = function flush(cb) {
  if (stream) {
    stream.flush(opts.flush, cb);
  }
};
Here opts is the variable that captures the options passed into the compression() function. The zlib flush implementation also takes a flush mode, but I'm not aware of a practical use case for using multiple flush modes in a single compression run. If there is one, the function could take an additional flush argument, but the user would have to make sure the right mode is passed in.
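Such a variant could look roughly like this (a hypothetical sketch, not something the module ships; the mode becomes an optional first argument that falls back to the configured default):

res.flush = function flush(kind, cb) {
  if (typeof kind === 'function') {
    // called as res.flush(cb): use the mode from the middleware options
    cb = kind;
    kind = opts.flush;
  }
  if (stream) {
    stream.flush(kind, cb);
  }
};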
I'm using node v.0.12.0 with express (v4.4.1) and compression (v1.6.0)
I'm sending back about 80MB of dynamically generated data (not from the FS or a DB) in multiple res.write() calls. When I add the compression middleware (with no options passed), I don't see any traffic (using Wireshark) from the Node server until res.end() is called. When res.end() is called, there is a sudden burst of chunked data. But when the compression module is not used, I see chunked responses going out on the wire. The size of each of these chunked fragments seems to be between 10K and 16K (based on Wireshark).
The only header information I am setting happens before the res.write() calls and is:
Any reason why the data is getting buffered before the res.end() call?
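For completeness, the shape of the setup being described is roughly the following (a reconstruction for illustration only; the route, the data generator, and the item count are placeholders, and the header actually being set is not shown above):

var express = require('express');
var compression = require('compression');

var app = express();
app.use(compression()); // removing this line is what makes chunks appear on the wire immediately

app.get('/data', function (req, res) {
  // (response headers are set here, before the first write)
  var arr = [];
  for (var i = 0; i < 1000000; i++) {
    arr.push({ seq: i, value: 'dynamically generated payload' });
    if (arr.length === 500) {
      res.write(JSON.stringify(arr));
      arr = [];
    }
  }
  if (arr.length > 0) {
    res.write(JSON.stringify(arr));
  }
  res.end();
});

app.listen(3000);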