expressjs / compression

Node.js compression middleware
MIT License

Fix usage of undocumented _implicitHeader #128

Closed maritz closed 2 weeks ago

maritz commented 6 years ago

As discussed in #127 this PR now only includes the changes that should not break node v0.8 and has tests.

Sadly supertest does not support http2 yet, so I had to resort to vanilla server/client creation.

I'm not destroying the client/server here, not sure if that's needed?!
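
For readers following along, a minimal sketch of what "vanilla server/client creation" looks like with Node's built-in http2 module (illustrative only; the PR's actual test helpers and assertions differ):

```js
var http2 = require('http2')
var compression = require('compression')

// http2 server that runs the compression middleware by hand
var middleware = compression({ threshold: 0 })
var server = http2.createServer(function (req, res) {
  middleware(req, res, function () {
    res.setHeader('Content-Type', 'text/plain')
    res.end('hello, world')
  })
})

server.listen(0, function () {
  // plain http2 client instead of supertest
  var client = http2.connect('http://localhost:' + server.address().port)
  var request = client.request({ ':path': '/', 'accept-encoding': 'gzip' })
  request.on('response', function (headers) {
    console.log(headers['content-encoding']) // expect 'gzip' with the fix from this PR
  })
  request.on('end', function () {
    client.close()
    server.close()
  })
  request.resume()
  request.end()
})
```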

maritz commented 6 years ago

Right, tests are of course failing, because the http2 module doesn't exist in previous versions. D'oh.

I'll add a try/catch around the require, so the http2 tests only run when http2 is available. Is that acceptable?
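
Roughly the pattern being described, as a sketch (assuming a mocha-style describe/it; the actual guard in the test file may differ):

```js
// require('http2') throws on Node.js versions that don't ship the module,
// so wrap it and skip the whole http2 suite when it isn't available.
var http2
try {
  http2 = require('http2')
} catch (err) {
  http2 = null
}

var describeHttp2 = http2 ? describe : describe.skip

describeHttp2('compression over http2', function () {
  it('should compress responses', function (done) {
    // ...vanilla http2 server/client assertions go here...
    done()
  })
})
```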

maritz commented 6 years ago

Right, now the tests pass, but the travis config doesn't have node v9 in it, so no http2 tests are run there.

Am I allowed to just add that?

Edit: I just did... Deal with it. :-P

dougwilson commented 6 years ago

Yea, you can go ahead and update the Travis CI file. We just have one entry for the latest minor of each major. I'm not sure if any of the 8.x ones have the http2 module, but at least adding the most recent 9 would work. Make sure to validate that all functionality works with 9.x, including that no deprecation messages are printed, for this and all our dependencies before adding it to Travis CI 👍 that is our standard Travis CI process.

maritz commented 6 years ago

There is one warning node currently gives when using http2: `(node:4065) ExperimentalWarning: The http2 module is an experimental API.`

As far as I can tell neither v8.8 nor v9.4 show any deprecation warnings.

dougwilson commented 6 years ago

Thanks for checking, @maritz. Can you verify that you ran every test suite of every dependency of this module throughout the tree with Node.js 9.x as well to check this? This module doesn't fully exercise all the features of our own dependencies, because the assumption is that they already have test suites, so we don't want to copy-and-paste every test suite up each module. Many times the issue is only apparent when doing this, so we have to do this in order to bump any version in the Travis CI configuration 👍

maritz commented 6 years ago

> Can you verify that you ran every test suite of every dependency of this module throughout the tree with Node.js 9.x as well to check this?

Do you mean going through all 12 dependencies, checking out their git repos and manually running their tests? If so, no. I don't have the time to do that right now and quite frankly don't think that's a reasonable thing to do.

Or do you have some tool to do this? Or did I misunderstand?

dougwilson commented 6 years ago

Yes, that is the correct understanding. It's no problem at all, I can do it 👍 I just didn't want to if you had already done it. I just use some shell scripts that use jq and git to clone out the repos and then find + npm to run the tests through them in a single pass. We have been burned multiple times by not doing this, so it's a requirement in order to add new versions. We use the versions in Travis CI as the list of "verified compatible" versions, basically. We've only ever been burned on major upgrades, so this would mainly be for adding the 9.x series.

dougwilson commented 6 years ago

Ok, I ran the tests of everything against Node.js 9.4 and updated the Travis CI in master 👍

maritz commented 6 years ago

Hm, not sure why the tests are suddenly failing on Travis. It works fine locally with 9.4 and worked fine on Travis before I merged master to resolve the conflicts.

dougwilson commented 6 years ago

Yea, I agree, it looks like nothing changed. Perhaps the test itself has a race condition and it just happened to fail this time? If rerunning the job makes it suddenly pass, it's probably a flaky test that would need to be debugged.

maritz commented 6 years ago

Okay, now the tests passed. I'm hoping that these changes actually fixed it. Can't guarantee it though, since I don't truly understand what caused it in the first place.

Anything else I could do?

maritz commented 6 years ago

@dougwilson Could you manually trigger the travis checks for this PR a couple of times to see if it still fails occasionally? Not a great thing to have to do, but better be safe, right?

dougwilson commented 6 years ago

Yep, I can do that :+1: sorry for the delay, we're just trying to get a couple of security patches out asap. Jan seems to be the season for receiving security reports or something 😂

maritz commented 6 years ago

Sorry to pester you, but is there any update on this?

dougwilson commented 6 years ago

I'm very sorry, I totally forgot to follow up on your request to rerun the Travis CI job. I hit restart earlier today and it looks like it did indeed fail again as you suspected it may.

maritz commented 6 years ago

There were changes in node v9.8.0 that might fix this... That's about all I got for this right now. :-(

maritz commented 6 years ago

I'm actually occasionally seeing this in the one app where I use this PR. I have no indication yet what causes it and it always takes the entire process down.

If anyone has any ideas on how to debug this, I'm very interested in it.

maritz commented 6 years ago

I added a fix for a change in node 10.4 onward that caused test timeouts every time. This allowed me to add node v10.11 to the travis.yml targets, providing one more data point for the CI tests. I also added a bunch of error logging to the http2 tests.

However, after several local runs and 10 forced CI runs I have not seen a single failure. There were plenty of http2 changes in node v10, and even some in the v8 and v9 lines, especially around closing connections and such. This might mean it is fixed. I haven't seen the problem in my production app in a while either (using node v10), but I may have disabled http2 for unrelated reasons at some point; I'll check that later tonight.

@dougwilson Since I added node v10 in the travis.yml, could you re-run the dependency checks again? Maybe making this part of the testchain/tooling would be a good idea?

edit: I also rebased on master. :-)

edit2: As suspected I had disabled compression on my production http2 app, because it was causing problems and my reverse proxy/load balancer is handling compression there. I'm re-enabling compression on the app now, just to test it a bit.

maritz commented 6 years ago

Sadly this is still happening, had to revert to the proxy doing compression again.

```
Error [ERR_HTTP2_INVALID_STREAM]: The stream has been destroyed
    at Http2ServerResponse.write (internal/http2/compat.js:599:19)
    at Gzip.onStreamData (/usr/src/app/node_modules/@maritz/compression/index.js:206:20)
    at Gzip.emit (events.js:182:13)
    at addChunk (_stream_readable.js:283:12)
    at readableAddChunk (_stream_readable.js:264:11)
    at Gzip.Readable.push (_stream_readable.js:219:10)
    at Gzip.Transform.push (_stream_transform.js:151:32)
    at Zlib.processCallback (zlib.js:591:10)
```

Not sure if it's actually the same error as we had before anymore, I can't remember. I'll maybe try to see how to reliably reproduce it and add a failing test. Otherwise this PR is still a no-go.
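
One defensive idea (a hypothetical sketch, not something this PR implements; safeWrite is an invented helper name): skip writes once the underlying HTTP/2 stream is gone, since Http2ServerResponse.write() throws ERR_HTTP2_INVALID_STREAM where an HTTP/1 response would just drop the chunk.

```js
// Hypothetical guard around res.write() for the http2 compat response.
// res.stream is only set on Http2ServerResponse, so this is a no-op on http1.
function safeWrite (res, chunk) {
  if (res.stream && res.stream.destroyed) {
    return false // the client already tore the stream down; writing would throw
  }
  return res.write(chunk)
}
```

The zlib 'data' handler could then call safeWrite(res, chunk) instead of res.write(chunk), pausing the zlib stream when it returns false.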

maritz commented 6 years ago

Just to clarify: The code changes in this PR should be completely fine - they just change _implicitHeader to writeHead.

There is just some problem with how compression works over http2. If you want, I can make yet another PR for just this code change. I don't know if _implicitHeader on http1 is going away anytime soon, so there is probably no hurry.
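
For context, the substance of the code change is roughly the following sketch (not the exact diff; ensureHeaders is an invented name for illustration): flush the headers through the documented writeHead() instead of the private _implicitHeader().

```js
// Emit headers via the public API; _implicitHeader() is undocumented and
// not reliably available on the http2 compat response.
function ensureHeaders (res) {
  if (!res.headersSent) {
    res.writeHead(res.statusCode)
  }
}
```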

DullReferenceException commented 5 years ago

@maritz, @dougwilson: need any help with this PR? I'd definitely prefer using this over rolling our own compression support for HTTP2.

maritz commented 5 years ago

If you can figure out what is causing the seemingly random failures, that would be great.

I don't have the time and honestly probably lack the understanding of how compression and http2 work to fix this properly.

DullReferenceException commented 5 years ago

@maritz: do you have any hints on what your app does that might be provoking those failures? So far, I've been unable to reproduce that failure. It may be that the version of node 10 is different than what you were using. I see that some fixes (such as https://github.com/nodejs/node/pull/23146) have been made to HTTP2 since you saw this happening.

maritz commented 5 years ago

> do you have any hints on what your app does that might be provoking those failures?

Not really, it was just failing after X amount of time working fine. It's a next.js server that also serves a few static files. So nothing too fancy.

But the Travis CI also had failures sometimes. So I really don't know how to approach this.

DullReferenceException commented 5 years ago

I'll be doing a bunch of testing of your published fork within our app soon; maybe that will be sufficient validation. Will report back.

sogaani commented 5 years ago

With the current commit, I could not reproduce the error @maritz reported. But I figured out two possible reasons that could cause it.

```js
server.on('listening', function () {
  var client = createHttp2Client(server.address().port)
  var request = client.request({ 'Accept-Encoding': 'gzip' })
  request.on('response', function (headers) {
    closeHttp2(request, client, server, () => {})
  })
  request.end()
})
```


* Second possible reason (the issue found with next.js)

  When the server returns 304 (Not Modified) or 204 (No Content), the error can happen with compression.
  When the server returns status code 304 or 204, the node http2 module [starts to close the stream](https://github.com/nodejs/node/blob/3c84556654fb81d73a61b9d19bba3df6aac94e0e/lib/internal/http2/core.js#L2472-L2479). After that, calling res.end() is seemingly safe. However, with compression, res is wrapped in a Zlib stream, and a Zlib stream emits data even for empty input (which is not a bug). So when the server calls res.end() on the wrapped Zlib stream, it eventually calls res.write(chunk), which can cause the error.

  With http1, if the status code is already set to 304 or 204, http_outgoing_message.write(chunk) checks whether the response has a body and [does nothing](https://github.com/nodejs/node/blob/master/lib/_http_outgoing.js#L619-L624).
  With the http2 compat layer, it just [throws an error](https://github.com/nodejs/node/blob/master/lib/internal/http2/compat.js#L636-L643).

  next.js [serves static files](https://github.com/zeit/next.js/blob/340594398e3f946b174269f72cc7437014e198ea/packages/next-server/server/serve-static.ts#L10) with the [send package](https://github.com/pillarjs/send). send checks whether the file is cached, returns 304 (Not Modified) and [calls res.end()](https://github.com/pillarjs/send/blob/de073ed3237ade9ff71c61673a34474b30e5d45b/index.js#L375-L381).

  This issue is 100% reproducible with the following code:

```js
var server = createHttp2Server({ threshold: 0 }, function (req, res) {
  res.setHeader('Content-Type', 'text/plain')
  res.writeHead(304)
  setImmediate(() => { res.end() })
})
```

```js
server.on('listening', function () {
  var client = createHttp2Client(server.address().port)
  var request = client.request({ 'Accept-Encoding': 'gzip' })
  request.end()
})
```



To fix the second issue, I think there are two approaches:
A. Check the status code in this package.
B. Check the status code in the nodejs http2 compat layer.
I prefer approach B, because it keeps compatibility with http1. (A rough sketch of what approach A could look like follows below.)

Any thoughts?
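
A minimal sketch of approach A (hypothetical; shouldSkipBody is an invented helper name, and this is not code from the PR): the middleware could refuse to feed a body into zlib for status codes that must not have one, mirroring the HTTP/1 _http_outgoing check, so the zlib stream's trailing chunk never reaches an http2 stream that is already closing.

```js
// Hypothetical check for approach A: these status codes must not carry a
// response body (per the HTTP spec and the HTTP/1 _hasBody check), so the
// middleware could bypass the zlib stream for them entirely.
function shouldSkipBody (statusCode) {
  return statusCode === 204 || statusCode === 304 ||
    (statusCode >= 100 && statusCode < 200)
}
```

The wrapped res.write()/res.end() could consult shouldSkipBody(res.statusCode) and fall through to the raw response instead of going through zlib.
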
wood1986 commented 4 years ago

Do you have any ETA for getting this merged?

maritz commented 4 years ago

I have re-enabled compression on my server again with a current node version (v12.14.1) and have performed some long-running load tests on a locally running next.js server serving html (200 status code) with more than 1m compressed requests. I have also done a lot of manual loading of 200 and 304 requests with gzip compression on.

So far no crashes. I don't know if something changed in node or whether I've just been unlucky in reproducing it this time. I vote for merging and then dealing with potential issues if they arise.

Since all the failures I've personally seen were with http2 anyway - not http1 - this would still be an improvement over the status quo, where hardly anyone in the express-based world uses compression over http2.

Otherwise, if anyone is interested in testing it without an official merge, you can use '@maritz/compression' from npm and report back how it works out for you.

maritz commented 4 years ago

> At the time, I was hoping that @sogaani's comments would be answered prior to merging

Yeah, I'll try to see what I can do about that. I obviously can't really make the decision on whether to check for 304 in the module or in the node compat layer. But I can do some more tests with old node versions etc. to see whether the 100% reproducible cases can reasonably be added to these tests.

> wanted to note I do have a "rebase" label on this issue, as it is currently merge conflicts that are preventing the merge

My bad, I saw that at some point and then forgot about it later on. :-D

Regarding the test reviews: Yep, those logs and no-ops are suboptimal. I think I added the logs simply for the case that we do see the unexpected failures and they could be caught with this. But it's been a long time, I'll go over all the tests and make sure they are done properly.

Icehunter commented 4 years ago

Can someone update this or open another PR to achieve the same result? It's been a known issue for quite some time and prevents using http2 with server frameworks (spdy, http2, nextjs, etc).

Separately, is there any reason to keep supporting older engines with serious security risks?

"engines": {
  "node": ">= 0.8.0"
}
dougwilson commented 4 years ago

Hi @Icehunter, I believe the OP is working to update the PR. The things left have nothing to do with old Node.js versions; they are mainly about getting the tests included here updated. If you publish a module, you're welcome to choose whatever Node.js version support policy you would like. The versions supported here are the ones I care about, and changing them at this point would be a fork, since I would still need a module that suited my needs. I see no blocker in having it support all of them, and also in not using the above property when it's not needed.

maritz commented 4 years ago

If someone else wants to take over, of course they are free to do so. I can't promise any time frame about when I'll get around to this, sorry.

jonchurch commented 4 years ago

@maritz You can check a box (bottom right of the PR, I believe) to allow maintainers to push to this PR, if you'd like to leave the work open to updates.

maritz commented 4 years ago

@jonchurch Thanks, already checked by default, it seems. 👍

Icehunter commented 4 years ago

@maritz @jonchurch I can't rebase this PR, however: https://github.com/expressjs/compression/compare/master...Icehunter:bugfix/use_writeHead_instead_of_implicitHeader I forked @maritz's fork and then rebased it on express master. How can I help with this?

jonchurch commented 4 years ago

@Icehunter You can either submit a PR to OP's branch https://github.com/maritz/compression/tree/bugfix/use_writeHead_instead_of_implicitHeader or create a new PR here

lukasoppermann commented 4 years ago

Hey, I am running into this issue as well.

It seems @Icehunter has already addressed all of this apart from one edge case? Is there a way to help here and make this merge happen, as we are otherwise at a loss if we don't use express. 😢

lamweili commented 2 years ago

Superseded by #170.

bjohansebas commented 2 weeks ago

Closed in favor of #170.