Atmosphere / wasync

WebSockets with fallback transports client library for Node.js, Android and Java
http://async-io.org
161 stars 47 forks source link

Upgrade async-http-client. #151

Closed seamusmac closed 2 years ago

seamusmac commented 7 years ago

Ran into this bug https://github.com/AsyncHttpClient/async-http-client/issues/891

So tried upgrading the async-http-client version.

Probably not ready to submit yet. Submitting PR to run the tests to see how it looks.

seamusmac commented 7 years ago

@jfarcand So there is no java8 build on Travis for this yet. Any chance it could be added.

jfarcand commented 7 years ago

@seamusmac Working on it...need to remember this this work :-) BTW Would you like to becomes a contributor? You proved many times you can be trusted and I would really like you to help!

seamusmac commented 7 years ago

@jfarcand Thanks. Not really sure I have time to become a contributor at the moment. 3 kids, family, startup company etc. Always happy to try and fix things, or attempt something like this the odd time :)

Regarding this PR. I have upgraded to the latest Nettosphere snapshot (updated netty version in nettosphere) to try and get all the netty version in sync, and using the latest AHC version now.

There is one class not compiling at the moment, and it is the WebSocketTransport class, it extends AHCs WebSocketUpgradeHandler, cos some of the methods have been marked final. (onBodyPartReceived, onStatusReceived, onHeadersReceived, onCompleted, onSuccess, onFailure) I think we need to replace this with the AHC WebsocketListener, but I'm not totally sure, cos the old method don't map exactly etc.

If you had time to have a look, that would be awesome.

jfarcand commented 7 years ago

OK I understand...this is quite bad that compatibility isn't there. Will ping @slandelle for more info :-) as it won't work with WebsocketListener

slandelle commented 7 years ago

Those methods are final on master because someone forgot to port his commit on the master branch back then: https://github.com/AsyncHttpClient/async-http-client/commit/4cd825805189e38c5a0d45f7acc34e4c585ac0e5 ;-)

I have mixed feelings about making those non final. wasync is probably the only one overriding them (first time I get a complain about those) and I'm afraid of opening a Pandora box: everything would break down if they were to return something else than what's implemented in WebSocketUpgradeHandler.

IMHO, it would be way better to add protected hooks that would return void and whose default implementations would do nothing. Contributions welcome!

jfarcand commented 7 years ago

@slandelle OK ... but no time for me to spend on AHC so we may need instrument the classes...no time either :-) so let's put the migration on hold unless @seamusmac you have the cycle :-)

seamusmac commented 7 years ago

@jfarcand

So I tried a little more at getting this to work, just to see how wasync tests go, etc. I checked out the AHC project, modifed those methods that are final to allow the override ( we can patch this part later, depending on what @slandelle agrees on)

I had 36 tests fail after this upgrade. I think I found a fix for all the WebSocketTests, in Nettosphere latest https://github.com/Atmosphere/nettosphere/blob/master/server/src/main/java/org/atmosphere/nettosphere/BridgeRuntime.java#L418 I change that line to binaryData.copy().readBytes(body);
The AtmosphereResource didn't have the message cos the ByteBuf was already read I think.

There are other test failures in the StreamTransportTests, for example here is a log from one: https://gist.github.com/seamusmac/8ffd9e194684b3018f91c3e32055a3d0#file-gistfile1-txt-L1564

I tried a similar fix with the the BridgeRuntime.handleHttp method but didn't help. Don't have time to look more into it this week. Will see if I can possibly look next week.

thabach commented 7 years ago

@slandelle where you envisioning something like this for the WebSocketUpgradeHandler:

    // protected template methods for processing logic extension 
    protected void do_onStatusReceived(HttpResponseStatus responseStatus) throws Exception {}
    protected void do_onHeadersReceived(HttpHeaders headers) throws Exception {}
    protected void do_onBodyPartReceived(HttpResponseBodyPart bodyPart) throws Exception {}
    protected void do_onCompleted() throws Exception {}
    protected void do_onThrowable(Throwable t) {}
    protected void do_onOpen() {}

    @Override
    public final State onStatusReceived(HttpResponseStatus responseStatus) throws Exception {
        this.do_onStatusReceived(responseStatus);
        return responseStatus.getStatusCode() == SWITCHING_PROTOCOLS ? State.CONTINUE : State.ABORT;
    }

@seamusmac I see you prepare for a fix, would that fit your override needs ? @jfarcand do you see any obstacles in going that way ?

seamusmac commented 7 years ago

@thabach I think that would do fine. Lets see what the others have to say :)

jfarcand commented 7 years ago

+1 for me as well. Great work!

slandelle commented 7 years ago

@thabach Yes. But please name the hooks as onStatusReceived0 (zero) and don't use this. when it's not necessary. PR welcome.

thabach commented 7 years ago

@slandelle ok, shall do, whatever your preference. I like to use this as an aid in comprehension that this call is going to be a dynamically dispatched call, but no worries 😉 their gonna be gone in the pull request.

thabach commented 7 years ago

pull-request done: AsyncHttpClient/async-http-client#1446

seamusmac commented 7 years ago

I fixed up those methods. Redid the PR for nettosphere and added a comment at: https://github.com/Atmosphere/nettosphere/pull/129

let me know what you think.

I think all that is left is to fix the tests, cleanup the HttpHeaders a little, and possibly get rid of the FluentStringsMap I added that was removed in the AHC 2+ onwards.

thabach commented 7 years ago

@seamusmac I found out that simple tests like the LongPollingTest::basicWebSocketTestsuffer from double socket:on("message", ...) callbacks after the necessary adjustments in upgrading. With the first callback carrying the expected string and the second callback reporting an empty message string, which overrides the initial value and leads to the test failure (and fails many others in a similar "second callback overwrites initial value" - fashion). Before the upgrade (and adoptions of code) there was only one callback, i.e. the first we see now, but no consecutive, unexpected, additional callbacks.

@jfarcand, @slandelle does that ring a bell by any chance ?

seamusmac commented 7 years ago

@thabach I seen the same with StreamTransportTests too. I posted the full TRACE log from StreamingTest.basicWebSocketTest() test above. Here is it again, this line looks v. suspicious.

https://gist.github.com/seamusmac/8ffd9e194684b3018f91c3e32055a3d0#file-gistfile1-txt-L1564

@jfarcand I wonder are these problems in Nettosphere V3, That is the other upgrade I performed here, to get netty versions relatively in sync. How many people are using Nettosphere V3? I tried an upgrade once and failed.... :( I could be completely wrong though.. May need to try again

thabach commented 7 years ago

@seamusmac please try the latest nettosphere v3 snapshot of this morning, I tried to get it all in sync in regards to netty version used by both AHC and nettosphere.

thabach commented 7 years ago

@slandelle I can get 80% of the test pass if I adjust the org.asynchttpclient.netty.handler.HttpHandler to not have the last chunk propagate. There is a line that reads // Netty 4: the last chunk is not emptyin the handleChunks(...)-method. For me the last chunk for the exercised tests in our test suite is always empty.

Please advise...

jfarcand commented 7 years ago

@seamusmac No idea how many people are using Nettosphere 3 (but I do have customer) but it could potentially be the issue. Hard to say... I would need to look into it but right now it is totally impossible for me to work on that :-( Possibly in September once we have lunched

thabach commented 7 years ago

@slandelle actually it is even worse, I see the last chunk entering the org.asynchttpclient.netty.handler.HttpHandler over and over again, endlessly... BUT some hours later into debugging the problem seems to be with us after all 😉😇 @seamusmac something with the handling of completing Futures in wasync seems to be broken in this pull-request and results in propagating the last chunk endlessly @jfarcand on another note, remind me for wasync with non-binary payload, do we expect to receive aggregated messages only in socket:on("message", ...)

thabach commented 7 years ago

@seamusmac as of changes in async-http-client in regards to the propagation of the terminating chunk (which have an empty body) and of it being pushed into AsyncHandler::onBodyPartReceived(...) nonetheless for signalling the latter (at least this is the reason for pushing it from what I presume, @slandelle maybe you could give us the rational), we should add a precondition check in all classes and their respective onBodyPartReceived(...)derived method, like this :

if (bodyPart.isLast()) { return AsyncHandler.State.ABORT; }

after I did that, I have still some tests failing, namely these

Failed tests: 
  status404FunctionTest(org.atmosphere.tests.LongPollingTest): expected [404] but found [200]
  status404FunctionTest(org.atmosphere.tests.StreamingTest): expected [404] but found [200]
  basicDecoderTest(org.atmosphere.tests.WebSocketsTest): expected object to not be null
  basicEchoEncoderTest(org.atmosphere.tests.WebSocketsTest): expected [<-echo->] but found [null]
  encodersChainingTests(org.atmosphere.tests.WebSocketsTest): expected [<-echo->] but found [null]
  eventDecoderTest(org.atmosphere.tests.WebSocketsTest): expected object to not be null
  multipleFireBlockingTest(org.atmosphere.tests.WebSocketsTest): expected [PINGPONG] but found []
  multipleFireTest(org.atmosphere.tests.WebSocketsTest): expected [PINGPONG] but found []
  postTest(org.atmosphere.tests.WebSocketsTest): expected object to not be null
  testTimeoutAtmosphereClient(org.atmosphere.tests.WebSocketsTest): expected [true] but found [false]
  defaultTypedTest(org.atmosphere.tests.TypedTest): expected object to not be null
  predefinedMessageTest(org.atmosphere.tests.TypedTest): expected object to not be null

If you happen to have some time contributing into fixing those remaining test, that would be very appreciated.

Cheers, Christian

thabach commented 7 years ago

PS: the Future chaining proved fine and the endless propagation of the last chunk, were pretty much instant replies of subsequent GETs in long-polling (from the embedded nettosphere instance). Your suggested Nettosphere "fix" made sure that the subsequent GETs were returning the same buffer again (which is of course is not what we generally want and why I reverted that change).

thabach commented 4 years ago

@slandelle - working on the upgrade of wasync to an up-to-date AsyncHttpClient version anew ðŸĪŠ - may I get your thoughts on the following in your code base:

image

In case of this being the last chunk (a chunk of 0<CR><LF>) with an optional trailer, carrying additional header fields, which you propagate nicely in handleChunk() right before the screenshot-ed lines above, why do you (caused by || last) additionally choose to propagate the empty content-buf ?

It is not really a ResponseBodyPart, and is not expected by some of the wasync unit tests and make it fail. Is it done to signal the receipt of the last chunk ?

thabach commented 4 years ago

@seamusmac could you alter your pull request to prevent those last chunks from propagating in altering all our onBodyPartReceived overrides with a conditional clause on exercising the up-call, like so:

Screenshot 2019-10-21 at 10 46 08

this should make most (if not all) of the unit tests pass. Happy to hear how that goes - cheers Christian.

slovdahl commented 2 years ago

@seamusmac Any idea of how much work is left on this before it would be ready for merging? I'm a bit interested in getting it moved forward.

seamusmac commented 2 years ago

@slovdahl I haven't looked at this in quite a long time, and don't have the time currently to pick it up again. I pushed the last changes I had made based on @thabach replies above. There are currently 8 tests failing out of 117.

Failing Tests: basicIOExceptionTest(org.atmosphere.tests.LongPollingTest): Connection refused: no further information: /127.0.0.1:49289 status404FunctionTest(org.atmosphere.tests.LongPollingTest): expected [404] but found [200] serializeFutureGetTest(org.atmosphere.tests.LongPollingTest): expected [PINGPONG] but found [PINGPONG status404FunctionTest(org.atmosphere.tests.StreamingTest): expected [404] but found [200] basicConnectExceptionTest(org.atmosphere.tests.WebSocketsTest): Connection refused: no further information: /127.0.0.1:58835 basicIOExceptionTest(org.atmosphere.tests.WebSocketsTest): Connection refused: no further information: /127.0.0.1:59470 serverDownTest(org.atmosphere.tests.WebSocketsTest): Connection refused: no further information: localhost/0:0:0:0:0:0:0:1:815 testTimeoutAtmosphereClient(org.atmosphere.tests.WebSocketsTest): Connection refused: no further information: /127.0.0.1:62768

Last I remember trying to fix was the status404FunctionTest ones, and I think that requires a fix in Nettosphere, cos the GET requests are returning 200 Ok, when they should be returning a 404. Nettosphere just logs at https://github.com/Atmosphere/nettosphere/blob/9d6f024a471f188a20d5669cbff20c1186c79aa7/server/src/main/java/org/atmosphere/nettosphere/BridgeRuntime.java#L690, and no error happens afterwards. That should fix 2 more. and the rest I'm not sure about.

Feel free to fix/modify/take the code if you want to give it a try.

slovdahl commented 2 years ago

Thanks @seamusmac, that's very helpful!

don't have the time currently to pick it up again

No worries, a huge thanks for the effort you've put into it this far :bow:

I'll put some effort into it now and see if I can get any further with the test failures.

jfarcand commented 2 years ago

@seamusmac @slovdahl I've made some progress, at least with the timing issue of IOException.

seamusmac commented 2 years ago

@jfarcand @slovdahl That brilliant that you made some progress on this. I pulled your code @jfarcand and gave it a quick run here locally on my windows machine.

Looks like only one test failing now. timeoutTest(org.atmosphere.tests.WebSocketsTest): expected [OPENCLOSEERROR] but found [OPENCLOSEERRORERROR]

I also left a comment on another TODO that I had put in, not sure the best way to handle those.

jfarcand commented 2 years ago

@seamusmac I don't see your commit? Maybe you haven't pushed it on the proper branch? I still see some random failure but I'm tempted to merge your pull request and continue on the master branch. I will wait for your commit first.

seamusmac commented 2 years ago

@jfarcand sorry I don't have any other commit, or time to look at this in detail. Was just informing you about the TODO around the IOExceptions on the asyncclient.close() methods above.

Feel Free to merge this PR if you wish, and fix those niggly bits yourself. You'll have it done in a few mins where it would take me 10 times as long to figure out.

Seeing as this is a major change, would it be a good idea to do a major version bump for wasync?

jfarcand commented 2 years ago

@seamusmac Yes, major version for sure!

jfarcand commented 2 years ago

@seamusmac I've released 3.0.0 :-) The test are flacky as they all pass locally, so I went ahead and cut 3.0.0. Let me know if you are facing any issue.

slovdahl commented 2 years ago

Wow, thank you @jfarcand! :bow: This is awesome.