Kitura / Kitura-WebSocket

WebSocket support for Kitura
Apache License 2.0
68 stars 30 forks source link

Notify websocket service on abnormal socket closure #61

Closed bikram990 closed 6 years ago

bikram990 commented 6 years ago

This pull request fixed issue #60

CLAassistant commented 6 years ago

CLA assistant check
All committers have signed the CLA.

ianpartridge commented 6 years ago

Thanks for your PR! Before we review it, could you please sign the CLA? (link above). Also, your PR seems to have failed CI, can you take a look?

bikram990 commented 6 years ago

Hi @ianpartridge,

I've fixed the CI build failure. While looking at it I found another issue. I've updated the issue #60 with same.

The CI build again failed due to a timeout issue.

[INFO] [HTTPServer.swift:319 listen(listenSocket:socketManager:)] Server has stopped listening
/Users/travis/build/IBM-Swift/Kitura-WebSocket/Tests/KituraWebSocketTests/KituraTest.swift:67: error: -[KituraWebSocketTests.BasicTests testBinaryMediumMessage] : Asynchronous wait failed: Exceeded timeout of 10 seconds, with unfulfilled expectations: "BasicTests:69[0]".
/Users/travis/build/IBM-Swift/Kitura-WebSocket/Tests/KituraWebSocketTests/KituraTest.swift:66: error: -[KituraWebSocketTests.BasicTests testBinaryMediumMessage] : XCTAssertNil failed: "Error Domain=com.apple.XCTestErrorDomain Code=0 "(null)"" - 
Test Case '-[KituraWebSocketTests.BasicTests testBinaryMediumMessage]' failed (10.113 seconds).

This is the first time I'm submitting a PR to any opensource repository. Could you please help/guide me to fulfill the requirements for PR?

Thanks

bikram990 commented 6 years ago

@Andrew-Lees11 Did you get a change to review this PR?

Andrew-Lees11 commented 6 years ago

Hello @bikram990 Sorry for the delay. I have been looking over this PR today. Firstly thank you for raising the issue and PR. You have caught some good bugs. There are currently two things with your PR I'm looking into:

  1. You have added a test which is good. The test passes even without your changes though so It isn't covering your problem case.
  2. connectionClosed calls processor?.close(). This would cause a loop if you didn't set the processor as nil and I wanted to check if there is a better way to do this.

Could you expand on what is happening when you get your issue? Are you using master branch with a connection timeout or latest release? Have you tested this PR in your case and did it fix your issue?

bikram990 commented 6 years ago
> You have added a test which is good. The test passes even without your changes though so It isn't covering your problem case.

I've not added any new test case, I've modified the existing test cases. One issue with test case was that the connection was closed abnormally and it was expecting to get a noReason error code. Second issue with test case was specific to macOS. In Xocde the tests were giving false results even though test was failed. Third issue is related to timeout. The timeout issue may be because of network issues on the build server or no delay before closing the socket. I've not fixed the timeout issue But would like to have a fix for the same.

connectionClosed calls processor?.close(). This would cause a loop if you didn't set the processor as nil and I wanted to check if there is a better way to do this.

I just followed the same pattern used in Kitura-Net. Personally I would add a new delegate method for this. But for this PR I wanted to keep my modifications as minimum as possible.

Could you expand on what is happening when you get your issue? Are you using master branch with a connection timeout or latest release? Have you tested this PR in your case and did it fix your issue?

For my project, I'm using master branch without any timeout. The client for my server was killed forcefully and my server was never notified for the same. As my server was not notified, the server was still trying to send messages to client.

Andrew-Lees11 commented 6 years ago

Thank you for your comprehensive response. We have seen the timeout issue on travis before and I'm not to worried about that making it fail.

Looking at the code a agree that a delegate would probably be better but your solution will ensure that if the close function is called the service.disconnected should also be called.

Have you run your project with your branch? If the client never notifies the server then i'm not sure close will be called. Setting a connection timeout makes the server clean up connections that are not responding and should call service?.disconnected so your service is aware the client has gone.

bikram990 commented 6 years ago

@Andrew-Lees11 Yes, I've run my project with above changes and it works. Basically in Kitura-Net when a socket is open it creates a dispatch read source. This dispatch source gets a cancel message when ever a socket is closed abnormally. In implementation of cancel it closes the socket from server side and notifies its delegate (WebSocketProcessor) about close.

I agree a timeout will make the server clean up stale connections. But for timeout period server is using more resources than it should. The above change makes it event based.

Andrew-Lees11 commented 6 years ago

Could you also add the following to the connection cleanup tests:

("testProccessorClose", testProccessorClose),
    func testProccessorClose() {
        let service = register(closeReason: .closedAbnormally, connectionTimeout: nil)

        performServerTest() { expectation in
            XCTAssertEqual(service.connections.count, 0, "Connections left on service at start of test")
            guard let socket = self.sendUpgradeRequest(toPath: self.servicePath, usingKey: self.secWebKey) else { return }
            let _ = self.checkUpgradeResponse(from: socket, forKey: self.secWebKey)
            usleep(1500)
            XCTAssertEqual(service.connections.count, 1, "Failed to create connection to service")
            let connections = Array(service.connections.values)
            connections[0].processor?.close()
            usleep(1500)
            XCTAssertEqual(service.connections.count, 0, "Service was not notified of connection disconnect")
            socket.close()
            usleep(150)
            expectation.fulfill()
        }
    }
Andrew-Lees11 commented 6 years ago

So I have a few changes I would like you to make to the PR and then we should be good to merge. Do you agree with the change to only call connectionClosed if the connection is active? This then leaves the removal of the processor to prepareToClose and removes the loop. I have also made a test which covers this new behavior to measure the change.

I have tried this on a local version of websocket client and It fixed my replication of your issue so hopefully it should also work for you.

Thank you again for finding this problem and working with me towards a solution.

bikram990 commented 6 years ago

@Andrew-Lees11 I've made the suggested changes. Could you please review?

pushkarnk commented 6 years ago

I agree with the proposed fix.