Kitura / Kitura-net

Kitura networking
Apache License 2.0

Errors in removeIdleSockets #239

Open · bridger opened this issue 6 years ago

bridger commented 6 years ago

In my server log I see these errors:

[IncomingSocketManager.swift:252 removeIdleSockets(removeAll:)] epoll_ctl failure. Error code=1. Reason=Operation not permitted

It seems that removeIdleSockets is getting an EPERM error (errno 1) from epoll_ctl when it cleans up idle connections.
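The `Error code=1` in that log line is a raw errno value. On Linux, errno 1 is EPERM ("Operation not permitted"), and the epoll_ctl(2) man page documents EPERM as meaning the target file descriptor does not support epoll (for example, a regular file or a directory). A minimal sketch of a helper that decodes such codes; `describeErrno` is a hypothetical name, not part of Kitura-net, and only a few of the codes epoll_ctl(2) documents are named.

```swift
import Foundation

// Hypothetical helper: decodes the numeric "Error code=" from a log line into
// the errno name plus the C library's message. Only a handful of codes that
// epoll_ctl(2) documents are named; anything else falls back to the number.
func describeErrno(_ code: Int32) -> String {
    let names: [Int32: String] = [
        EPERM: "EPERM",
        ENOENT: "ENOENT",
        EBADF: "EBADF",
        EEXIST: "EEXIST",
        EINVAL: "EINVAL",
        ENOMEM: "ENOMEM",
    ]
    let name = names[code] ?? "errno \(code)"
    // strerror returns a C string owned by the C library; copy it into a String.
    let reason = String(cString: strerror(code))
    return "\(name) (\(code)): \(reason)"
}

print(describeErrno(1))
```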

I'm running my server using this Docker container. It is deployed on Amazon ECS.

My server is mostly a WebSocket server. There is a memory leak I'm trying to investigate, and I wonder if this might be related.

djones6 commented 6 years ago

I haven't seen this issue myself, though I am currently investigating a potential threading issue around removeIdleSockets (issue #237) which could be related.

There are two threads on Linux that perform epoll_wait, with connections distributed between them. These threads should be the only ones invoking epoll functions on their respective epoll FDs. However, when a new connection is received, we call removeIdleSockets (at most once every 5 seconds) to clear out any stale ones. That runs on a different thread, and I wonder whether this EPERM error comes from two threads trying to invoke functions on the same epoll FD concurrently.
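To make the threading theory above concrete: one way to rule out concurrent access is to route the idle sweep and all per-connection bookkeeping through a single serial queue. This is a hypothetical sketch, not Kitura-net's IncomingSocketManager: `IdleSocketRegistry`, `touch` and the `deregister` callback are illustrative names, no real epoll calls are made, and the 60-second idle timeout is an assumption. Only the 5-second sweep throttle comes from the comment above.

```swift
import Foundation

// Hypothetical model of idle-socket cleanup. Every mutation of the bookkeeping,
// and the `deregister` callback that stands in for epoll_ctl(EPOLL_CTL_DEL)
// followed by close(), runs on one serial queue, so a sweep can never
// interleave with another thread's updates to the same descriptors.
final class IdleSocketRegistry {
    private let queue = DispatchQueue(label: "idle-socket-registry")
    private var lastActivity: [Int32: Date] = [:]
    private var lastSweep = Date.distantPast
    private let sweepInterval: TimeInterval
    private let idleTimeout: TimeInterval
    // Invoked on the serial queue; it must not call back into the registry,
    // or queue.sync would deadlock.
    private let deregister: (Int32) -> Void

    init(sweepInterval: TimeInterval = 5, idleTimeout: TimeInterval = 60,
         deregister: @escaping (Int32) -> Void) {
        self.sweepInterval = sweepInterval
        self.idleTimeout = idleTimeout
        self.deregister = deregister
    }

    // Record a new connection or fresh traffic on an existing one.
    func touch(_ fd: Int32, at now: Date = Date()) {
        queue.sync { lastActivity[fd] = now }
    }

    // Called when a new connection is accepted. Sweeps at most once per
    // sweepInterval and returns the descriptors it removed, sorted.
    @discardableResult
    func removeIdleSockets(now: Date = Date()) -> [Int32] {
        return queue.sync {
            guard now.timeIntervalSince(lastSweep) >= sweepInterval else { return [] }
            lastSweep = now
            let idle = lastActivity
                .filter { now.timeIntervalSince($0.value) >= idleTimeout }
                .map { $0.key }
                .sorted()
            for fd in idle {
                lastActivity[fd] = nil
                deregister(fd)
            }
            return idle
        }
    }

    var trackedCount: Int { queue.sync { lastActivity.count } }
}
```

Funnelling the sweep through the same queue as the per-connection updates is one way to act on the theory; an alternative with the same effect would be to have each epoll thread run the sweep for its own descriptors.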

bridger commented 6 years ago

That theory makes sense to me!

I just got another crash that seems related. All I got from the logs is this:

Fatal error: Trying to remove task, but it's not in the registry.: file Foundation/URLSession/TaskRegistry.swift, line 76

This has only happened once, so it is pretty rare. I don't see anything unusual in the logs beforehand.

mikezander commented 5 years ago

@bridger I'm also seeing this issue. Have you found a solution?

ianpartridge commented 5 years ago

@mikezander have you moved to Swift 5 recently? We've had a few reports of this and it looks like it's a bug in URLSession on Linux. There is a prototype fix here that we are hoping to get into Swift 5.0.1: https://github.com/apple/swift-corelibs-foundation/pull/2061

mikezander commented 5 years ago

@ianpartridge No, I actually haven't updated to Swift 5 yet. I'm still running Swift 4 on Kitura 2.3.0. I was thinking I should update to 2.5.0; could that possibly fix the issue?

mikezander commented 5 years ago

Hmm, I can't reproduce it, but based on that bug it looks like the issue is Swift-related.

ianpartridge commented 5 years ago

Interesting. All the reports we have had so far are on Swift 5. The problem is definitely in Foundation not Kitura so I'm afraid upgrading Kitura is unlikely to help (although we would recommend you do that anyway as there are piles of improvements since version 2.3!).

Out of interest, are you running on Swift 4.0, 4.1 or 4.2? We are discussing how long to continue to support earlier versions of Swift, and user feedback would be very helpful.

As for your immediate problem, the only option I can suggest is to avoid using URLSession on Linux :( How are you using URLSession? Directly from your Kitura app, or via a library such as SwiftyRequest (https://github.com/IBM-Swift/SwiftyRequest)? You might consider trying https://ibm-swift.github.io/Kitura-net/Classes/ClientRequest.html instead, which uses libcurl directly rather than URLSession.
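If a Linux build has to stop depending on URLSession, as suggested above, one low-risk approach is to put the HTTP client behind a small protocol so the backend can be swapped per platform. A minimal sketch under that assumption: `HTTPFetching`, `URLSessionFetcher`, `StubFetcher` and `makeFetcher` are hypothetical names, and the stub merely stands in for a wrapper around Kitura-net's ClientRequest, whose API is not reproduced here.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession lives here on Linux since Swift 5.1
#endif

// Hypothetical seam between application code and its HTTP client.
// None of these names come from Kitura or SwiftyRequest.
protocol HTTPFetching {
    func get(_ url: String, completion: @escaping (Result<String, Error>) -> Void)
}

// Backend built on URLSession, the component implicated in the TaskRegistry crash.
struct URLSessionFetcher: HTTPFetching {
    func get(_ url: String, completion: @escaping (Result<String, Error>) -> Void) {
        guard let target = URL(string: url) else {
            completion(.failure(URLError(.badURL)))
            return
        }
        URLSession.shared.dataTask(with: target) { data, _, error in
            if let error = error {
                completion(.failure(error))
            } else {
                completion(.success(String(decoding: data ?? Data(), as: UTF8.self)))
            }
        }.resume()
    }
}

// Placeholder for a non-URLSession backend (for example, a wrapper around a
// libcurl-based client). It returns a fixed body so the sketch runs offline.
struct StubFetcher: HTTPFetching {
    let body: String
    func get(_ url: String, completion: @escaping (Result<String, Error>) -> Void) {
        completion(.success(body))
    }
}

// Pick the backend at compile time: the supplied one on Linux, URLSession elsewhere.
func makeFetcher(linuxBackend: HTTPFetching) -> HTTPFetching {
    #if os(Linux)
    return linuxBackend
    #else
    return URLSessionFetcher()
    #endif
}
```

Application code then depends only on `HTTPFetching`, so replacing URLSession on Linux touches one factory function rather than every call site.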

gurugeek commented 4 years ago

Just to report the same issue:

[2019-12-09T02:31:10.976+01:00] [ERROR] [IncomingSocketManager.swift:295 removeIdleSockets(removeAll:runNow:)] epoll_ctl failure. Error code=1. Reason=Operation not permitted

Swift version 5.1 (swift-5.1.2-RELEASE)
Target: x86_64-unknown-linux-gnu

Kitura 2.8.0

This makes it practically unusable, as a lot of requests fail even with only 10 concurrent requests and 100 requests in total (hardly a heavy load):

ab -n 100 -c 10 https://.../index
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking press.toys (be patient).....done

Server Software:        Apache/2.4.41
Server Hostname:
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-CHACHA20-POLY1305,2048,256
Server Temp Key:        ECDH X25519 253 bits
TLS Server Name:

Document Path:          /index
Document Length:        14023 bytes

Concurrency Level:      10
Time taken for tests:   5.114 seconds
Complete requests:      100
Failed requests:        32
   (Connect: 0, Receive: 0, Length: 32, Exceptions: 0)
Non-2xx responses:      3
Total transferred:      1385997 bytes
HTML transferred:       1371341 bytes
Requests per second:    19.55 [#/sec] (mean)
Time per request:       511.410 [ms] (mean)
Time per request:       51.141 [ms] (mean, across all concurrent requests)
Transfer rate:          264.66 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       61   92  16.7     90     156
Processing:    40  192 504.7     80    3030
Waiting:       39  185 505.7     69    3030
Total:        110  285 502.7    173    3106

Percentage of the requests served within a certain time (ms)
  50%    173
  66%    188
  75%    223
  80%    255
  90%    339
  95%    389
  98%   3105
  99%   3106
 100%   3106 (longest request)

:(
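The summary lines in the ApacheBench output above follow directly from the raw counts, which helps when comparing runs. With $n = 100$ requests, concurrency $c = 10$, and a wall time $T \approx 5.1141\,\text{s}$ (ab prints it rounded to 5.114 s; the extra digit is implied by the 511.410 ms figure):

```latex
\begin{aligned}
\text{Requests per second} &= \frac{n}{T} = \frac{100}{5.1141\,\text{s}} \approx 19.55 \\
\text{Time per request (mean)} &= \frac{c\,T}{n} = \frac{10 \times 5114.1\,\text{ms}}{100} \approx 511.41\,\text{ms} \\
\text{Time per request (across all concurrent requests)} &= \frac{T}{n} = \frac{5114.1\,\text{ms}}{100} \approx 51.141\,\text{ms} \\
\text{Transfer rate} &= \frac{1385997\,\text{bytes} / 1024}{5.1141\,\text{s}} \approx 264.66\ \text{KB/s}
\end{aligned}
```

Note that ab counts a request as a "Length" failure when its body length differs from that of the first response, so the 32 Length failures mean those responses were a different size from the first one; whether that reflects a server fault depends on whether /index is expected to return identical bodies. The 3 non-2xx responses are reported separately, and ab's `-l` option suppresses Length failures for pages whose size legitimately varies.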