apple / swift-nio

Event-driven network application framework for high performance protocol servers & clients, non-blocking.
https://swiftpackageindex.com/apple/swift-nio/documentation
Apache License 2.0
8k stars 652 forks source link

Getting message: BUG in SwiftNIO (please report), unleakable promise leaked #2201

Open eob opened 2 years ago

eob commented 2 years ago

Describe the bug

Hi folks! The Swift console asked me to report this bug:

BUG in SwiftNIO (please report), unleakable promise leaked.:474: Fatal error: leaking promise created at (file: "BUG in SwiftNIO (please report), unleakable promise leaked.", line: 474)

To Reproduce

Steps to reproduce the behavior:

I'm constructing http clients as below:


    let httpClient = HTTPClient(eventLoopGroupProvider:
      .shared(Application.sharedEventLoopGroup())
    )
    return try httpClient.execute(...).flatMapThrowing { data in
      defer {
        // Defer runs when the current scope exits.
        try? httpClient.shutdown() { error in
          Logger(label: "HTTPClient").error("Error shutting down HTTPClient: \(error)")
        }
      }
      return data
    }

The following call returns a static global Vapor application's eventLoopGroup:

Application.sharedEventLoopGroup()

That static global Vapor application is re-initialized for every unit test in a test suite, which is where this error arises.

Expected behavior

As far as I can tell, my code has no leaking promises, and the error message seems to suggest that there exists one from within the SwiftNIO library, which I find unlikely.

Screenshots If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

Additional context Add any other context about the problem here.

ktoso commented 2 years ago

Transfered to swift-nio (was originally opened as https://github.com/apple/swift/issues/59538 )

Lukasa commented 2 years ago

What SwiftNIO version are you using please?

sliemeobn commented 2 years ago

just got the same/similar error with the friendly hint to report it:

NIO-ELT-0-#0 (7): Fatal error: leaking promise created at (file: "BUG in SwiftNIO (please report), unleakable promise leaked.", line: 521)

SwiftNIO 2.44.0 Happens in deinit of EventLoopFuture

I was messing around in a test to understand what exactly syncShutdownGracefully does or doesn't do. Since this is not exactly "realistic" code I doubt it is a big issue - but here is the snippet that causes the error, which seems to be a race-condition (happens roughly every third or fourth time for me in XCode with XCTest in Debug)

func testFoo() throws {
    var hasFinished = false
    let elg = MultiThreadedEventLoopGroup(numberOfThreads: 2)

    let _ = elg.any().flatSubmit {
        let p = elg.next().makePromise(of: Bool.self)
        elg.next().scheduleTask(in: .seconds(2)) {
            p.succeed(true)
        }
        p.futureResult.whenSuccess { v in hasFinished = v }
        return p.futureResult
    }

    //let _ = try p2.wait()
    try elg.syncShutdownGracefully()
    XCTAssertTrue(hasFinished)
}
Lukasa commented 2 years ago

Yup, this is a great catch: these unleakable promises can all leak if the event loop group is shutdown under their feet. While this is an application correctness issue that can lead to horrifying bugs and we should forbid it (see #2247 and #2142), we should try to work out whether we can effectively detect and handle the leak of the promise in this situation.

weissi commented 2 years ago

In the short term, we should fix the error message and tell users that they've shut down the ELG, right? Clearly not unleakeable ;)

sliemeobn commented 2 years ago

In the short term, we should fix the error message and tell users that they've shut down the ELG, right? Clearly not unleakeable ;)

Still wrapping my head around SwiftNIO, but imho that sounds like "the fix".

Any attempt to "track" this per future feels like too much work (runtime cost) for something that just isn't realistically needed. Especially with structured concurrency around the corner...

Lukasa commented 2 years ago

Yes, this is our planned fix. The real reason we can’t do anything else is that if the EL is shutdown, we don’t have an error reporting path. The future callbacks can’t be invoked because they are guaranteed to be invoked on the EL (which is gone), so we have no way to tell the user this happened without crashing. So we should give them better guidance.

ktoso commented 2 years ago

As means of last resort to report this issue I guess you could stderr print, not much else you can do really 🤔

nashysolutions commented 1 year ago

This ticket has been open for 18 months. Is there anything I can do to help?

2023-11-07T05:47:03+0000 info codes.vapor.request : request-id=5F54BF24-FFA0-49BE-AEAA-543B0390F6C7 [Vapor] POST /users
ERROR: Cannot schedule tasks on an EventLoop that has already shut down. This will be upgraded to a forced crash in future SwiftNIO versions.
ERROR: Cannot schedule tasks on an EventLoop that has already shut down. This will be upgraded to a forced crash in future SwiftNIO versions.
BUG in SwiftNIO (please report), unleakable promise leaked.:480: Fatal error: leaking promise created at (file: "BUG in SwiftNIO (please report), unleakable promise leaked.", line: 480)
2023-11-07 05:47:03.855239+0000 xctest[10427:121272] BUG in SwiftNIO (please report), unleakable promise leaked.:480: Fatal error: leaking promise created at (file: "BUG in SwiftNIO (please report), unleakable promise leaked.", line: 480)

Screenshot 2023-11-07 at 05 52 31

nashysolutions commented 1 year ago

Interestingly, I have over 2200 threads when the crash occurs. That's a lot of threads!

Screenshot 2023-11-07 at 06 02 19

gwynne commented 1 year ago

Vapor 4.86.1 should address the extreme excess of threads, at least.

Lukasa commented 1 year ago

@nashysolutions The better approach is for your application to fix the issue where it shuts down ELGs. This issue only propose to change the wording of the error message.

gwynne commented 1 year ago

@Lukasa Yes, but I wouldn't mind seeing that wording change happen; it causes quite a bit of confusion.

nashysolutions commented 1 year ago

@nashysolutions The better approach is for your application to fix the issue where it shuts down ELGs. This issue only propose to change the wording of the error message.

I don't really know what that means. I'm seeing the issue in my unit tests. I have 3 test case subclasses with a total of 54 unit tests spread across them. My tests don't execute concurrently. The number of asynchronous events in any given unit test is just a handful at best.

If I manage to fix the issue by adding some additional tear down blocks (that call app.shutdown), should I then be confident that this app will work in production?

Lukasa commented 1 year ago

No, you should do the opposite and stop tearing down the application.

nashysolutions commented 1 year ago

No, you should do the opposite and stop tearing down the application.

I discovered the issue was due to continueAfterFailure being set to false on XCTestCase. See aforementioned bug ticket on Vapor repo (3103).

Thanks for your time @Lukasa.