SwiftPackageIndex / SwiftPackageIndex-Server

The Swift Package Index is the place to find Swift packages!
https://swiftpackageindex.com
Apache License 2.0
558 stars 46 forks source link

"Received signal 11" in app_analyze #2227

Closed finestructure closed 1 year ago

finestructure commented 1 year ago

I finally caught a glimpse as to why analysis sometimes hangs:

2023-01-21T18:57:54.937913401Z app_analyze.1.t1dix7fv4zph@p3    | [ INFO ] Analyzing (limit: 25) ... [component: analyze]
2023-01-21T18:57:54.983740764Z app_analyze.1.t1dix7fv4zph@p3    | [ INFO ] Checkout directory: /checkouts [component: analyze]
2023-01-21T18:57:54.987633060Z app_analyze.1.t1dix7fv4zph@p3    | [ INFO ] pulling https://github.com/yonaskolb/Stringly.git in /checkouts/github.com-yonaskolb-stringly [component: analyze]
2023-01-21T18:57:55.021965033Z app_analyze.1.t1dix7fv4zph@p3    | [ INFO ] pulling https://github.com/dagronf/DSFActionBar.git in /checkouts/github.com-dagronf-dsfactionbar [component: analyze]
2023-01-21T18:57:55.029915226Z app_analyze.1.t1dix7fv4zph@p3    | [ INFO ] pulling https://github.com/ordo-one/package-frostflake.git in /checkouts/github.com-ordo-one-package-frostflake [component: analyze]
2023-01-21T18:57:55.271462231Z app_analyze.1.t1dix7fv4zph@p3    | [ INFO ] pulling https://github.com/chojnac/Dumpling.git in /checkouts/github.com-chojnac-dumpling [component: analyze]
2023-01-21T18:57:55.273393230Z app_analyze.1.t1dix7fv4zph@p3    | [ INFO ] pulling https://github.com/raymccrae/swift-jsonpatch.git in /checkouts/github.com-raymccrae-swift-jsonpatch [component: analyze]
2023-01-21T18:57:55.277729726Z app_analyze.1.t1dix7fv4zph@p3    | Received signal 11. Backtrace:
2023-01-21T18:58:04.984287787Z app_analyze.1.t1dix7fv4zph@p3    | [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]
2023-01-21T18:58:04.984746687Z app_analyze.1.t1dix7fv4zph@p3    | [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]

The backtrace itself isn't in the logs but hopefully the crash is reproducible with one of the packages in question.

finestructure commented 1 year ago

None of these packages individually are causing a signal 11 when running locally (macOS/arm64). It's going to be difficult to run this on Linux/x86 but maybe Linux/arm64 can reproduce it.

finestructure commented 1 year ago

Doesn't crash on Linux/arm64 either.

I have no way to test this on Linux/x86. The next best thing we can try is to make sure the backtrace shows up in the logs.

finestructure commented 1 year ago

It's not clear to me why the backtrace isn't captured by our logging. We're logging both stderr and stdout and yet there's no trace (πŸ˜…) of the output beyond Received signal 11. Backtrace:.

finestructure commented 1 year ago

I'll close this for now, will reopen and add more details if it happens again.

finestructure commented 1 year ago

Another lock-up, different error and we got a partially symbolicated stack trace:

2023-01-27T15:37:57.557322522Z app_analyze.1.9jb9jx9vkshk@p2    | [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your con[130/1048]
meout. [component: server, database-id: psql]
2023-01-27T15:37:57.557341622Z app_analyze.1.9jb9jx9vkshk@p2    | [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection ti
meout. [component: server, database-id: psql]
2023-01-27T15:37:57.557377022Z app_analyze.1.9jb9jx9vkshk@p2    | [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection ti
meout. [component: server, database-id: psql]
2023-01-27T15:37:57.557428923Z app_analyze.1.9jb9jx9vkshk@p2    | [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection ti
meout. [component: server, database-id: psql]
2023-01-27T15:37:57.557445323Z app_analyze.1.9jb9jx9vkshk@p2    | [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection ti
meout. [component: server, database-id: psql]
2023-01-26T13:50:08.693743753Z app_analyze.1.luku8u31ozjl@p2    | 0x561f78b36187, closure vapor/console-kit#1 (Swift.Int32) -> () in static Backtrace.Backtrace.install(signals: Swift.Array<Swift.Int32>) -> () at /build/.build/checkouts/swift-backtrace/Sources/Backt
race/Backtrace.swift:80
2023-01-26T13:50:08.693750253Z app_analyze.1.luku8u31ozjl@p2    | 0x561f78b36187, @objc closure vapor/console-kit#1 (Swift.Int32) -> () in static Backtrace.Backtrace.install(signals: Swift.Array<Swift.Int32>) -> () at /build/<compiler-generated>:79
2023-01-26T13:50:08.693754653Z app_analyze.1.luku8u31ozjl@p2    | 0x7f4d69f8841f
2023-01-26T13:50:08.693758753Z app_analyze.1.luku8u31ozjl@p2    | 0x561f799f0571
2023-01-26T13:50:08.694152755Z app_analyze.1.luku8u31ozjl@p2    | 0x561f79533bd0, closure vapor/console-kit#4 () throws -> Swift.String in (extension in ShellOut):Foundation.Process.(launchBash in _839723A297212BDF262C1834C3E29C1F)(with: Swift.String, outputHandle:
 Swift.Optional<Foundation.FileHandle>, errorHandle: Swift.Optional<Foundation.FileHandle>) throws -> Swift.String at /build/.build/checkouts/ShellOut/Sources/ShellOut.swift:442
2023-01-26T13:50:08.694178555Z app_analyze.1.luku8u31ozjl@p2    | 0x561f79533bd0, partial apply forwarder for closure vapor/console-kit#4 () throws -> Swift.String in (extension in ShellOut):Foundation.Process.(launchBash in _839723A297212BDF262C1834C3E29C1F)(with:
 Swift.String, outputHandle: Swift.Optional<Foundation.FileHandle>, errorHandle: Swift.Optional<Foundation.FileHandle>) throws -> Swift.String at /build/<compiler-generated>:0
2023-01-26T13:50:08.694185355Z app_analyze.1.luku8u31ozjl@p2    | 0x561f7982a3eb
2023-01-26T13:50:08.694189055Z app_analyze.1.luku8u31ozjl@p2    | 0x561f79829285
2023-01-26T13:50:08.694192655Z app_analyze.1.luku8u31ozjl@p2    | 0x561f7982d0fb
2023-01-26T13:50:08.694196255Z app_analyze.1.luku8u31ozjl@p2    | 0x561f79824068
2023-01-26T13:50:08.694202855Z app_analyze.1.luku8u31ozjl@p2    | 0x561f79840cd9
2023-01-26T13:50:08.694206355Z app_analyze.1.luku8u31ozjl@p2    | 0x561f7982d071
2023-01-26T13:50:08.694210455Z app_analyze.1.luku8u31ozjl@p2    | 0x561f7982914b
2023-01-26T13:50:08.694214055Z app_analyze.1.luku8u31ozjl@p2    | 0x561f79829594
2023-01-26T13:50:08.694217755Z app_analyze.1.luku8u31ozjl@p2    | 0x561f795318e0, (extension in ShellOut):Foundation.Process.(launchBash in _839723A297212BDF262C1834C3E29C1F)(with: Swift.String, outputHandle: Swift.Optional<Foundation.FileHandle>,
errorHandle: Swift.Optional<Foundation.FileHandle>) throws -> Swift.String at /build/.build/checkouts/ShellOut/Sources/ShellOut.swift:441
2023-01-26T13:50:08.694237255Z app_analyze.1.luku8u31ozjl@p2    | 0x561f79531d27, ShellOut.shellOut(to: Swift.String, arguments: Swift.Array<Swift.String>, at: Swift.String, process: Foundation.Process, outputHandle: Swift.Optional<Foundation.FileH
andle>, errorHandle: Swift.Optional<Foundation.FileHandle>) throws -> Swift.String at /build/.build/checkouts/ShellOut/Sources/ShellOut.swift:40
2023-01-26T13:50:08.694242255Z app_analyze.1.luku8u31ozjl@p2    | 0x561f79531d27, ShellOut.shellOut(to: ShellOut.ShellOutCommand, at: Swift.String, process: Foundation.Process, outputHandle: Swift.Optional<Foundation.FileHandle>, errorHandle: Swift
.Optional<Foundation.FileHandle>) throws -> Swift.String at /build/.build/checkouts/ShellOut/Sources/ShellOut.swift:106
2023-01-26T13:50:08.694566856Z app_analyze.1.luku8u31ozjl@p2    | 0x561f7880f4d8, closure vapor/console-kit#1 (ShellOut.ShellOutCommand, Swift.String) throws -> Swift.String in variable initialization expression of static App.Shell.live : App.Shell at /build/Source
s/App/Core/AppEnvironment.swift:291
2023-01-26T13:50:08.694694157Z app_analyze.1.luku8u31ozjl@p2    | 0x561f78830c1d, App.Shell.run(command: ShellOut.ShellOutCommand, at: Swift.String) throws -> Swift.String at Sources/App/Core/AppEnvironment.swift:283
2023-01-26T13:50:08.694709557Z app_analyze.1.luku8u31ozjl@p2    | 0x561f78830c1d, function signature specialization <Arg[2] = Dead> of static App.Git.revisionInfo(_: App.Reference, at: Swift.String) throws -> App.Git.RevisionInfo at /build/Sources/
App/Core/Git.swift:70
2023-01-26T13:50:08.694851657Z app_analyze.1.luku8u31ozjl@p2    | 0x561f78812395, static App.Git.revisionInfo(_: App.Reference, at: Swift.String) throws -> App.Git.RevisionInfo at /build/<compiler-generated>:0
2023-01-26T13:50:08.694873258Z app_analyze.1.luku8u31ozjl@p2    | 0x561f78812395, implicit closure vapor/console-kit#12 (App.Reference, Swift.String) throws -> App.Git.RevisionInfo in implicit closure vapor/console-kit#11 (App.Git.Type) -> (App.Reference, Swift.String) throws -> Ap
p.Git.RevisionInfo in variable initialization expression of static App.Git.live : App.Git at /build/<compiler-generated>:271
2023-01-26T13:50:08.694901558Z app_analyze.1.luku8u31ozjl@p2    | 0x561f78812395, partial apply forwarder for implicit closure vapor/console-kit#12 (App.Reference, Swift.String) throws -> App.Git.RevisionInfo in implicit closure vapor/console-kit#11 (App.Git.Type) -> (App.Reference
, Swift.String) throws -> App.Git.RevisionInfo in variable initialization expression of static App.Git.live : App.Git at /build/<compiler-generated>:0
2023-01-26T13:50:08.696367663Z app_analyze.1.luku8u31ozjl@p2    | 0x561f7878eefe, closure vapor/console-kit#3 (App.Reference) throws -> App.Version in static App.Analyze.getIncomingVersions(client: Vapor.Client, logger: Logging.Logger, package: App.Joined<App.Packa
ge, App.Repository>) async throws -> Swift.Array<App.Version> at /build/Sources/App/Commands/Analyze.swift:438
2023-01-26T13:50:08.696417564Z app_analyze.1.luku8u31ozjl@p2    | 0x561f7878eefe, function signature specialization <Arg[1] = Owned To Guaranteed, Arg[2] = Owned To Guaranteed> of function signature specialization <Arg[0] = [Closure Propagated : cl
osure vapor/console-kit#3 (App.Reference) throws -> App.Version in static App.Analyze.getIncomingVersions(client: Vapor.Client, logger: Logging.Logger, package: App.Joined<App.Package, App.Repository>) async throws -> Swift.Array<App.Version>, Argument Types : [Swi
ft.StringApp.Joined<App.Package, App.Repository>]> of generic specialization <Swift.Array<App.Reference>, App.Version> of (extension in Swift):Swift.Collection.map<A>((A.Element) throws -> A1) throws -> Swift.Array<A1> at /build/<compiler-generated
>:0
2023-01-26T13:50:08.696437964Z app_analyze.1.luku8u31ozjl@p2    | 0x561f787920c7, function signature specialization <Arg[0] = [Closure Propagated : closure vapor/console-kit#3 (App.Reference) throws -> App.Version in static App.Analyze.getIncomingVersions(client: V
apor.Client, logger: Logging.Logger, package: App.Joined<App.Package, App.Repository>) async throws -> Swift.Array<App.Version>, Argument Types : [Swift.StringApp.Joined<App.Package, App.Repository>]> of generic specialization <Swift.Array<App.Refe
rence>, App.Version> of (extension in Swift):Swift.Collection.map<A>((A.Element) throws -> A1) throws -> Swift.Array<A1> at /build/<compiler-generated>:0
2023-01-26T13:50:08.696444064Z app_analyze.1.luku8u31ozjl@p2    | 0x561f787920c7, (1) suspend resume partial function for function signature specialization <Arg[3] = Dead> of static App.Analyze.getIncomingVersions(client: Vapor.Client, logger: Logg
ing.Logger, package: App.Joined<App.Package, App.Repository>) async throws -> Swift.Array<App.Version> at /build/Sources/App/Commands/Analyze.swift:437
2023-01-26T13:50:08.696486964Z app_analyze.1.luku8u31ozjl@p2    | 0x561f79e6520d
2023-01-26T13:50:08.696503764Z app_analyze.1.luku8u31ozjl@p2    | 0x561f79e658fb
2023-01-26T13:50:08.696508064Z app_analyze.1.luku8u31ozjl@p2    | 0x561f79837424
2023-01-26T13:50:08.696511764Z app_analyze.1.luku8u31ozjl@p2    | 0x561f798371d2
2023-01-26T13:50:08.696515264Z app_analyze.1.luku8u31ozjl@p2    | 0x561f79842ac1
2023-01-26T13:50:08.696518764Z app_analyze.1.luku8u31ozjl@p2    | 0x7f4d69f7c608
2023-01-26T13:50:08.696538164Z app_analyze.1.luku8u31ozjl@p2    | 0x7f4d69ac0132
2023-01-26T13:50:08.696542264Z app_analyze.1.luku8u31ozjl@p2    | 0xffffffffffffffff
2023-01-26T13:50:08.846038854Z app_analyze.1.luku8u31ozjl@p2    | /bin/bash: line 1: 20617 Illegal instruction     (core dumped) ./Run analyze --env prod --limit 25
2023-01-26T13:50:28.883543917Z app_analyze.1.luku8u31ozjl@p2    | [ INFO ] Analyzing (limit: 25) ... [component: analyze]
2023-01-26T13:50:28.953630593Z app_analyze.1.luku8u31ozjl@p2    | [ INFO ] Checkout directory: /checkouts [component: analyze]
2023-01-26T13:50:28.958182711Z app_analyze.1.luku8u31ozjl@p2    | [ INFO ] pulling https://github.com/genius/future.git in /checkouts/github.com-genius-future [component: analyze]
2023-01-26T13:50:28.987394827Z app_analyze.1.luku8u31ozjl@p2    | [ INFO ] pulling https://github.com/xiiagency/SwiftConcurrency.git in /checkouts/github.com-xiiagency-swiftconcurrency [component: analyze]
2023-01-26T13:50:29.001886784Z app_analyze.1.luku8u31ozjl@p2    | [ INFO ] pulling https://github.com/JUSTINMKAUFMAN/SwiftUIPhone.git in /checkouts/github.com-justinmkaufman-swiftuiphone [component: analyze]
2023-01-26T13:50:29.226366270Z app_analyze.1.luku8u31ozjl@p2    | [ INFO ] pulling https://github.com/patchthecode/JTAppleCalendar.git in /checkouts/github.com-patchthecode-jtapplecalendar [component: analyze]
2023-01-26T13:50:29.231274989Z app_analyze.1.luku8u31ozjl@p2    | [ INFO ] pulling https://github.com/RougeWare/Swift-Optional-Tools.git in /checkouts/github.com-rougeware-swift-optional-tools [component: analyze]
2023-01-26T13:50:29.254494181Z app_analyze.1.luku8u31ozjl@p2    | [ INFO ] pulling https://github.com/apple/swift-corelibs-xctest.git in /checkouts/github.com-apple-swift-corelibs-xctest [component: analyze]
finestructure commented 1 year ago

Common theme:

Connection request timed out. This might indicate a connection deadlock in your application.

Not sure if that's a symptom or the cause, but worth investigating.

Worth noting that these hangs started appearing when I converted more ELFs to a/a recently.

finestructure commented 1 year ago

It's been a while since we've seen this, so I'm going to close this for now.

finestructure commented 1 year ago

This is still happening and it's now also visible on dev. I believe it was always present there but we didn't see it because we didn't have an alert active there.

Recent hangs didn't show a signal 11 or a backtrace but I'm not 100% sure if that's just due to how things are logged.

What we typically get is the following in the container logs:

β”‚   [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]                β”‚
β”‚   [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]                β”‚
β”‚   [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]                β”‚
β”‚   [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]                β”‚
β”‚   [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]                β”‚
β”‚   [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]                β”‚
β”‚   [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]                β”‚
β”‚   [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]                β”‚
β”‚   [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]                β”‚
β”‚   [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]                β”‚
β”‚   [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]                β”‚
β”‚   [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]                β”‚
β”‚   [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]                β”‚
β”‚   [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]                β”‚
β”‚   [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]                β”‚
β”‚   [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]                β”‚
β”‚   [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]                β”‚
β”‚   [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]                β”‚
β”‚   [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]                β”‚
β”‚   [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]                β”‚

and something like this in the Grafana logs:

2023-04-17 19:38:18 
[ INFO ] pulling https://github.com/gonzalezreal/AdaptiveCardUI.git in /checkouts/github.com-gonzalezreal-adaptivecardui [component: analyze]
2023-04-17 19:38:18 
[ INFO ] pulling https://github.com/EFPrefix/EFStorage.git in /checkouts/github.com-efprefix-efstorage [component: analyze]
2023-04-17 19:38:17 
[ INFO ] pulling https://github.com/hendriku/colorpicker.git in /checkouts/github.com-hendriku-colorpicker [component: analyze]
2023-04-17 19:38:17 
[ INFO ] pulling https://github.com/LiYanan2004/SwiftUI-Haptics.git in /checkouts/github.com-liyanan2004-swiftui-haptics [component: analyze]
2023-04-17 19:38:17 
[ INFO ] pulling https://github.com/0xLet/WTV.git in /checkouts/github.com-0xlet-wtv [component: analyze]
2023-04-17 19:38:17 
[ INFO ] Checkout directory: /checkouts [component: analyze]
2023-04-17 19:38:17 
[ INFO ] Analyzing (limit: 25) ... [component: analyze]

I'll post something in the Swift slack to see if there's some kind of logging we can add to track this down.

I'm pretty sure this is related to our move to a/a. That's when it started happening.

finestructure commented 1 year ago

This here seems to be exactly the issue we're running into: https://github.com/vapor/apns/issues/28

It also explains why we're only seeing this in analyze: of the four batch processes it's the one with the heaviest use of network requests (querying Github's API).

finestructure commented 1 year ago

After increasing log level to DEBUG we're seeming interesting details:

β”‚   [ DEBUG ] Connection pool exhausted on this event loop, adding request to waitlist [component: server, database-id: psql] (AsyncKit/EventLoopConnectionPool.swift:206)                                                                             β”‚
β”‚   [ DEBUG ] Fulfilling connection waitlist request [component: server, database-id: psql] (AsyncKit/EventLoopConnectionPool.swift:275)                                                                                                               β”‚
β”‚   [ DEBUG ] Fulfilling connection waitlist request [component: server, database-id: psql] (AsyncKit/EventLoopConnectionPool.swift:275)                                                                                                               β”‚
β”‚   [ DEBUG ] Connection pool exhausted on this event loop, adding request to waitlist [component: server, database-id: psql] (AsyncKit/EventLoopConnectionPool.swift:206)                                                                             β”‚
β”‚   [ DEBUG ] Fulfilling connection waitlist request [component: server, database-id: psql] (AsyncKit/EventLoopConnectionPool.swift:275)                                                                                                               β”‚
β”‚   [ DEBUG ] Fulfilling connection waitlist request [component: server, database-id: psql] (AsyncKit/EventLoopConnectionPool.swift:275)                                                                                                               β”‚
β”‚   [ DEBUG ] Fulfilling connection waitlist request [component: server, database-id: psql] (AsyncKit/EventLoopConnectionPool.swift:275)                                                                                                               β”‚
β”‚   [ DEBUG ] Fulfilling connection waitlist request [component: server, database-id: psql] (AsyncKit/EventLoopConnectionPool.swift:275)                                                                                                               β”‚
β”‚   [ DEBUG ] Fulfilling connection waitlist request [component: server, database-id: psql] (AsyncKit/EventLoopConnectionPool.swift:275)                                                                                                               β”‚
β”‚   [ DEBUG ] Fulfilling connection waitlist request [component: server, database-id: psql] (AsyncKit/EventLoopConnectionPool.swift:275)                                                                                                               β”‚
β”‚   [ DEBUG ] Fulfilling connection waitlist request [component: server, database-id: psql] (AsyncKit/EventLoopConnectionPool.swift:275)                                                                                                               β”‚
β”‚   [ DEBUG ] Fulfilling connection waitlist request [component: server, database-id: psql] (AsyncKit/EventLoopConnectionPool.swift:275)                                                                                                               β”‚
β”‚   [ DEBUG ] Fulfilling connection waitlist request [component: server, database-id: psql] (AsyncKit/EventLoopConnectionPool.swift:275)                                                                                                               β”‚
β”‚   [ DEBUG ] Fulfilling connection waitlist request [component: server, database-id: psql] (AsyncKit/EventLoopConnectionPool.swift:275)                                                                                                               β”‚
β”‚   [ DEBUG ] Fulfilling connection waitlist request [component: server, database-id: psql] (AsyncKit/EventLoopConnectionPool.swift:275)                                                                                                               β”‚
β”‚   [ DEBUG ] Fulfilling connection waitlist request [component: server, database-id: psql] (AsyncKit/EventLoopConnectionPool.swift:275)                                                                                                               β”‚
β”‚   [ DEBUG ] Fulfilling connection waitlist request [component: server, database-id: psql] (AsyncKit/EventLoopConnectionPool.swift:275)                                                                                                               β”‚
β”‚   [ DEBUG ] Fulfilling connection waitlist request [component: server, database-id: psql] (AsyncKit/EventLoopConnectionPool.swift:275)                                                                                                               β”‚
β”‚   [ DEBUG ] updateStatus ops: 25 [component: analyze] (App/Common.swift:48)                                                                                                                                                                          β”‚
β”‚   [ DEBUG ] REFRESH MATERIALIZED VIEW recent_packages [] [component: server, database-id: psql] (FluentPostgresDriver/FluentPostgresDatabase.swift:164)                                                                                              β”‚

The Connection pool exhausted on this event code path is one step from our deadlock error message:

https://github.com/vapor/async-kit/blob/a61da00d404ec91d12766f1b9aac7d90777b484d/Sources/AsyncKit/ConnectionPool/EventLoopConnectionPool.swift#L206

        // all connections are busy, check if we have room for more
        if self.activeConnections < self.maxConnections {
            logger.debug("No available connections on this event loop, creating a new one")
            self.activeConnections += 1
            return makeActiveConnection()
        } else {
            // connections are exhausted, we must wait for one to be returned
            logger.debug("Connection pool exhausted on this event loop, adding request to waitlist")
            let promise = eventLoop.makePromise(of: Source.Connection.self)
            self.waiters.append((logger, promise))

            let task = eventLoop.scheduleTask(in: self.requestTimeout) { [weak self] in
                guard let self = self else { return }
                logger.error("Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout.")
                if let idx = self.waiters.firstIndex(where: { _, p in return p.futureResult === promise.futureResult }) {
                    self.waiters.remove(at: idx)
                }
                promise.fail(ConnectionPoolTimeoutError.connectionRequestTimeout)
            }

            return promise.futureResult.always { _ in task.cancel() }
        }
    }
finestructure commented 1 year ago

After reducing the batch size to 5 (from 25) the messages are still there but there are fewer of them:

β”‚   id: psql] (FluentPostgresDriver/FluentPostgresDatabase.swift:33)                                                                                                                                                                                   β”‚
β”‚   [ DEBUG ] UPDATE "packages" SET "status" = 'ok', "score" = $1, "processing_stage" = 'analysis', "updated_at" = $2 WHERE "packages"."id" = $3 [60, 2023-04-18 09:52:42 +0000, E11AC48B-A2C8-48FB-B496-D05FE9368F93] [component: server, database-   β”‚
β”‚   id: psql] (FluentPostgresDriver/FluentPostgresDatabase.swift:33)                                                                                                                                                                                   β”‚
β”‚   [ DEBUG ] Connection pool exhausted on this event loop, adding request to waitlist [component: server, database-id: psql] (AsyncKit/EventLoopConnectionPool.swift:206)                                                                             β”‚
β”‚   [ DEBUG ] Connection pool exhausted on this event loop, adding request to waitlist [component: server, database-id: psql] (AsyncKit/EventLoopConnectionPool.swift:206)                                                                             β”‚
β”‚   [ DEBUG ] Fulfilling connection waitlist request [component: server, database-id: psql] (AsyncKit/EventLoopConnectionPool.swift:275)                                                                                                               β”‚
β”‚   [ DEBUG ] Fulfilling connection waitlist request [component: server, database-id: psql] (AsyncKit/EventLoopConnectionPool.swift:275)                                                                                                               β”‚
β”‚   [ DEBUG ] updateStatus ops: 5 [component: analyze] (App/Common.swift:48)                                                                                                                                                                           β”‚
β”‚   [ DEBUG ] REFRESH MATERIALIZED VIEW recent_packages [] [component: server, database-id: psql] (FluentPostgresDriver/FluentPostgresDatabase.swift:164)                                                                                              β”‚

This could possible prevent us from running into the deadlock but I'll also check if we can get rid of them entirely by increasing the pool size.

finestructure commented 1 year ago

One notable observation: the last couple of times this happened I didn't see any backtrace. I'm not sure if this means the process isn't crashing.

Note also that in January '23 when I opened this issue we were running on Swift 5.7 and now we're on 5.8, so that might be an explanation.

finestructure commented 1 year ago

Happened again yesterday (alert from April 26 2023, 13:42 CET) but no additional detail in logs:

CleanShot 2023-04-27 at 11 30 17@2x CleanShot 2023-04-27 at 11 28 52@2x

I didn't get a chance to check the state of the container before a re-deploy "fixed" the issue, so I can't be 100% sure it was the same error.

finestructure commented 1 year ago

hang-prod-2023-04-27.logs.zip

finestructure commented 1 year ago

Happened again on dev just now.

finestructure commented 1 year ago

Happened again on dev this morning.

finestructure commented 1 year ago

Happened in prod this morning.

finestructure commented 1 year ago

And now on dev.

finestructure commented 1 year ago

The analysis job is also running git commands. Johannes Weiß mentioned in passing on the Swift Open Source slack that they saw "crazy stuff" due to FileManager.defaul.removeItems(path) taking very long due to a large directory being remove. I wonder if our case is a symptom of that same issue.

EventLoop latency logging is a recommended diagnostic tool, more details here: https://github.com/apple/swift-nio/issues/2410

finestructure commented 1 year ago

I mean, we do run checkouts of new packages for instance, and those can take > 10s if they're large.

finestructure commented 1 year ago

Hang on dev

finestructure commented 1 year ago

Hang on prod

finestructure commented 1 year ago

Hang on prod

finestructure commented 1 year ago

Ok, I've actually managed to reproduce this locally on my slow network:

CleanShot 2023-05-08 at 10 58 29@2x
finestructure commented 1 year ago

And it's back, but in slightly different form. It always bothered me that initially I saw Received signal 11 crashes in the logs when the hangs happened but then it turned into the connection deadlock errors.

Turns out, the connection deadlock errors just masked the crashes. Now that they're gone, we see them, although it's signal 4, at least this time around:

Received signal 4. Backtrace:
0x559b0531a222, Backtrace.(printBacktrace in _B82A8C0ED7C904841114FDF244F9E58E)(signal: Swift.Int32) -> () at /build/.build/checkouts/swift-backtrace/Sources/Backtrace/Backtrace.swift:66
0x559b0531a497, closure vapor/console-kit#1 (Swift.Int32) -> () in static Backtrace.Backtrace.install(signals: Swift.Array<Swift.Int32>) -> () at /build/.build/checkouts/swift-backtrace/Sources/Backtrace/Backtrace.swift:80
0x559b0531a497, @objc closure vapor/console-kit#1 (Swift.Int32) -> () in static Backtrace.Backtrace.install(signals: Swift.Array<Swift.Int32>) -> () at /build/<compiler-generated>:79
0x7f7a21f5b41f
0x559b06336951
0x559b05d58b69, closure vapor/console-kit#4 () throws -> Swift.String in (extension in ShellOut):Foundation.Process.(launchBash in _839723A297212BDF262C1834C3E29C1F)(with: Swift.String, outputHandle: Swift.Optional<Foundation.FileHandle>, errorHandle: Swift.Optional<Foundation.FileHandle>, environment: Swift.Optional<Swift.Dictionary<Swift.String, Swift.String>>) throws -> Swift.String at /build/.build/checkouts/ShellOut/Sources/ShellOut.swift:476
0x559b05d59981, partial apply forwarder for closure vapor/console-kit#4 () throws -> Swift.String in (extension in ShellOut):Foundation.Process.(launchBash in _839723A297212BDF262C1834C3E29C1F)(with: Swift.String, outputHandle: Swift.Optional<Foundation.FileHandle>, errorHandle: Swift.Optional<Foundation.FileHandle>, environment: Swift.Optional<Swift.Dictionary<Swift.String, Swift.String>>) throws -> Swift.String at /build/<compiler-generated>:0
0x559b0614e43b
0x559b0614d265
0x559b061515bb
0x559b06147fe8
0x559b06165269
0x559b06151041
0x559b0614d12b
0x559b0614d574
0x559b05d55a9e, (extension in ShellOut):Foundation.Process.(launchBash in _839723A297212BDF262C1834C3E29C1F)(with: Swift.String, outputHandle: Swift.Optional<Foundation.FileHandle>, errorHandle: Swift.Optional<Foundation.FileHandle>, environment: Swift.Optional<Swift.Dictionary<Swift.String, Swift.String>>) throws -> Swift.String at /build/.build/checkouts/ShellOut/Sources/ShellOut.swift:475
0x559b05d55d12, ShellOut.shellOut(to: ShellOut.SafeString, arguments: Swift.Array<ShellOut.Argument>, at: Swift.String, process: Foundation.Process, outputHandle: Swift.Optional<Foundation.FileHandle>, errorHandle: Swift.Optional<Foundation.FileHandle>, environment: Swift.Optional<Swift.Dictionary<Swift.String, Swift.String>>) throws -> Swift.String at /build/.build/checkouts/ShellOut/Sources/ShellOut.swift:42
0x559b05d55d12, ShellOut.shellOut(to: ShellOut.ShellOutCommand, at: Swift.String, process: Foundation.Process, outputHandle: Swift.Optional<Foundation.FileHandle>, errorHandle: Swift.Optional<Foundation.FileHandle>, environment: Swift.Optional<Swift.Dictionary<Swift.String, Swift.String>>) throws -> Swift.String at /build/.build/checkouts/ShellOut/Sources/ShellOut.swift:76
0x559b04fafe87, closure vapor/console-kit#1 (ShellOut.ShellOutCommand, Swift.String) throws -> Swift.String in variable initialization expression of static App.Shell.live : App.Shell at /build/Sources/App/Core/AppEnvironment.swift:282
0x559b04fdbe54, App.Shell.run(command: ShellOut.ShellOutCommand, at: Swift.String) throws -> Swift.String at Sources/App/Core/AppEnvironment.swift:274
0x559b04fdbe54, function signature specialization <Arg[2] = Dead> of static App.Git.revisionInfo(_: App.Reference, at: Swift.String) throws -> App.Git.RevisionInfo at /build/Sources/App/Core/Git.swift:70
0x559b04fafdf5, static App.Git.revisionInfo(_: App.Reference, at: Swift.String) throws -> App.Git.RevisionInfo at /build/Sources/App/Core/AppEnvironment.swift:0
0x559b04fafdf5, implicit closure vapor/console-kit#6 (App.Reference, Swift.String) throws -> App.Git.RevisionInfo in variable initialization expression of static App.Git.live : App.Git at /build/Sources/App/Core/AppEnvironment.swift:262
0x559b04f39e63, closure vapor/console-kit#3 (App.Reference) throws -> App.Version in static App.Analyze.getIncomingVersions(client: Vapor.Client, logger: Logging.Logger, package: App.Joined<App.Package, App.Repository>) async throws -> Swift.Array<App.Version> at /build/Sources/App/Commands/Analyze.swift:421
0x559b04f39e63, function signature specialization <Arg[1] = Owned To Guaranteed, Arg[2] = Owned To Guaranteed> of function signature specialization <Arg[0] = [Closure Propagated : closure vapor/console-kit#3 (App.Reference) throws -> App.Version in static App.Analyze.getIncomingVersions(client: Vapor.Client, logger: Logging.Logger, package: App.Joined<App.Package, App.Repository>) async throws -> Swift.Array<App.Version>, Argument Types : [Swift.StringApp.Joined<App.Package, App.Repository>]> of generic specialization <Swift.Array<App.Reference>, App.Version> of (extension in Swift):Swift.Collection.map<A>((A.Element) throws -> A1) throws -> Swift.Array<A1> at /build/<compiler-generated>:0
0x559b04f3d28f, function signature specialization <Arg[0] = [Closure Propagated : closure vapor/console-kit#3 (App.Reference) throws -> App.Version in static App.Analyze.getIncomingVersions(client: Vapor.Client, logger: Logging.Logger, package: App.Joined<App.Package, App.Repository>) async throws -> Swift.Array<App.Version>, Argument Types : [Swift.StringApp.Joined<App.Package, App.Repository>]> of generic specialization <Swift.Array<App.Reference>, App.Version> of (extension in Swift):Swift.Collection.map<A>((A.Element) throws -> A1) throws -> Swift.Array<A1> at /build/<compiler-generated>:0
0x559b04f3d28f, (1) suspend resume partial function for function signature specialization <Arg[3] = Dead> of static App.Analyze.getIncomingVersions(client: Vapor.Client, logger: Logging.Logger, package: App.Joined<App.Package, App.Repository>) async throws -> Swift.Array<App.Version> at /build/Sources/App/Commands/Analyze.swift:420
0x559b06788d9d
0x559b0678947b
0x559b0615b7e3
0x559b0615b591
0x559b06167071
0x7f7a21f4f608
0x7f7a21a92132
0xffffffffffffffff
/bin/bash: line 1: 21985 Illegal instruction     (core dumped) ./Run analyze --env dev --limit 25

There are lots of occurrences in this log grab and I failed to add timestamp when dumping them, so it's hard to tell if they're all closely related in time.

The last one has a ShellOut error happening just before:

[ INFO ] fetch failed: Shell command failed:
command: "git fetch --tags --prune-tags --prune"
path:    "/checkouts/github.com-firebase-firebase-ios-sdk"
message: "ShellOut encountered an error
Status code: 1
Message: "From https://github.com/firebase/firebase-ios-sdk
 - [deleted]             (none)     -> origin/nc/csh-analyzer-fixes
 - [deleted]             (none)     -> origin/pb-apnstokentype
   cead23010..927ae355b  master                    -> origin/master
 * [new branch]          abt-exp-meta              -> origin/abt-exp-meta
   80357b4e3..01cbad1f0  appcheck-lite             -> origin/appcheck-lite
   35d2acbc4..d3eb5c805  auth-swift                -> origin/auth-swift
   0f47aa608..11ddd49c8  markduckworth/sum-avg-cpp-api -> origin/markduckworth/sum-avg-cpp-api
   fcfb8f5a9..e18205268  samedson/wowlocal-urgent-deadlock -> origin/samedson/wowlocal-urgent-deadlock
 + 46a10002b...7105602ef themis/swiftDemangle      -> origin/themis/swiftDemangle  (forced update)
 ! [rejected]            CocoaPods-10.10.0.nightly -> CocoaPods-10.10.0.nightly  (would clobber existing tag)"
Output: """ [component: analyze]
[ INFO ] removing directory [component: analyze]
[ INFO ] cloning https://github.com/firebase/firebase-ios-sdk.git to /checkouts/github.com-firebase-firebase-ios-sdk [component: analyze]
Received signal 4. Backtrace:

That may be significant. I.e. it might be that the earlier signal 4 errors are being recovered - actually I think that's in fact the case. There are sections in the logs where processing is normal:

[ INFO ] Analyzing (limit: 25) ... [component: analyze]
[ INFO ] Checkout directory: /checkouts [component: analyze]
[ INFO ] pulling https://github.com/theappstudiollc/ASEnterprise.git in /checkouts/github.com-theappstudiollc-asenterprise [component: analyze]
[ INFO ] pulling https://github.com/cxa/Piz.git in /checkouts/github.com-cxa-piz [component: analyze]
[ INFO ] pulling https://github.com/BeAppOnline/AFBilling.git in /checkouts/github.com-beapponline-afbilling [component: analyze]
[ INFO ] pulling https://github.com/heckj/MeshGenerator.git in /checkouts/github.com-heckj-meshgenerator [component: analyze]
[ INFO ] pulling https://github.com/SocketMobile/swift-package-capturesdk.git in /checkouts/github.com-socketmobile-swift-package-capturesdk [component: analyze]
[ INFO ] pulling https://github.com/chrispomeroyhale/csv-dialect-swift.git in /checkouts/github.com-chrispomeroyhale-csv-dialect-swift [component: analyze]
[ INFO ] pulling https://github.com/dagronf/DSFActionBar.git in /checkouts/github.com-dagronf-dsfactionbar [component: analyze]
[ INFO ] pulling https://github.com/swift-kipple/Core.git in /checkouts/github.com-swift-kipple-core [component: analyze]
[ INFO ] pulling https://github.com/apocolipse/Thrift-Swift.git in /checkouts/github.com-apocolipse-thrift-swift [component: analyze]
[ INFO ] pulling https://github.com/ParkGwangBeom/TransitionableTab.git in /checkouts/github.com-parkgwangbeom-transitionabletab [component: analyze]
[ INFO ] pulling https://github.com/apple/swift-foundation-icu.git in /checkouts/github.com-apple-swift-foundation-icu [component: analyze]
[ INFO ] pulling https://github.com/StanfordBDHG/CardinalKitFHIRtoFirestoreAdapter.git in /checkouts/github.com-stanfordbdhg-cardinalkitfhirtofirestoreadapter [component: analyze]
[ INFO ] pulling https://github.com/dmrschmidt/QRCode.git in /checkouts/github.com-dmrschmidt-qrcode [component: analyze]
[ INFO ] pulling https://github.com/driven-app/porsche-connect.git in /checkouts/github.com-driven-app-porsche-connect [component: analyze]
[ INFO ] pulling https://github.com/timsearle/euclid.git in /checkouts/github.com-timsearle-euclid [component: analyze]
[ INFO ] pulling https://github.com/danielsaidi/Sheeeeeeeeet.git in /checkouts/github.com-danielsaidi-sheeeeeeeeet [component: analyze]
[ INFO ] pulling https://github.com/Carthage/ReactiveTask.git in /checkouts/github.com-carthage-reactivetask [component: analyze]
[ INFO ] pulling https://github.com/tonystone/tracelog.git in /checkouts/github.com-tonystone-tracelog [component: analyze]
[ INFO ] pulling https://github.com/amosavian/FileProvider.git in /checkouts/github.com-amosavian-fileprovider [component: analyze]
[ INFO ] pulling https://github.com/theleftbit/BSWInterfaceKit.git in /checkouts/github.com-theleftbit-bswinterfacekit [component: analyze]
[ INFO ] pulling https://github.com/m-barthelemy/DockerSwift.git in /checkouts/github.com-m-barthelemy-dockerswift [component: analyze]
[ INFO ] pulling https://github.com/SwiftyFinch/Rugby.git in /checkouts/github.com-swiftyfinch-rugby [component: analyze]
[ INFO ] pulling https://github.com/mapbox/mapbox-directions-swift.git in /checkouts/github.com-mapbox-mapbox-directions-swift [component: analyze]
[ INFO ] pulling https://github.com/Kitura/Kitura.git in /checkouts/github.com-kitura-kitura [component: analyze]
[ INFO ] pulling https://github.com/fxm90/GradientLoadingBar.git in /checkouts/github.com-fxm90-gradientloadingbar [component: analyze]
[ INFO ] Analyzing (limit: 25) ... [component: analyze]
[ INFO ] Checkout directory: /checkouts [component: analyze]
[ INFO ] Analyzing (limit: 25) ... [component: analyze]
[ INFO ] Checkout directory: /checkouts [component: analyze]

This is after an earlier signal 4 error. So that's not what's making it hang. I take that as an indication that the git fetch --tags --prune-tags --prune error might be close to the root cause.

I grabbed the repo in question (github.com-firebase-firebase-ios-sdk) but I did this after restarting the service and it might have been too late. In fact, running git fetch --tags --prune-tags --prune on a copy does not show that same error.

Full logs: hang-dev-2023-05-10.logs.zip

finestructure commented 1 year ago

This is happening with release 2.83.13, for backtrace line number purposes.

finestructure commented 1 year ago

We're actually also still seeing the connection deadlock, despite the attempted fix deployed yesterday. It could be that it's in another part though. I.e. while I moved the git clone/pull out of the db transaction, the 10s timeout could be exceeded by some other synchronous operation.

The logs from prod also show that there are Received signal 4 crashes, and it's clear that they don't hang the process.

hang-prod-2023-05-10.log.zip

finestructure commented 1 year ago

The signal 4 crashes might be related to this:

https://askubuntu.com/questions/1420966/method-https-has-died-unexpectedly-sub-process-https-received-signal-4-after

I'll attempt a fix to try and remove that part from the equation.

finestructure commented 1 year ago

Turns out this is somewhat tricky for uninteresting reasons. What gives me pause is that the nature of the env variable (and the bug reports) indicate this is an issue when running network requests. However, we see this fail on git operations like

git rev-list --count HEAD

or

git log -n1 --format=format:"%H-%ct" foo

which should run locally. Might limit it to just changing these two commands for now to see if it has an effect before applying it to all git commands we run.

finestructure commented 1 year ago

Late note: it doesn't seem to have anything to do with the git commands. The one git command I'd instrumented still appeared in the crashes, it did nothing: https://github.com/SwiftPackageIndex/SwiftPackageIndex-Server/pull/2381

finestructure commented 1 year ago

Increasing the connection pool timeout in #2386 seems have fixed the hangs - we haven't had any on either envs since then (a week ago). I don't think the underlying issue is addressed (as detailed here: https://github.com/vapor/async-kit/issues/104) but at least it doesn't impact our processing any more, at least for the moment.

Closing this for now.

finestructure commented 1 year ago

Just happened again on prod. The timeout increase just made it a lot less frequent (as expected).

finestructure commented 1 year ago

For the record, it hasn't happened since May 26 🀞(it isn't fixed but it seems to be rare enough not to be an issue for the moment).

finestructure commented 1 year ago

Happened again this morning on both envs within a few hours.

finestructure commented 1 year ago

Happened again at ~18:00 CET, July 3 2023.

CleanShot 2023-07-03 at 22 31 30@2x

finestructure commented 1 year ago

Happened again at ~5:30 CET, July 11 2023.

finestructure commented 1 year ago

prod, ~12:00 CET, July 22 2023

finestructure commented 1 year ago

prod, ~5:40 CET, July 30 2023

finestructure commented 1 year ago

prod, ~21:00 CET, July 31 2023

finestructure commented 1 year ago

prod, ~1:30 CET, Aug 12 2023

finestructure commented 1 year ago

prod, 9:19 CET, Sep 3 2023

finestructure commented 1 year ago

We've had a hang again, with all the latest ShellOut fixes, at 16:18 CET today, Sep 6 2023.

It's very clear now that the cause is an actual crash:

CleanShot 2023-09-06 at 21 17 56@2x

2023-09-06T13:36:58.114865247Z Received signal 11. Backtrace:
2023-09-06T13:36:58.615927201Z 0x55ba0c45b4f1, Backtrace.(printBacktrace in _B82A8C0ED7C904841114FDF244F9E58E)(signal: Swift.Int32) -> () at /build/.build/checkouts/swift-backtrace/Sources/Backtrace/Backtrace.swift:66
2023-09-06T13:36:58.615971303Z 0x55ba0c45b7a7, closure #1 (Swift.Int32) -> () in static Backtrace.Backtrace.install(signals: Swift.Array<Swift.Int32>) -> () at /build/.build/checkouts/swift-backtrace/Sources/Backtrace/Backtrace.swift:80
2023-09-06T13:36:58.615994404Z 0x55ba0c45b7a7, @objc closure #1 (Swift.Int32) -> () in static Backtrace.Backtrace.install(signals: Swift.Array<Swift.Int32>) -> () at /build/<compiler-generated>:79
2023-09-06T13:36:58.615999404Z 0x7fb0bf00941f
2023-09-06T13:36:58.616003205Z 0x55ba0df75896
2023-09-06T13:36:58.616006905Z 0x55ba0df7ceb9
2023-09-06T13:36:58.616011105Z 0x55ba0df7ccb5
2023-09-06T13:36:58.616014905Z 0x55ba0df6c60b
2023-09-06T13:36:58.616018605Z 0x55ba0df7575b
2023-09-06T13:36:58.616022306Z 0x55ba0df6941d
2023-09-06T13:36:58.616470727Z 0x55ba0c0fa23a, implicit closure #4 (Swift.String) throws -> Swift.Array<Swift.String> in implicit closure #3 (Foundation.FileManager) -> (Swift.String) throws -> Swift.Array<Swift.String> in variable initialization expression of static App.FileManager.live : App.FileManager at /build/Sources/App/Core/AppEnvironment.swift:270
2023-09-06T13:36:58.616499028Z 0x55ba0c0fa23a, partial apply forwarder for implicit closure #4 (Swift.String) throws -> Swift.Array<Swift.String> in implicit closure #3 (Foundation.FileManager) -> (Swift.String) throws -> Swift.Array<Swift.String> in variable initialization expression of static App.FileManager.live : App.FileManager at /build/<compiler-generated>:0
2023-09-06T13:36:58.620597122Z 0x55ba0c0404fe, App.FileManager.contentsOfDirectory(atPath: Swift.String) throws -> Swift.Array<Swift.String> at Sources/App/Core/AppEnvironment.swift:258
2023-09-06T13:36:58.620625524Z 0x55ba0c0404fe, function signature specialization <Arg[0] = Dead> of static App.Analyze.trimCheckouts() throws -> () at /build/Sources/App/Commands/Analyze.swift:97
2023-09-06T13:36:58.620662926Z 0x55ba0c037a2c, static App.Analyze.trimCheckouts() throws -> () at /build/<compiler-generated>:0
2023-09-06T13:36:58.620772931Z 0x55ba0c037a2c, (3) suspend resume partial function for App.Analyze.Command.run(using: ConsoleKit.CommandContext, signature: App.Analyze.Command.Signature) async throws -> () at /build/Sources/App/Commands/Analyze.swift:64
2023-09-06T13:36:58.620926738Z 0x55ba0de0123e
2023-09-06T13:36:58.620943239Z 0x55ba0de0191f
2023-09-06T13:36:58.620947939Z 0x55ba0de162f8
2023-09-06T13:36:58.620951539Z 0x55ba0de1610e
2023-09-06T13:36:58.620955639Z 0x55ba0de21135
2023-09-06T13:36:58.620959240Z 0x7fb0beffd608
2023-09-06T13:36:58.621011242Z 0x7fb0becfc132
2023-09-06T13:36:58.621028243Z 0xffffffffffffffff
2023-09-06T13:36:58.763773210Z /bin/bash: line 1: 28473 Segmentation fault      (core dumped) ./Run analyze --env prod --limit 25
2023-09-06T14:02:12.447331859Z Received signal 11. Backtrace:

Note the timestamp of the last entry at 14:02 UTC (16:00 CET) is a single entry and then it hangs. Our alerts trigger when we've not seen logs for 20 mins, so that aligns with the alert at 16:18 CET.

I see signal 11 crashes every few hours but it seems they don't always lead to a hang. Haven't looked at the logs in more detail yet, attached below.

analyze-2023-09-06-stderr.logs.zip

analyze-2023-09-06.logs.zip

FYI @gwynne 😒

gwynne commented 1 year ago

@finestructure So, from that crash log, looks to me like odds are decent you're hitting the ARC miscompile crash in 5.8.1. My very unorthodox suggestion is to try building with a 5.9 snapshot, for three reasons:

  1. 5.9 is now (presumably) very close to release, and so can be reasonably considered at least hopefully stable. I've seen several people using it in production (at least for smaller stuff) already, and we're doing so in Vapor for generating DocC docs.
  2. 5.9 is known to have had the fix for the miscompile in question for at least 3 months now, whereas it's pretty much a foregone conclusion that it won't be backported to 5.8 at this point.
  3. 5.9 now has built-in Backtrace support, and Vapor has been updated accordingly, meaning if it does crash again, it's likely the log will be more helpful.
finestructure commented 1 year ago

Mmm, we're actually compiling with a 5.9 toolchain, from Sep 1 or so.

https://gitlab.com/finestructure/spi-base/-/blob/main/Dockerfile

gwynne commented 1 year ago

Ooof... then my suggestion is to update your dependencies (and bump your 5.9 to the latest snapshot (which at the moment appears to be sha256:76911e1da04bce21683872c23033f6e2c07f296df7465fce0bf1e1bd01835001). This will get you the new native backtracking support (if you don't already have it), plus the updated Vapor that doesn't overwrite it with the old version.

finestructure commented 1 year ago

That image is from yesterday, dependencies updated on Monday - we should be pretty much up to date, I'd say.

I'm not sure why the backtrace is only partially working. It's been like this for a long time. It's weird πŸ€”

finestructure commented 1 year ago

(I'll def check in the morning that we've got the Vapor update you're referring to.)

gwynne commented 1 year ago

For reference, the update in question is Vapor 4.81.0, released ~12 hours ago at the time of this writing.

finestructure commented 1 year ago

I just realised we'd updated to that latest Swift 5.9 and run the package updates but it's not deployed yet. It's the revision just after so what was running had just the ShellOut changes!

finestructure commented 1 year ago

Another prod hang, 15:57 CET, Sep 9 2023. stderr is just

2023-09-09T10:01:16.343649916Z /bin/bash: line 1: 13830 Segmentation fault      (core dumped) ./Run analyze --env prod --limit 25

stdout logs below - I don't see any backtrace or other bits that indicate issues, except the infamous

2023-09-09T14:09:20.520006338Z [ ERROR ] Connection request (ID 25 timed out. This might indicate a connection deadlock in your application. If you have long-running requests, consider increasing your connection timeout. [component: server, database-id: psql]

analyze-2023-09-09-stderr.logs.zip