SwiftPackageIndex / SwiftPackageIndex-Server

The Swift Package Index is the place to find Swift packages!
https://swiftpackageindex.com
Apache License 2.0

"Received signal 11" in app_analyze #2227

Closed finestructure closed 1 year ago

finestructure commented 1 year ago

I finally caught a glimpse of why analysis sometimes hangs:

2023-01-21T18:57:54.937913401Z app_analyze.1.t1dix7fv4zph@p3    | [ INFO ] Analyzing (limit: 25) ... [component: analyze]
2023-01-21T18:57:54.983740764Z app_analyze.1.t1dix7fv4zph@p3    | [ INFO ] Checkout directory: /checkouts [component: analyze]
2023-01-21T18:57:54.987633060Z app_analyze.1.t1dix7fv4zph@p3    | [ INFO ] pulling https://github.com/yonaskolb/Stringly.git in /checkouts/github.com-yonaskolb-stringly [component: analyze]
2023-01-21T18:57:55.021965033Z app_analyze.1.t1dix7fv4zph@p3    | [ INFO ] pulling https://github.com/dagronf/DSFActionBar.git in /checkouts/github.com-dagronf-dsfactionbar [component: analyze]
2023-01-21T18:57:55.029915226Z app_analyze.1.t1dix7fv4zph@p3    | [ INFO ] pulling https://github.com/ordo-one/package-frostflake.git in /checkouts/github.com-ordo-one-package-frostflake [component: analyze]
2023-01-21T18:57:55.271462231Z app_analyze.1.t1dix7fv4zph@p3    | [ INFO ] pulling https://github.com/chojnac/Dumpling.git in /checkouts/github.com-chojnac-dumpling [component: analyze]
2023-01-21T18:57:55.273393230Z app_analyze.1.t1dix7fv4zph@p3    | [ INFO ] pulling https://github.com/raymccrae/swift-jsonpatch.git in /checkouts/github.com-raymccrae-swift-jsonpatch [component: analyze]
2023-01-21T18:57:55.277729726Z app_analyze.1.t1dix7fv4zph@p3    | Received signal 11. Backtrace:
2023-01-21T18:58:04.984287787Z app_analyze.1.t1dix7fv4zph@p3    | [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]
2023-01-21T18:58:04.984746687Z app_analyze.1.t1dix7fv4zph@p3    | [ ERROR ] Connection request timed out. This might indicate a connection deadlock in your application. If you're running long running requests, consider increasing your connection timeout. [component: server, database-id: psql]

The backtrace itself isn't in the logs, but hopefully the crash is reproducible with one of the packages in question.

finestructure commented 1 year ago

Hold on, that segfault is from 10:01 this morning, so certainly not the issue.

Again, it seems like the segfault/signal 11 is a red herring and the hangs are due to something else but who knows what's going on.

finestructure commented 1 year ago

This is on prod with Vapor 4.81.0, compiled with Swift 5.9 from Sep 1.

 ################################################################
 #                                                              #
 # Swift Nightly Docker Image                                   #
 # Tag: swift-5.9-DEVELOPMENT-SNAPSHOT-2023-09-01-a                     #
 #                                                              #
 ################################################################

gwynne commented 1 year ago

@finestructure If the core file (its existence is indicated by "(core dumped)" in the log) is still present in the container that runs the command, it can be used to get a backtrace of the crash.
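As a rough sketch of what that could look like, assuming either lldb or gdb is available in the container and the core file's path is already known: the helper below only prints the debugger invocation so it can be reviewed before running. The function name `backtrace_cmd` and the `core` path are placeholders for illustration, not part of the project.

```shell
# Hypothetical helper: print a debugger command that would dump the
# backtraces of all threads from a core file. It prints rather than
# executes, so nothing runs until the command is copied by hand.
backtrace_cmd() {
    exe="$1"
    core="$2"
    debugger="${3:-}"

    # If no debugger is named, prefer lldb when installed, else gdb.
    if [ -z "$debugger" ]; then
        if command -v lldb >/dev/null 2>&1; then
            debugger=lldb
        else
            debugger=gdb
        fi
    fi

    case "$debugger" in
        lldb) printf '%s\n' "lldb --batch -o 'bt all' -c $core $exe" ;;
        gdb)  printf '%s\n' "gdb -batch -ex 'thread apply all bt' $exe $core" ;;
        *)    echo "unsupported debugger: $debugger" >&2; return 1 ;;
    esac
}
```

For example, `backtrace_cmd ./Run core gdb` prints a gdb command line for the `./Run` executable seen in the logs; the executable should be the same build that produced the core file.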

finestructure commented 1 year ago

prod, 16:33 CEST, Sep 14 2023

I missed the first alert and only checked on this now (21:37 CEST, UTC+2). I've pulled the logs and the closest segfault is at 13:28 UTC

2023-09-14T13:28:13.410953838Z /bin/bash: line 1: 29712 Segmentation fault      (core dumped) ./Run analyze --env prod --limit 25

with processing continuing until 14:18 UTC while the connection timeout messages leading to the hang are appearing:

2023-09-14T14:18:09.519182656Z [ ERROR ] Connection request (ID 6 timed out. This might indicate a connection deadlock in your application. If you have long-running requests, consider increasing your connection timeout. [component: server, database-id: psql]
2023-09-14T14:18:09.529500376Z [ ERROR ] Connection request (ID 7 timed out. This might indicate a connection deadlock in your application. If you have long-running requests, consider increasing your connection timeout. [component: server, database-id: psql]

There are 15 seg faults in the log file I pulled (ranging from Sep 11 to Sep 14). I think it's safe to say that the seg faults aren't the cause of the hangs.
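A count like this can be reproduced from a pulled log file with standard tools. The sketch below assumes every line starts with an RFC 3339 timestamp, as in the excerpts here; the function name `segfaults_per_day` is made up for illustration.

```shell
# Count "Segmentation fault" lines per day in a log file whose lines
# begin with a timestamp such as 2023-09-14T13:28:13.410953838Z.
# Output: one "YYYY-MM-DD count" line per day, in date order.
segfaults_per_day() {
    grep 'Segmentation fault' "$1" \
        | cut -c1-10 \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}
```

Comparing these per-day counts with the times of the "Connection request ... timed out" errors is one way to check whether the crashes and the hangs actually line up.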

I looked for a core file in the running container, but there was none in the executable's directory or in the few other places I checked (/var/cache/abrt, /var/spool/abrt, /var/crash). Core dump size is unlimited:

root@fce97f658c3d:/app# ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) unlimited
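Note that `ulimit -c` only governs the size limit. On Linux, where the dump goes is decided by `/proc/sys/kernel/core_pattern` (see core(5)), a kernel-wide setting that a container shares with its host. If the pattern starts with `|`, the dump is piped to a handler program instead of being written as a file, which could explain a missing core file. The sketch below classifies a pattern value; the helper name `describe_core_pattern` is hypothetical.

```shell
# Hypothetical helper: describe where the kernel would send a core dump
# for a given core_pattern value, following the rules in core(5).
describe_core_pattern() {
    case "$1" in
        '')   echo "pattern is empty" ;;
        '|'*) echo "piped to handler: ${1#\|}" ;;
        /*)   echo "written to absolute path template: $1" ;;
        *)    echo "written relative to the crashing process's working directory: $1" ;;
    esac
}

# Usage inside the container:
#   describe_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
```

A relative pattern means the file lands in the working directory of the crashing process, which is not necessarily the directory containing the executable.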

It's late, so I've restarted the container for now. I can look into where the core file ends up some other time; these crashes seem to happen frequently enough (15 times in 3 days).

BTW, the logs do not contain any stack trace info, despite the latest Vapor and Swift 5.9 🤔:

2023-09-14T13:28:08.682435359Z [ INFO ] pulling https://github.com/pointfreeco/swift-dependencies.git in /checkouts/github.com-pointfreeco-swift-dependencies [component: analyze]
2023-09-14T13:28:08.684063885Z [ INFO ] pulling https://github.com/Alamofire/AlamofireImage.git in /checkouts/github.com-alamofire-alamofireimage [component: analyze]
2023-09-14T13:28:09.417655788Z [ WARNING ] stderr: From https://github.com/team-telnyx/telnyx-webrtc-ios
2023-09-14T13:28:09.417694189Z  * [new tag]         0.1.10     -> 0.1.10 [component: analyze]
2023-09-14T13:28:10.842271007Z [ INFO ] throttled 1 incoming revisions [component: analyze]
2023-09-14T13:28:10.937562626Z [ INFO ] throttled 1 incoming revisions [component: analyze]
2023-09-14T13:28:11.961565948Z [ INFO ] Updating 25 packages for stage 'analysis' (errors: 0) [component: analyze]
2023-09-14T13:28:13.410953838Z /bin/bash: line 1: 29712 Segmentation fault      (core dumped) ./Run analyze --env prod --limit 25
2023-09-14T13:28:33.536090365Z [ INFO ] Analyzing (limit: 25) ... [component: analyze]
2023-09-14T13:28:33.592712560Z [ INFO ] Checkout directory: /checkouts [component: analyze]
2023-09-14T13:28:33.593526773Z [ INFO ] Updating 0 packages for stage 'analysis' (errors: 0) [component: analyze]
2023-09-14T13:28:55.544915191Z [ INFO ] Analyzing (limit: 25) ... [component: analyze]
2023-09-14T13:28:55.619826266Z [ INFO ] Checkout directory: /checkouts [component: analyze]

gwynne commented 1 year ago

@finestructure For next time, I'd suggest just searching the entire filesystem, e.g. find / -name core (if that finds nothing, I'd give it one more shot with a more permissive search, like find / -iname '*core*').
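A slightly narrower variant of that search, as a sketch: it keeps only regular files named `core` or `core.<something>`, which filters out directories and unrelated names that the `*core*` pattern would match. The function name `find_core_files` is a placeholder, and the naming assumption only holds if the kernel's core pattern produces such names.

```shell
# Hypothetical helper: list regular files that look like core dumps
# ("core" or "core.<suffix>") below a root directory (default: /).
# Permission errors from unreadable directories are suppressed.
find_core_files() {
    find "${1:-/}" -type f \( -name core -o -name 'core.*' \) 2>/dev/null
}

# Usage inside the container:
#   find_core_files /
```

Running it against `/` can take a while on a large checkout volume, so pointing it at a specific directory first may be quicker.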

finestructure commented 1 year ago

Closing this as fixed now - we haven't had a hang since we removed the TaskGroup in analysis in #2656 (released as 2.91.9).

Finally 🙂🎉

Huge thanks again to Gwynne for all the help!