bacalhau-project / bacalhau

Compute over Data framework for public, transparent, and optionally verifiable computation
https://docs.bacalhau.org
Apache License 2.0
706 stars, 89 forks

fix nats server panic on shutdown #4712

Closed wdbaruni closed 1 day ago

wdbaruni commented 1 day ago

Fixes panics when shutting down the orchestrator. The NATS server was being shut down twice: once by our application and once by the signal handler that the NATS server installs on its own. The fix disables the NATS server's signal handlers by setting NoSigs: true.

The PR also adds more NATS debug info to the /api/v1/agent/debug API.

08:41:56.662 | DBG Expanso/workspace/bacalhau/pkg/nats/transport/nats.go:391 > Shutting down server n-d1c1b605-bf3d-4569-a1f0-b9a0a1a3f4b1
panic: close of nil channel

goroutine 548 [running]:
github.com/nats-io/nats-server/v2/server.(*Server).shutdownEventing(0x14000aded88)
    /Users/walid/.go/pkg/mod/github.com/nats-io/nats-server/v2@v2.10.20/server/events.go:1734 +0x12c
github.com/nats-io/nats-server/v2/server.(*Server).Shutdown(0x14000aded88)
    /Users/walid/.go/pkg/mod/github.com/nats-io/nats-server/v2@v2.10.20/server/server.go:2430 +0x3c
github.com/bacalhau-project/bacalhau/pkg/nats.(*ServerManager).Stop(...)
    /Users/walid/Expanso/workspace/bacalhau/pkg/nats/server.go:72
github.com/bacalhau-project/bacalhau/pkg/nats/transport.(*NATSTransport).Close(0x14000c12870, {0x105a61718?, 0x14000935690?})
    /Users/walid/Expanso/workspace/bacalhau/pkg/nats/transport/nats.go:392 +0xd0
github.com/bacalhau-project/bacalhau/pkg/node.NewNode.func3({0x105a61718, 0x14000935690})
    /Users/walid/Expanso/workspace/bacalhau/pkg/node/node.go:261 +0xc0
github.com/bacalhau-project/bacalhau/pkg/system.(*CleanupManager).Cleanup.func1({0x1056161c0?, 0x140008a4f00?})
    /Users/walid/Expanso/workspace/bacalhau/pkg/system/cleanup.go:71 +0x98
created by github.com/bacalhau-project/bacalhau/pkg/system.(*CleanupManager).Cleanup in goroutine 53
    /Users/walid/Expanso/workspace/bacalhau/pkg/system/cleanup.go:65 +0x178
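
The double-shutdown hazard described above can be sketched with a toy model that uses only the Go standard library. Everything here is hypothetical and for illustration: `toyServer`, `newToyServer`, and `simulateInterrupt` are not part of NATS or Bacalhau, and the toy panics on a double close of a channel, which is not the exact failure in the trace (a close of a nil channel inside the NATS server). The `noSigs` field only mirrors the idea behind `NoSigs: true`: when the embedded server does not react to signals itself, the application's cleanup is the only shutdown path.

```go
package main

import "fmt"

// toyServer is a hypothetical stand-in for an embedded server.
// It is NOT the NATS server API; its Shutdown is deliberately not idempotent.
type toyServer struct {
	noSigs bool          // mirrors the idea of NoSigs: skip the built-in signal handler
	quit   chan struct{} // closed on shutdown
}

func newToyServer(noSigs bool) *toyServer {
	return &toyServer{noSigs: noSigs, quit: make(chan struct{})}
}

// Shutdown closes quit. Calling it a second time panics, because closing an
// already closed channel is a runtime panic in Go.
func (s *toyServer) Shutdown() { close(s.quit) }

// simulateInterrupt models what happens on an interrupt: the server's own
// handler (if installed) shuts the server down, and then the application's
// cleanup shuts it down as well. It reports whether a panic occurred.
func simulateInterrupt(s *toyServer) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	if !s.noSigs {
		s.Shutdown() // built-in signal handler path
	}
	s.Shutdown() // application cleanup path
	return false
}

func main() {
	fmt.Println("built-in handler enabled, panicked:", simulateInterrupt(newToyServer(false)))
	fmt.Println("built-in handler disabled, panicked:", simulateInterrupt(newToyServer(true)))
}
```

With the built-in handler enabled, both paths call `Shutdown` and the second call panics. With it disabled, only the application's cleanup runs and the server stops once.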

coderabbitai[bot] commented 1 day ago

Walkthrough

This pull request enhances the GetDebugInfo method of the ServerManager struct in pkg/nats/server.go, which now also returns debugging information about subscriptions and JetStream. In pkg/nats/transport/nats.go, the NATSTransport configuration sets NoSigs to true, so the embedded NATS server no longer installs its own handlers for termination signals. New entries were also added to a custom dictionary file so that the spell checker recognizes the relevant terms.

Changes

| File Path | Change Summary |
| --- | --- |
| pkg/nats/server.go | Updated GetDebugInfo method to return additional JetStream information, modifying the return structure to include "JetStreamz". |
| pkg/nats/transport/nats.go | Modified NewNATSTransport to set NoSigs to true, disabling server termination on SIGINT/SIGTERM signals. |
| .cspell/custom-dictionary.txt | Added new entries: Streamz, Nacked, Routez, Connz, Subsz to the custom dictionary. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant ServerManager

    Client->>ServerManager: Request Debug Info
    ServerManager->>ServerManager: Call Subsz with options
    ServerManager->>ServerManager: Call Jsz with options
    ServerManager->>Client: Return Debug Info (including JetStreamz)

🐰 "In the server's heart, new data flows,
With JetStream details, the knowledge grows.
No more SIGs to end the fun,
The NATS transport keeps on the run!
Debugging now is a joyful quest,
With insights deep, we’ll do our best!" 🐇

