bacalhau-project / bacalhau

Compute over Data framework for public, transparent, and optionally verifiable computation
https://docs.bacalhau.org
Apache License 2.0
711 stars 89 forks source link

async execution handling #4683

Closed wdbaruni closed 3 weeks ago

wdbaruni commented 3 weeks ago

This PR introduces async handling of executions in the compute node by decoupling the different components and fully relying on the executor store and watcher library to process executions.

In this implementation we are still using the compute Endpoint and Callbacks to interface with the orchestrator node to maintain backward compatibility. In a follow-up PR will introduce messaging through ncl library for full async communication.

The current flow looks like follows:

sequenceDiagram
    actor OrchestratorNode as Orchestrator Node
    box Compute Node
        participant Endpoint
        participant CallbackForwarder as Callback Forwarder
        participant ExecutionStore as Execution Store
        participant ExecutionWatcher as Execution Watcher
        participant Bidder
        participant Executor
    end

    OrchestratorNode ->> Endpoint: Request Bid
    Endpoint ->> ExecutionStore: Create Execution Entry
    ExecutionStore ->> ExecutionWatcher: Notify: Execution Created
    ExecutionWatcher ->> Bidder: Initiate Bidding
    Bidder ->> ExecutionStore: Update Bid Result (Accepted/Rejected)
    ExecutionStore ->> CallbackForwarder: Notify: Bid Result
    CallbackForwarder ->> OrchestratorNode: Send Bid Result Notification

    alt Bid Accepted
        OrchestratorNode ->> Endpoint: Accept Bid
        Endpoint ->> ExecutionStore: Update Execution State
        ExecutionStore ->> ExecutionWatcher: Notify: Execution Accepted
        ExecutionWatcher ->> Executor: Start Execution
        Executor ->> ExecutionStore: Update Status (Completed/Failed)
        ExecutionStore ->> CallbackForwarder: Notify: Completion Status
        CallbackForwarder ->> OrchestratorNode: Send Execution Status Update
    else Bid Rejected
        OrchestratorNode ->> Endpoint: Reject Bid
        Endpoint ->> ExecutionStore: Update Bid Rejected State
    else Cancel
        OrchestratorNode ->> Endpoint: Cancel Execution
        Endpoint ->> ExecutionStore: Update Canceled State
        ExecutionStore ->> ExecutionWatcher: Notify: Execution Canceled
        ExecutionWatcher ->> Executor: Cancel Execution
    end

Summary by CodeRabbit

coderabbitai[bot] commented 3 weeks ago

[!WARNING]

Rate limit exceeded

@wdbaruni has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 17 minutes and 33 seconds before requesting another review.

⌛ How to resolve this issue? After the wait time has elapsed, a review can be triggered using the `@coderabbitai review` command as a PR comment. Alternatively, push new commits to this PR. We recommend that you space out your commits to avoid hitting the rate limit.
🚦 How do rate limits work? CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our [FAQ](https://coderabbit.ai/docs/faq) for further information.
📥 Commits Reviewing files that changed from the base of the PR and between c0b4f5d9fb0d1d1cfcc54dbfe23efd1e2e9e5d26 and 28b22b4a0a8aa8d38c952092d05c1897a12ff1ff.

Walkthrough

The changes in this pull request involve significant modifications across various files related to bidding strategies, compute nodes, and event handling. Key updates include the simplification of response structures by removing unnecessary fields, adjustments to method signatures, and the introduction of new components for handling execution events. The removal of certain strategies and constants reflects a shift in how events and bids are processed. Additionally, enhancements in error handling and logging levels have been made, particularly in the context of NATS messaging and the compute node's functionality.

Changes

File Path Change Summary
pkg/bidstrategy/chained.go Reinstated import statement; simplified logic in ShouldBid and ShouldBidBasedOnUsage methods.
pkg/bidstrategy/fixed.go Updated NewFixedBidStrategy to accept only one parameter; removed ShouldWait from response.
pkg/bidstrategy/resource/capacity_max_strategy.go Simplified response structure in ShouldBidBasedOnUsage; removed ShouldWait.
pkg/bidstrategy/semantic/export_test.go Removed NodeID field from BidStrategyRequest struct initialization.
pkg/bidstrategy/semantic/external_exec.go Changed marshaling logic in ShouldBid method to marshal request instead of job selection data.
pkg/bidstrategy/semantic/external_http.go Updated marshaling in ShouldBid; enhanced error logging for HTTP request creation.
pkg/bidstrategy/semantic/external_http_test.go Updated request payload type in tests from JobSelectionPolicyProbeData to BidStrategyRequest.
pkg/bidstrategy/type.go Removed NodeID from BidStrategyRequest; updated JSON tags; removed JobSelectionPolicyProbeData.
pkg/bidstrategy/waiting.go Deleted file containing waitingStrategy implementation.
pkg/bidstrategy/waiting_test.go Deleted file containing tests for waitingStrategy.
pkg/compute/bidder.go Removed NodeID; refactored RunBidding method; integrated ReturnBidResult functionality.
pkg/compute/bidder_test.go Removed mock fields; updated test expectations.
pkg/compute/callback_chain.go Deleted file containing ChainedCallback implementation.
pkg/compute/callback_mock.go Removed OnCancelComplete method from CallbackMock.
pkg/compute/endpoint.go Updated field types in BaseEndpointParams; simplified AskForBid method.
pkg/compute/events.go Removed RespondedToBidEvent function.
pkg/compute/executor.go Updated GetLogStream method to use messages.ExecutionLogsRequest.
pkg/compute/executor_buffer.go Removed Callback from ExecutorBufferParams; simplified error handling.
pkg/compute/logstream/server.go Renamed Server to server; updated method signatures.
pkg/compute/logstream/types.go Added new Server interface with GetLogStream method.
pkg/compute/management_client.go Changed logging level in updateResources method from Debug to Trace.
pkg/compute/store/boltdb/store.go Simplified error handling in UpdateExecutionState method.
pkg/compute/types.go Removed OnCancelComplete method from Callback interface.
pkg/compute/watchers/callback_forwarder.go Introduced CallbackForwarder type for handling execution state changes.
pkg/compute/watchers/callback_forwarder_test.go Added tests for CallbackForwarder.
pkg/compute/watchers/event_logger.go Introduced ExecutionLogger for logging execution events.
pkg/compute/watchers/executor_watcher.go Introduced ExecutionUpsertHandler for handling execution upserts.
pkg/executor/docker/executor.go Updated GetLogStream method to use messages.ExecutionLogsRequest.
pkg/executor/docker/executor_test.go Updated test methods to reflect changes in log stream request type.
pkg/executor/docker/handler.go Updated outputStream method to use messages.ExecutionLogsRequest; refined error handling.
pkg/executor/noop/executor.go Updated GetLogStream method to use messages.ExecutionLogsRequest.
pkg/executor/types.go Removed LogStreamRequest struct; updated GetLogStream method signature.
pkg/executor/wasm/executor.go Updated GetLogStream method to use messages.ExecutionLogsRequest.
pkg/executor/wasm/handler.go Updated outputStream method to use messages.ExecutionLogsRequest.
pkg/lib/watcher/boltdb/boltdb.go Introduced subscriber struct for managing notifications in EventStore.
pkg/lib/watcher/boltdb/boltdb_test.go Added tests for concurrent subscribers and long polling functionality.
pkg/lib/watcher/options.go Removed bufferSize field from watchOptions struct.
pkg/lib/watcher/registry.go Added DefaultShutdownTimeout constant; improved Stop method for shutting down watchers.
pkg/lib/watcher/types.go Added comment to HandleEvent method regarding context cancellation.
pkg/lib/watcher/watcher.go Removed ch channel; introduced determineStartingIterator method.
pkg/lib/watcher/watcher_test.go Added TestDetermineStartingIterator method to test iterator logic.
pkg/nats/proxy/callback_handler.go Removed case for OnCancelComplete in handle method.
pkg/nats/proxy/callback_proxy.go Removed OnCancelComplete method from CallbackProxy.
pkg/nats/proxy/constants.go Removed OnCancelComplete constant.
pkg/node/compute.go Added LogstreamServer and Watchers fields; updated NewBidder function.
pkg/node/constants.go Added constants for watcher IDs in the node package.
pkg/node/manager/node_manager.go Changed logging level in UpdateResources method from Debug to Trace.
pkg/node/requester.go Removed orchestratorEvaluationWatcherID constant.
pkg/test/compute/ask_for_bid_pre_approved_test.go Enhanced bid response handling in tests.
pkg/test/compute/ask_for_bid_test.go Improved error handling and expected state assertions in tests.
pkg/test/compute/cancel_test.go Added setup and teardown logic in TestStates method.
pkg/test/compute/setup_test.go Integrated NATS server and client in test suite; increased channel buffer sizes.
pkg/test/requester/retries_test.go Simplified initialization of bid strategies in tests.

Poem

In the meadow where bids take flight,
Strategies dance in the soft moonlight.
With changes made, the flow is clear,
Bunnies rejoice, for the time is near!
No more waiting, just bids that shine,
Hopping along, all is now fine! 🐰✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share - [X](https://twitter.com/intent/tweet?text=I%20just%20used%20%40coderabbitai%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20the%20proprietary%20code.%20Check%20it%20out%3A&url=https%3A//coderabbit.ai) - [Mastodon](https://mastodon.social/share?text=I%20just%20used%20%40coderabbitai%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20the%20proprietary%20code.%20Check%20it%20out%3A%20https%3A%2F%2Fcoderabbit.ai) - [Reddit](https://www.reddit.com/submit?title=Great%20tool%20for%20code%20review%20-%20CodeRabbit&text=I%20just%20used%20CodeRabbit%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20proprietary%20code.%20Check%20it%20out%3A%20https%3A//coderabbit.ai) - [LinkedIn](https://www.linkedin.com/sharing/share-offsite/?url=https%3A%2F%2Fcoderabbit.ai&mini=true&title=Great%20tool%20for%20code%20review%20-%20CodeRabbit&summary=I%20just%20used%20CodeRabbit%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20proprietary%20code)
🪧 Tips ### Chat There are 3 ways to chat with [CodeRabbit](https://coderabbit.ai): - Review comments: Directly reply to a review comment made by CodeRabbit. Example: - `I pushed a fix in commit , please review it.` - `Generate unit testing code for this file.` - `Open a follow-up GitHub issue for this discussion.` - Files and specific lines of code (under the "Files changed" tab): Tag `@coderabbitai` in a new review comment at the desired location with your query. Examples: - `@coderabbitai generate unit testing code for this file.` - `@coderabbitai modularize this function.` - PR comments: Tag `@coderabbitai` in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples: - `@coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.` - `@coderabbitai read src/utils.ts and generate unit testing code.` - `@coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.` - `@coderabbitai help me debug CodeRabbit configuration file.` Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. ### CodeRabbit Commands (Invoked using PR comments) - `@coderabbitai pause` to pause the reviews on a PR. - `@coderabbitai resume` to resume the paused reviews. - `@coderabbitai review` to trigger an incremental review. This is useful when automatic reviews are disabled for the repository. - `@coderabbitai full review` to do a full review from scratch and review all the files again. - `@coderabbitai summary` to regenerate the summary of the PR. - `@coderabbitai resolve` resolve all the CodeRabbit review comments. - `@coderabbitai configuration` to show the current CodeRabbit configuration for the repository. - `@coderabbitai help` to get help. ### Other keywords and placeholders - Add `@coderabbitai ignore` anywhere in the PR description to prevent this PR from being reviewed. - Add `@coderabbitai summary` to generate the high-level summary at a specific location in the PR description. - Add `@coderabbitai` anywhere in the PR title to generate the title automatically. ### CodeRabbit Configuration File (`.coderabbit.yaml`) - You can programmatically configure CodeRabbit by adding a `.coderabbit.yaml` file to the root of your repository. - Please see the [configuration documentation](https://docs.coderabbit.ai/guides/configure-coderabbit) for more information. - If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: `# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json` ### Documentation and Community - Visit our [Documentation](https://coderabbit.ai/docs) for detailed information on how to use CodeRabbit. - Join our [Discord Community](http://discord.gg/coderabbit) to get help, request features, and share feedback. - Follow us on [X/Twitter](https://twitter.com/coderabbitai) for updates and announcements.