Closed simonwo closed 5 months ago
The modifications introduce a comprehensive overhaul to error handling and event logging across various components, with a focus on structured errors, enhanced job and execution history tracking, and refined event management. The changes aim to improve debugging, monitoring, and the user interface experience by providing more detailed and structured information on job executions and errors.
File Pattern | Change Summary |
---|---|
cmd/cli/job/* | Enhanced job and execution history display, added structured error handling. |
pkg/compute/* | Updated logging, error handling, and event management with structured errors and detailed events. |
pkg/jobstore/, pkg/models/ | Incorporated events in job and execution creation, and updated structures to include event details. |
pkg/orchestrator/* | Improved error handling, event creation, and job state updates with structured errors and events. |
pkg/requester/endpoint.go, pkg/test/* | Adjusted to new event handling approach and structured errors. |
Objective | Addressed | Explanation |
---|---|---|
Return structured errors from the API (#693) | ✅ | |
Hide extraneous error output but show it with job describe (#694) | ✅ | |
Implement a status update mechanism for compute nodes... (#407) | ❌ | The changes do not address status updates or the synchronization of state between requesters and compute nodes. |
This PR implements the structure proposed in Improve Error Reporting as a first step towards providing richer progress reporting during job execution.
The "tl;dr;" is that we will move to using an event stream for reporting progress on jobs. The event stream will help users understand the progress of their job and give them extra context about any failures that occur. This will allow us to show a richer view in the UI, e.g. the user will be able to see "downloading Docker image" instead of just "job running".
To achieve this vision, we need to: build the infrastructure for generating events, recording them in the job history, and displaying them (done in this PR); replace the orchestrator/compute callback mechanism (later PR); and then give lower-level components the ability to push events (later PR).
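As a rough illustration of the event-stream idea described above, here is a minimal sketch in Go. The `Event` and `History` types and their fields are hypothetical names chosen for this example; they are not the types added by this PR.

```go
package main

import (
	"fmt"
	"time"
)

// Event is a hypothetical progress record. The field names are
// illustrative only and are not the types introduced by this PR.
type Event struct {
	Topic     string
	Message   string
	Timestamp time.Time
}

// History is a hypothetical append-only log of events for one job
// or one execution.
type History struct {
	events []Event
}

// Record appends an event to the history, stamped with the current time.
func (h *History) Record(topic, message string) {
	h.events = append(h.events, Event{Topic: topic, Message: message, Timestamp: time.Now()})
}

// Latest returns the most recent event message, or the fallback
// when nothing has been recorded yet.
func (h *History) Latest(fallback string) string {
	if len(h.events) == 0 {
		return fallback
	}
	return h.events[len(h.events)-1].Message
}

func main() {
	var h History
	// With no events, a UI can only show a coarse state.
	fmt.Println(h.Latest("job running"))
	// Once a component records an event, the UI can show richer progress.
	h.Record("Docker", "downloading Docker image")
	fmt.Println(h.Latest("job running"))
}
```

The point of the sketch is only that a display layer can fall back to the coarse job state when no events exist and show the latest event message otherwise.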
This PR also includes some support for structured error reporting. Low-level components can return structured errors that produce a richer event than the ones generated automatically. So far this is used for the ErrNotEnoughNodes case and the Docker ImageUnavailable case.
This gives us the ability to output hints as part of our messages back to the user:
The output of `describe` now shows a split history between the overall job and its executions.

Resolves https://github.com/bacalhau-project/expanso-planning/issues/693. Resolves https://github.com/bacalhau-project/expanso-planning/issues/694.
TODO in this PR