dora-rs / dora

DORA (Dataflow-Oriented Robotic Application) is middleware designed to streamline and simplify the creation of AI-based robotic applications. It offers low latency, composable, and distributed dataflow capabilities. Applications are modeled as directed graphs, also referred to as pipelines.
https://dora-rs.ai
Apache License 2.0

Auto-fetch logs and clean up error trace for better readability #542

Closed haixuanTao closed 2 weeks ago

haixuanTao commented 3 weeks ago

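This PR automatically fetches the logs of a failed node and prints them with the error, instead of asking users to look up the logs themselves with dora logs.

It also shortens the error summary so that the most useful information (the failing node, its exit code or signal, and its last log lines) is shown directly.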
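The idea can be sketched in a few lines. This is only an illustration of the approach, not the code from this PR: the actual change lives in the Rust daemon and coordinator, and the names `tail_lines`, `format_node_failure`, and the `max_lines` default of 10 are assumptions made up for this example. The message layout mirrors the "After" output shown below.

```python
from collections import deque


def tail_lines(log_text: str, max_lines: int = 10) -> str:
    """Return at most the last `max_lines` lines of a node's log (hypothetical helper)."""
    lines = log_text.rstrip("\n").splitlines()
    # deque with maxlen keeps only the final `max_lines` entries
    return "\n".join(deque(lines, maxlen=max_lines))


def format_node_failure(dataflow_id: str, node_id: str, exit_code: int,
                        log_text: str, max_lines: int = 10) -> str:
    """Build a one-block error message that embeds the last log lines.

    Assumption: the caller has already read the node's log into `log_text`,
    so the user no longer needs to run `dora logs` separately.
    """
    header = f"{dataflow_id}/{node_id} exited with code {exit_code} with last logs:"
    return f"{header}\n\n{tail_lines(log_text, max_lines)}"


if __name__ == "__main__":
    log = (
        "Traceback (most recent call last):\n"
        '  File "/home/peter/Documents/work/dora/examples/python-dataflow/plot.py", line 88, in <module>\n'
        '    assert False, "test PR"\n'
        "AssertionError: test PR\n"
    )
    print(format_node_failure("018ffe7b-39d7-725a-a812-ac9c220a11c8", "plot", 1, log))
```

Keeping only the tail of the log bounds the size of the error message while usually still including the traceback, which tends to be the last thing a crashing node prints.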

Before

018ffe79-1c0c-7b68-8886-d8cff079c232
  2024-06-09T19:28:21.162169Z ERROR dora_daemon::spawn: 018ffe79-1c0c-7b68-8886-d8cff079c232/plot: 
Traceback (most recent call last):
  File "/home/peter/Documents/work/dora/examples/python-dataflow/plot.py", line 83, in <module>
    assert False, "test PR"
AssertionError: test PR

    at binaries/daemon/src/spawn.rs:371

  2024-06-09T19:28:21.170014Z ERROR dora_daemon: 
    018ffe79-1c0c-7b68-8886-d8cff079c232/plot failed with exit code 1.

    Check logs using: dora logs 018ffe79-1c0c-7b68-8886-d8cff079c232 plot

    at binaries/daemon/src/lib.rs:1077

  2024-06-09T19:28:21.170037Z  WARN dora_daemon::pending: node `plot` exited before initializing dora connection
    at binaries/daemon/src/pending.rs:80

  2024-06-09T19:28:22.877741Z ERROR dora_daemon::spawn: 018ffe79-1c0c-7b68-8886-d8cff079c232/webcam: 
Traceback (most recent call last):
  File "/home/peter/Documents/work/dora/examples/python-dataflow/webcam.py", line 11, in <module>
    node = Node()
RuntimeError: failed to init event stream

Caused by:
    subscribe failed: Some nodes exited before subscribing to dora: {NodeId("plot")}

    This is typically happens when an initialization error occurs
                    in the node or operator. To check the output of the failed
                    nodes, run `dora logs 018ffe79-1c0c-7b68-8886-d8cff079c232 plot`.

Location:
    apis/rust/node/src/event_stream/mod.rs:90:17

    at binaries/daemon/src/spawn.rs:371

  2024-06-09T19:28:22.893596Z ERROR dora_daemon: 
    018ffe79-1c0c-7b68-8886-d8cff079c232/webcam failed with exit code 1.

    Check logs using: dora logs 018ffe79-1c0c-7b68-8886-d8cff079c232 webcam

    at binaries/daemon/src/lib.rs:1077

  2024-06-09T19:28:23.247953Z ERROR dora_daemon::spawn: 018ffe79-1c0c-7b68-8886-d8cff079c232/object_detection: 
Traceback (most recent call last):
  File "/home/peter/Documents/work/dora/examples/python-dataflow/object_detection.py", line 13, in <module>
    node = Node()
RuntimeError: failed to init event stream

Caused by:
    subscribe failed: Some nodes exited before subscribing to dora: {NodeId("plot")}

    This is typically happens when an initialization error occurs
                    in the node or operator. To check the output of the failed
                    nodes, run `dora logs 018ffe79-1c0c-7b68-8886-d8cff079c232 plot`.

Location:
    apis/rust/node/src/event_stream/mod.rs:90:17

    at binaries/daemon/src/spawn.rs:371

  2024-06-09T19:28:23.263843Z ERROR dora_daemon: 
    018ffe79-1c0c-7b68-8886-d8cff079c232/object_detection failed with exit code 1.

    Check logs using: dora logs 018ffe79-1c0c-7b68-8886-d8cff079c232 object_detection

    at binaries/daemon/src/lib.rs:1077

  2024-06-09T19:28:23.264003Z ERROR dora_coordinator: some nodes failed:
  - object_detection: 
    018ffe79-1c0c-7b68-8886-d8cff079c232/object_detection failed with exit code 1.

    Check logs using: dora logs 018ffe79-1c0c-7b68-8886-d8cff079c232 object_detection

  - plot: 
    018ffe79-1c0c-7b68-8886-d8cff079c232/plot failed with exit code 1.

    Check logs using: dora logs 018ffe79-1c0c-7b68-8886-d8cff079c232 plot

  - webcam: 
    018ffe79-1c0c-7b68-8886-d8cff079c232/webcam failed with exit code 1.

    Check logs using: dora logs 018ffe79-1c0c-7b68-8886-d8cff079c232 webcam

Location:
    /home/peter/Documents/work/dora/binaries/coordinator/src/listener.rs:90:56
    at binaries/coordinator/src/lib.rs:279

dataflow failed: errors occurred in dataflow 018ffe79-1c0c-7b68-8886-d8cff079c232:
- machine ``:
    some nodes failed:
      - object_detection: 
        018ffe79-1c0c-7b68-8886-d8cff079c232/object_detection failed with exit code 1.

        Check logs using: dora logs 018ffe79-1c0c-7b68-8886-d8cff079c232 object_detection

      - plot: 
        018ffe79-1c0c-7b68-8886-d8cff079c232/plot failed with exit code 1.

        Check logs using: dora logs 018ffe79-1c0c-7b68-8886-d8cff079c232 plot

      - webcam: 
        018ffe79-1c0c-7b68-8886-d8cff079c232/webcam failed with exit code 1.

        Check logs using: dora logs 018ffe79-1c0c-7b68-8886-d8cff079c232 webcam

    Location:
        /home/peter/Documents/work/dora/binaries/coordinator/src/listener.rs:90:56

After

018ffe7b-39d7-725a-a812-ac9c220a11c8
  2024-06-09T19:30:41.604627Z ERROR dora_daemon: 018ffe7b-39d7-725a-a812-ac9c220a11c8/plot exited with code 1 with last logs:

Traceback (most recent call last):
  File "/home/peter/Documents/work/dora/examples/python-dataflow/plot.py", line 88, in <module>
    assert False, "test PR"
AssertionError: test PR

    at binaries/daemon/src/lib.rs:1109

  2024-06-09T19:30:52.237132Z ERROR dora_coordinator: - 018ffe7b-39d7-725a-a812-ac9c220a11c8/plot exited with code 1 with last logs:

Traceback (most recent call last):
  File "/home/peter/Documents/work/dora/examples/python-dataflow/plot.py", line 88, in <module>
    assert False, "test PR"
AssertionError: test PR

    at binaries/coordinator/src/lib.rs:279

dataflow failed: 018ffe7b-39d7-725a-a812-ac9c220a11c8 failed on:
- machine ``:
    - 018ffe7b-39d7-725a-a812-ac9c220a11c8/plot exited with code 1 with last logs:

    Traceback (most recent call last):
      File "/home/peter/Documents/work/dora/examples/python-dataflow/plot.py", line 88, in <module>
        assert False, "test PR"
    AssertionError: test PR

On a terminal that does not show the stdout of dora up

Before

~/D/w/d/e/python-dataflow ❯❯❯ dora start dataflow.yml --attach        
018ffe7e-f6fa-7431-9dad-3c127137ac43
^Cdataflow failed: errors occurred in dataflow 018ffe7e-f6fa-7431-9dad-3c127137ac43:
- machine ``:
    some nodes failed:
      - plot: 
        018ffe7e-f6fa-7431-9dad-3c127137ac43/plot failed with exit code 1.

        Check logs using: dora logs 018ffe7e-f6fa-7431-9dad-3c127137ac43 plot

      - webcam: 
        018ffe7e-f6fa-7431-9dad-3c127137ac43/webcam failed with signal `SIGKILL`

        Check logs using: dora logs 018ffe7e-f6fa-7431-9dad-3c127137ac43 webcam

    Location:
        /home/peter/Documents/work/dora/binaries/coordinator/src/listener.rs:90:56

After

~/D/w/d/e/python-dataflow ❯❯❯ dora start dataflow.yml --attach          
018ffe7d-6648-7177-8986-f5734988eee2
^Cdataflow failed: 018ffe7d-6648-7177-8986-f5734988eee2 failed on:
- machine ``:
    - 018ffe7d-6648-7177-8986-f5734988eee2/plot exited with code 1 with last logs:

    Traceback (most recent call last):
      File "/home/peter/Documents/work/dora/examples/python-dataflow/plot.py", line 88, in <module>
        assert False, "test PR"
    AssertionError: test PR

    - 018ffe7d-6648-7177-8986-f5734988eee2/webcam exited with `SIGKILL` with last logs:

haixuanTao commented 2 weeks ago

Closing in favor of #552.