OpenFn / kit

The bits & pieces that make OpenFn work. (diagrammer, cli, compiler, runtime, runtime manager, logger, etc.)

Better crash logs #566

Closed mtuchi closed 8 months ago

mtuchi commented 8 months ago

User story

Currently, if a run crashes on Lightning, the error logs are not useful for debugging what's going on. This makes it hard to troubleshoot.

Details

Example logs from runs with a possible connection error:

Starting job 198aad79-70c3-47e2-8042-76b899205e12
Intialising pipeline
Timeout set to 300000ms
Versions for run 757fbaf4-e425-4b62-8ecc-734cb901a183:
    ▸ node.js                   18.19.0
    ▸ worker                    0.5.0
    ▸ engine                    0.2.6
    ▸ @openfn/language-mssql    3.0.0
[linker] loading module @openfn/language-mssql
[linker] Loading module @openfn/language-mssql from /tmp/openfn/worker/repo/node_modules/@openfn/language-mssql_3.0.0/lib/index.js
Resolved adaptor @openfn/language-mssql to version 3.0.0
Executing expression (1 operations)
Starting job c993f473-89a6-4983-a56a-859d2a100424
Intialising pipeline
Timeout set to 300000ms
[linker] loading module @openfn/language-postgresql
[linker] Loading module @openfn/language-postgresql from /tmp/openfn/worker/repo/node_modules/@openfn/language-postgresql_4.1.8/dist/index.cjs
Resolved adaptor @openfn/language-postgresql to version 4.1.8
Versions for run 1374897a-c729-417a-9715-43b593173343:
    ▸ node.js                        18.19.0
    ▸ worker                         0.5.0
    ▸ engine                         0.2.6
    ▸ @openfn/language-postgresql    4.1.8
Executing expression (2 operations)
Starting operation 1
Operation 1 complete in 0ms
Starting operation 2
[Screenshot, 2024-01-25 at 1:30:52 PM]

The Ask

When a run crashes, it would be nice to have logs about the connection, or the final output should contain the entire error log. This would help a lot with debugging.

josephjclark commented 8 months ago

Both of these adaptors will call process.exit() if they encounter a connection problem.

That is almost certainly going to stop the logs escaping the worker process and reaching Lightning.

I'm not sure what I can do about this. I'll look into it.
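To make the problem concrete, here is a minimal sketch of one possible mitigation: temporarily swapping the exit function for one that throws, so the run can still emit a final log line. Everything here (`ExitAttempt`, `ExitHost`, `runWithExitGuard`) is a hypothetical name for illustration and is not the worker's actual code. It also assumes the exit call happens inside the awaited job; an exit called from a detached callback would not be caught this way.

```typescript
// Hypothetical sketch: none of these names come from the OpenFn codebase.

// Error thrown in place of a real exit, so the caller can still log.
class ExitAttempt extends Error {
  code: number;
  constructor(code: number) {
    super(`exit(${code}) called during run`);
    this.name = 'ExitAttempt';
    this.code = code;
    // Keep instanceof working even when compiled to older JS targets.
    Object.setPrototypeOf(this, ExitAttempt.prototype);
  }
}

// Anything with an exit function; Node's `process` fits this shape.
interface ExitHost {
  exit: (code?: number) => void;
}

async function runWithExitGuard<T>(
  host: ExitHost,
  job: () => Promise<T>,
  log: (line: string) => void
): Promise<T | undefined> {
  const originalExit = host.exit;
  // Replace exit with a function that throws instead of terminating.
  host.exit = (code = 0) => {
    throw new ExitAttempt(code);
  };
  try {
    return await job();
  } catch (err) {
    if (err instanceof ExitAttempt) {
      // The process is still alive, so this line can reach the log stream.
      log(`Run crashed: ${err.message}`);
      return undefined;
    }
    throw err; // Unrelated errors propagate unchanged.
  } finally {
    host.exit = originalExit; // Always restore the real exit.
  }
}
```

In this sketch a job that calls `host.exit(1)` resolves to `undefined` and leaves one "Run crashed" line in the log, instead of terminating the process silently.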

josephjclark commented 8 months ago

Right, I've found two issues here:

1) Logs coming out of the adaptor do not get sent back to Lightning. In the CLI you'll get an adaptor error of some kind, but in Lightning you'll get an awkward silence.

2) The error message returned by the worker is eaten by Lightning so you can't see it anywhere, except maybe the Lightning UI when they work out how to implement it. But what I should be able to do is log the exit reason and error as the last line of the job.
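As a rough illustration of the second point, logging the exit reason and error as the last line of the job could look something like the sketch below. The `RunOutcome` shape, its field names, and the wording of the message are assumptions made for this example; the real worker payload may differ.

```typescript
// Hypothetical shape: field names are assumptions, not the worker's payload.
interface RunOutcome {
  reason: string; // e.g. "crash"
  errorType?: string; // e.g. "ConnectionError"
  errorMessage?: string;
}

// Build a single final log line that carries the exit reason and,
// when present, the error details.
function formatFinalLogLine(outcome: RunOutcome): string {
  const { reason, errorType, errorMessage } = outcome;
  const detail = [errorType, errorMessage].filter(Boolean).join(': ');
  const base = `Run complete with status: ${reason}`;
  return detail ? `${base} (${detail})` : base;
}
```

For example, `formatFinalLogLine({ reason: 'crash', errorType: 'ConnectionError', errorMessage: 'Failed to connect' })` returns `Run complete with status: crash (ConnectionError: Failed to connect)`, so the cause of the crash is visible in the job log even if the UI does not show the error elsewhere.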

josephjclark commented 8 months ago

#567 should help with the second part of the problem.

The first part is still open and tracked by #570, but I intend to do that next.