Reporting of traceback in UI when Covalent errors arise

Andrew-S-Rosen commented 1 year ago

What should we add?

Currently, when an error is raised, only the error message itself is reported rather than any details regarding the traceback or the line numbers. This makes it very difficult to debug, unlike when an error arises locally where a full traceback is present. Arguably, it is even more important to have the traceback when running remotely since that's when things often go haywire...

A clearer traceback report should be provided in the UI.
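
To make the gap concrete, here is a minimal sketch in plain Python (standard library only, not Covalent code) contrasting a bare error message with the full traceback text. The names `failing_task` and `capture_error` are hypothetical and exist only for illustration.

```python
import traceback


def failing_task(a, b):
    # Hypothetical stand-in for a step that fails at run time.
    return a / b


def capture_error(func, *args):
    """Return (message_only, full_traceback) for a failing call."""
    try:
        func(*args)
    except Exception as exc:
        message_only = str(exc)
        # format_exception includes file names, line numbers, and the call chain.
        full_traceback = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
        return message_only, full_traceback
    return None, None


message, full = capture_error(failing_task, 1, 0)
print(message)  # only the message, with no location information
print(full)     # multi-line report that names the failing function and line
```

The first string is roughly what the issue describes seeing today; the second is the kind of report that makes debugging practical.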

Here is how you can reproduce it:

import covalent as ct

executor = ct.executor.SlurmExecutor(
    username="myname",
    address="bad-address!",  # deliberately invalid address to trigger the error
)

@ct.electron(executor=executor)
def add(a, b):
    return a + b

@ct.lattice
def workflow(a, b):
    return add(a, b)

dispatch_id = ct.dispatch(workflow)(1, 2)
result = ct.get_result(dispatch_id, wait=True)
print(result)

This is, of course, a bit of a toy example and not reflective of the common scenario.

Describe alternatives you've considered.

No response

santoshkumarradha commented 1 year ago

@Andrew-S-Rosen it is very helpful that you opened this issue, as we are starting to scope work for better error reporting. What would be an ideal UX for this? Tracebacks are usually pretty large (currently they can be queried on the result object, of course, but that's not quick enough for iterative monitoring of errors), and we would need a good way to display this information. Would a modal on the error be a better solution beyond a certain line limit?

Also, a quick question: in the above example, I assume the traceback you are talking about is the one due to bad executor parameters and not errors due to code mishaps inside electrons? If so, would you like to visually have two separate errors on your workflow UI?

santoshkumarradha commented 1 year ago

Moving this to main Covalent, as this is a bug as well. In the following example, the error generated by Covalent (because of a wrong executor parameter) does not even have a basic traceback, unlike actual errors generated by electrons, which are usually truncated in the UI.

import covalent as ct

# workdir=23 is the deliberately wrong executor parameter
executor = ct.executor.LocalExecutor(workdir=23)

@ct.electron
def add(a, b):
    return a + b

@ct.lattice(executor=executor, workflow_executor=executor)
def workflow(a, b):
    return add(a, b)

dispatch_id = ct.dispatch(workflow)(1, 2)
result = ct.get_result(dispatch_id, wait=True)

santoshkumarradha commented 1 year ago

@cjao, do we not expect the error file to capture the traceback of our own errors like these?

Andrew-S-Rosen commented 1 year ago

> currently they can be queried on the result object of course, but that's not quick enough for iterative monitoring of errors of course

Normally that would be perfectly fine, but I believe in this scenario that only the error message itself is reported in the result object (perhaps that is what you're alluding to about it being a bug).

Regarding potential UX, this is a very good question, and I must admit that I'm not sure! I agree that a full traceback might be a bit overwhelming, especially depending on who is using it. At the very least, perhaps a line number would be useful. That would be relatively unobtrusive while still providing a fairly useful piece of info.
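
As a rough illustration of the "just a line number" idea, the sketch below uses only Python's standard `traceback` module to reduce an exception to a single line containing the failing function and line number. It is not Covalent code; `failing_task` and `summarize_error` are hypothetical names chosen for this example.

```python
import traceback


def failing_task(a, b):
    # Hypothetical stand-in for a step that fails at run time.
    return a / b


def summarize_error(exc):
    """Return a one-line summary: exception type, message, and failing location."""
    # extract_tb yields one FrameSummary per frame; the last one is where the error was raised.
    frames = traceback.extract_tb(exc.__traceback__)
    if not frames:
        return f"{type(exc).__name__}: {exc}"
    last = frames[-1]
    return f"{type(exc).__name__}: {exc} (in {last.name}, line {last.lineno})"


try:
    failing_task(1, 0)
except ZeroDivisionError as exc:
    summary = summarize_error(exc)

print(summary)
```

A summary of this shape stays short enough for a status panel while still pointing at where to look.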

> would a modal on the error be a better solution beyond a certain line limit?

I'm not quite sure what modal refers to here, but certainly some toggle or scroll bar or even just "..." could be useful within a certain line limit.
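
One way the "..." option could work is sketched below, assuming the traceback is available as plain text. The helper `truncate_traceback` and its `max_lines` parameter are hypothetical, not part of Covalent. The sketch keeps the end of the traceback, since that is where the failing frame and the error message appear.

```python
def truncate_traceback(text, max_lines=10):
    """Keep at most max_lines lines, preferring the end of the traceback."""
    if max_lines < 2:
        raise ValueError("max_lines must be at least 2")
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    # The last lines hold the failing frame and the error message,
    # so drop lines from the top and mark the cut with "...".
    kept = lines[-(max_lines - 1):]
    return "\n".join(["..."] + kept)


# Synthetic 21-line "traceback" used purely for demonstration.
sample = "\n".join(f"frame {i}" for i in range(1, 21)) + "\nValueError: bad input"
short = truncate_traceback(sample, max_lines=5)
print(short)
```

A UI could show the shortened text by default and reveal the full text behind a toggle.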

> also quick question, in the above example, is the trace back you are talking about the traceback due to bad executor parameters I assume and not errors due to code mishaps inside electrons? If so, would you like to visually have two separate errors on your workflow UI?

The example here is due to the executor. I have to remind myself what the errors look like when an error is raised inside an electron (hopefully the traceback is shown?). I'm pretty indifferent as to whether it'd be one or two separate errors, though. Either way, it's a failed case, so it doesn't really matter in my opinion.

Andrew-S-Rosen commented 1 year ago

Another example is the image shown in https://github.com/AgnostiqHQ/covalent/issues/1781.

Andrew-S-Rosen commented 1 year ago

@santoshkumarradha, @cjao: I can confirm that the full traceback appears when the electron itself fails, but no traceback is present when the issue lies on the Covalent side.

santoshkumarradha commented 1 year ago

@Andrew-S-Rosen just a quick question: any chance it was shown in the Covalent logs or on the logs screen in the UI?

Andrew-S-Rosen commented 1 year ago

@santoshkumarradha: I just checked and don't see anything relevant there, although it's possible I'm missing it.

Here's a simpler example to run (if using 0.227.0rc0 or 0.228.0rc-0, which has a bug in the post-processing step):

import covalent as ct

executor = ct.executor.LocalExecutor()

@ct.electron
def add(a, b):
    return a + b

@ct.lattice(executor=executor)
def workflow(a, b):
    return add(a, b)

ct.dispatch(workflow)(1, 2)

AgnostiqHQ / covalent

Reporting of traceback in UI when Covalent errors arise #1782
