AgnostiqHQ / covalent

Pythonic tool for orchestrating machine-learning/high performance/quantum-computing workflows in heterogeneous compute environments.
https://www.covalent.xyz
Apache License 2.0

Potentially confusing "status" column in Covalent UI #1756

Open Andrew-S-Rosen opened 1 year ago

Andrew-S-Rosen commented 1 year ago

Environment

What is happening?

When dispatching a Lattice with two Electrons in it, the status marker in the UI doesn't necessarily show 2/2 jobs. For instance, in the simple toy example below, it shows 6/6 because there are, strictly speaking, 6 nodes, but most of these are internal and hidden from the user.

How can we reproduce the issue?

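import covalent as ct

# First user-defined Electron
@ct.electron
def add(a, b):
    return a + b

# Second user-defined Electron
@ct.electron
def mult(a, b):
    return a * b

# Lattice that chains the two Electrons
@ct.lattice
def workflow(a, b, c):
    return mult(add(a, b), c)

# Dispatch the workflow and block until it finishes
dispatch_id = ct.dispatch(workflow)(1, 2, 3)
result = ct.get_result(dispatch_id, wait=True)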

What should happen?

This isn't really a bug per se, but it might cause confusion for new users who expect 2 in the above example but see 6 (the remaining 4 being 3 inputs and 1 post-processing step). This is especially the case because these tasks are hidden by default in the UI and are only visible by selecting "toggle parameters" and "toggle postprocess." If they are hidden by default in the workflow UI, maybe it would be helpful to also have them hidden by default in the overall UI for consistency.

Of course, there are reasons to perhaps not do this. If the post-processing task fails (for example), all Electron tasks could succeed even though the workflow fails. It'd be important to show the failure in the UI.
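
To make the 6-versus-2 arithmetic concrete, here is a minimal sketch of how the two counts could differ for the toy workflow above. The node list and the "kind" labels are hypothetical and chosen only for illustration; they are not Covalent's internal representation.

```python
from collections import Counter

# Hypothetical node list for the toy workflow; "kind" is an illustrative
# label, not a field taken from Covalent itself.
nodes = [
    {"name": "add", "kind": "electron"},
    {"name": "a", "kind": "parameter"},
    {"name": "b", "kind": "parameter"},
    {"name": "mult", "kind": "electron"},
    {"name": "c", "kind": "parameter"},
    {"name": "postprocess", "kind": "postprocess"},
]

# Node kinds assumed to be hidden by default in the workflow view.
HIDDEN_KINDS = {"parameter", "postprocess"}


def count_nodes(nodes):
    """Return (total node count, user-visible electron count)."""
    visible = [n for n in nodes if n["kind"] not in HIDDEN_KINDS]
    return len(nodes), len(visible)


total, visible = count_nodes(nodes)
kind_counts = Counter(n["kind"] for n in nodes)
print(f"all nodes: {total}, electrons only: {visible}")
# prints "all nodes: 6, electrons only: 2"
```

Under these assumptions, the status column reports the first number while a new user is likely thinking of the second.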

Any suggestions?

One clearer approach might be to have a column showing the number of completed Electrons and then a separate column showing the overall workflow status (green check mark or red x).

santoshkumarradha commented 1 year ago

> One clearer approach might be to have a column showing the number of completed Electrons and then a separate column showing the overall workflow status (green check mark or red x).

Hey @arosen93, thanks for this; how best to represent this is something we have been struggling with as well. Could you elaborate a bit more on how you would ideally want it to look? For instance, say we have 10 electrons plus five hidden parameter nodes and two other hidden non-parameter nodes (such as post-processing), all of them completed; what would the ideal message look like for you?

Andrew-S-Rosen commented 1 year ago

@santoshkumarradha: I must admit that I also am not sure what the ideal message would be. 😄 It's a tough one.

From the user's perspective, I think most people will only be thinking about the number of Electron objects in their Lattice when they go to check out the UI. So, the first number most people would expect to see is how many of the 10 have completed. This is important because it's often easy to tell which workflow is which at a quick glance based on its number of Electron objects, but this is currently not displayed in the main UI. If all of the user's workflows are failing on Electron 4, then the user will immediately know which job is causing issues. But if all of the user's workflows are failing on step 14 because hidden nodes are included, it's a bit more cryptic at a high level.

The caveat is that there can be actual errors in the hidden ones, which the user will need to see in order to debug the workflow and accurately ascertain the status.

While I don't know how feasible this would be, in my head I see the following as being potentially more user friendly:

  1. A column showing N/10. The color would be green if 10/10 is hit or red if < 10/10 are completed.
  2. A separate column showing the overall workflow status. This could be a simple green check or red X. It'd be green if 10/10 of the user-defined electrons are completed and all hidden ones succeed. It'd be red if anything raises an error.
  3. Alternatively, the separate column could include the overall count. If the parameters can never lead to a failure, then maybe I'd say N/12 (10 + the two non-parameters). If the parameters can lead to a failure, then I'd say N/17 (10 + 5 parameters + 2 non-parameters). Personally, I think I'd lean towards options 1 and 2 above, as I fear this alternative would be confusing.

Whatever is chosen, consistent messaging would have to be present in the details page when you click on a given UUID.
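
As a rough sketch of options 1 and 2 above, the two columns could be derived along these lines. The node dictionaries, the "kind" and "status" keys, and the status strings are all hypothetical, and the intermediate "RUNNING" state is an added assumption for workflows that have not finished; none of this is taken from Covalent's code.

```python
def summarize(nodes):
    """Return (electron progress, overall status) for one dispatch.

    Each node is a dict with the hypothetical keys "kind" and "status".
    """
    # Column 1: count only the user-defined electrons.
    electrons = [n for n in nodes if n["kind"] == "electron"]
    done = sum(n["status"] == "COMPLETED" for n in electrons)
    progress = f"{done}/{len(electrons)}"

    # Column 2: the overall status considers every node, hidden or not.
    if any(n["status"] == "FAILED" for n in nodes):
        overall = "FAILED"
    elif all(n["status"] == "COMPLETED" for n in nodes):
        overall = "COMPLETED"
    else:
        overall = "RUNNING"
    return progress, overall


# Both electrons succeed, but the hidden post-processing step fails.
example = [
    {"kind": "electron", "status": "COMPLETED"},
    {"kind": "electron", "status": "COMPLETED"},
    {"kind": "parameter", "status": "COMPLETED"},
    {"kind": "parameter", "status": "COMPLETED"},
    {"kind": "parameter", "status": "COMPLETED"},
    {"kind": "postprocess", "status": "FAILED"},
]
print(summarize(example))  # ('2/2', 'FAILED')
```

In this sketch the first column stays at 2/2 while the second still surfaces the hidden failure, which is the case the caveat above is concerned with.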

I'm certainly not an expert in user interfaces, so take this all with a grain of salt!