AgnostiqHQ / covalent-slurm-plugin

Executor plugin interfacing Covalent with Slurm
https://covalent.xyz
Apache License 2.0

Unclear error reported in the UI when the results pkl is not found on the Covalent side #94

Open Andrew-S-Rosen opened 7 months ago

Andrew-S-Rosen commented 7 months ago

Environment

What is happening?

[image: screenshot of the error as shown in the Covalent UI]

When an error happens on the Covalent side because the results pkl can't be found, the UI does not make clear what the problem is. Ideally, I would like to see more details so I know where to start debugging.

How can we reproduce the issue?

import covalent as ct

executor = ct.executor.SlurmExecutor(
    username="rosen",
    address="perlmutter-p1.nersc.gov",
    ssh_key_file="/home/rosen/.ssh/nersc",
    cert_file="/home/rosen/.ssh/nersc-cert.pub",
    conda_env="covalent",
    options={
        "nodes": 1,
        "qos": "debug",
        "constraint": "cpu",
        "account": "matgen",
        "job-name": "test",
        "time": "00:10:00",
    },
    remote_workdir="/pscratch/sd/r/rosen/test",
    create_unique_workdir=True,
    cleanup=False,
)

@ct.lattice(executor=executor)
@ct.electron
def workflow():
    import os

    os.chdir("../")
    return os.getcwd()

ct.dispatch(workflow)()

What should happen?

The UI should give me more information about the issue. The log was not very helpful either: covalent_ui.log.txt

Any suggestions?

Yes, two things should be done.

  1. The type of the exception should be reported. In this case, it seems a FileNotFoundError was raised, but this is never shown in the UI, which just says "error". If I had known it was a FileNotFoundError, I would have found the issue more quickly.
  2. The traceback should be provided somewhere. Currently, it's nowhere to be found.
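
To illustrate the two suggestions above, here is a minimal sketch of how the exception type and the full traceback could be captured as strings before being passed on to a UI or log. It uses only the standard-library `traceback` module. The names `describe_exception` and `load_result`, and the path used, are hypothetical and only for illustration; this is not how the plugin or Covalent currently handles the error.

```python
import traceback


def describe_exception(exc: BaseException) -> dict:
    """Collect the exception type, message, and full traceback as strings."""
    return {
        "type": type(exc).__name__,
        "message": str(exc),
        "traceback": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
    }


def load_result(path: str) -> bytes:
    """Hypothetical stand-in for reading the results pkl (not the plugin's code)."""
    with open(path, "rb") as f:
        return f.read()


try:
    # Assumed-missing path, chosen only to trigger the error.
    load_result("/nonexistent/result.pkl")
except Exception as exc:
    report = describe_exception(exc)
    # Show the exception type up front, then the traceback for debugging.
    print(f"{report['type']}: {report['message']}")
    print(report["traceback"])
```

With something like this, the first line shown would name the exception type rather than a generic "error", and the traceback text would point to the call that failed.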