Parsl / parsl

Parsl - a Python parallel scripting library
http://parsl-project.org
Apache License 2.0

`GlobusComputeExecutor` fails to handle malformed resource_specification #3620

Open yadudoc opened 1 month ago

yadudoc commented 1 month ago

Describe the bug

GlobusComputeExecutor raises a GlobusAPIError when a function with a malformed resource_specification is submitted. Subsequent tasks then fail with a notification that the executor is shut down and no new functions may be executed. Ideally, we should present clearer error messages that describe what failed.

To Reproduce

Using the branch from https://github.com/Parsl/parsl/pull/3619, run this test:

pytest --config parsl/tests/configs/globus_compute.py parsl/tests/test_error_handling/test_resource_spec.py::test_resource

yadudoc commented 1 month ago

It looks like the Globus Compute services validate and reject malformed resource_specification requests, and this is testable via the parsl/tests/test_error_handling/test_resource_spec.py::test_resource test. The main concern is that the rejection causes the GCExecutor from the compute SDK to enter a broken state, which fails any functions subsequently submitted to it and so breaks later tests. Since this test would break the executor shared across tests, I'm leaning towards marking the test to be skipped when testing with GCE.

benclifford commented 1 month ago

Stepping back from Globus Compute and looking at the behaviour of this test in Parsl in general:

I think the situation that's failing can be imagined as a real user working interactively who typos their resource specification for a task, gets an error, and fixes that typo.

The test in question is (implicitly) testing: can I interactively typo a resource spec and have Parsl survive?

I think that's a good behaviour to have.

It sounds like Globus Compute is saying "you have to make your resource spec perfect else we'll shut down", which is a different approach to the user experience, but not one Parsl should be following. So perhaps the Parsl component which interfaces to GC should be checking that the resource spec really is perfect before submitting it into the Globus Compute SDK so that interactive users don't have to restart Parsl when they typo something.

(Or if we are lucky, perhaps Globus Compute might reconsider their user experience)
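
A rough sketch of what such a pre-submission check could look like, so that a typo is rejected locally and nothing malformed ever reaches the Globus Compute SDK. Everything named here is hypothetical: `ALLOWED_KEYS`, `InvalidResourceSpec` and `validate_resource_spec` are made up for illustration, and the set of accepted keys and types is an assumption, not something taken from Parsl or Globus Compute.

```python
from collections.abc import Mapping

# Hypothetical allow-list: which keys and value types the remote service
# really accepts is an assumption made for this sketch.
ALLOWED_KEYS = {
    "num_nodes": int,
    "ranks_per_node": int,
    "num_ranks": int,
}


class InvalidResourceSpec(ValueError):
    """Hypothetical error raised locally, before anything is sent remotely."""


def validate_resource_spec(spec):
    """Raise InvalidResourceSpec if spec has unknown keys or bad values."""
    if not isinstance(spec, Mapping):
        raise InvalidResourceSpec(
            f"resource specification must be a mapping, got {type(spec).__name__}"
        )

    # Reject misspelled or unsupported keys, naming them in the message.
    unknown = sorted(str(key) for key in spec if key not in ALLOWED_KEYS)
    if unknown:
        raise InvalidResourceSpec(
            f"unknown resource specification keys: {', '.join(unknown)}"
        )

    # Check each value: right type (bool is excluded even though it
    # subclasses int) and at least 1.
    for key, value in spec.items():
        expected = ALLOWED_KEYS[key]
        if isinstance(value, bool) or not isinstance(value, expected):
            raise InvalidResourceSpec(
                f"{key} must be {expected.__name__}, got {type(value).__name__}"
            )
        if value < 1:
            raise InvalidResourceSpec(f"{key} must be at least 1, got {value}")
```

The idea would be to call something like this at the top of the executor's submit path: a bad spec then fails only the one task with a readable message, and the underlying executor is never handed the request that would put it into its shut-down state.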