Parsl / parsl

Parsl - a Python parallel scripting library
http://parsl-project.org
Apache License 2.0

radical pilot sequential task throughput is very slow #2982

Closed benclifford closed 11 months ago

benclifford commented 1 year ago

Describe the bug With the radical pilot executor, the pytest test suite takes around 10 minutes to execute.

This is in comparison to around 11 seconds with the high throughput executor and 3 seconds with the local thread executor.

I haven't dug into the codebase, but this is likely due to delays between one task ending and the next task starting, coming from some polling loop - that was the problem in htex and is still a problem in the Work Queue executor (see #2801)

This is annoying in CI, but it could also be a performance problem for end users running workflows with a lot of "task-end-causes-next-task-to-launch" behaviour: for example, workflows with deep dependencies and short tasks, or workflows where the user has implemented their own steering mechanism (eg @WardLT 's work)
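To see why a polling loop hurts this workload shape, here is a stdlib-only toy model (not Parsl code; the function names and the 0.05s poll interval are made up for illustration). It runs a chain of dependent, near-instant tasks twice: once where the submitter only notices completion at poll ticks, and once where it is woken by an event.

```python
import threading
import time

def run_chain(n_tasks, poll_interval=None):
    """Run n_tasks sequentially, each depending on the previous one.

    With poll_interval set, the submitter only notices a finished task at
    its next poll tick (models a polling loop); with None, it is woken
    immediately (event-driven)."""
    start = time.monotonic()
    for _ in range(n_tasks):
        done = threading.Event()

        def task():
            time.sleep(0.001)   # a very short task
            done.set()

        threading.Thread(target=task).start()
        if poll_interval is None:
            done.wait()                       # notified immediately
        else:
            while not done.is_set():          # only checks at poll ticks
                time.sleep(poll_interval)
    return time.monotonic() - start

polled = run_chain(20, poll_interval=0.05)
eventful = run_chain(20)
print(f"polling: {polled:.2f}s  event-driven: {eventful:.2f}s")
```

With 20 tasks of ~1ms each, the polled variant pays roughly a full poll interval per dependency edge, so total time is dominated by the poll interval rather than by the tasks themselves.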

To Reproduce see CI

Expected behavior This should be "fast"

Environment CI

cc @AymenFJA @andre-merzky

AymenFJA commented 1 year ago

@benclifford thanks for raising this. Based on your explanation, I applied the same debugging approach you used for WQ here in RP.

I quote from the WQ ticket:

submit two tasks, one which depends on the other. observe a 1 second delay between task 1 completing on the worker and task 2 starting on the worker.

These are the numbers I got when I submitted 2 tasks (Python functions), where task2 depends on task1, measuring from the time task1 completed to the time task2 was launched, at 2 measurement points:

  1. The runtime system worker side (RP)
  2. The RPEX executor.

This is my reproducer:

import parsl
from parsl import python_app

# (assumes a Parsl config with the RPEX executor has already been
# loaded via parsl.load)

# Map function that returns double the input integer
@python_app
def app_double(x, parsl_resource_specification={}):
    return x * 2

@python_app
def app_sum(inputs):
    x = sum(inputs)
    return x

t1 = app_double(5, parsl_resource_specification={'ranks': 1})
t2 = app_sum([t1.result()])  # t1.result() blocks, so t2 launches only after t1 completes

print(t2.result())
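As an aside, the same dependency shape can be mirrored with only stdlib concurrent.futures (the names below are stand-ins, not Parsl APIs), which gives a rough feel for the task1-done-to-task2-start gap when there is no executor overhead at all:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def double(x):
    return x * 2, time.monotonic()        # result plus completion time

def total(inputs):
    return sum(inputs), time.monotonic()  # result plus execution time

with ThreadPoolExecutor(max_workers=2) as pool:
    f1 = pool.submit(double, 5)
    r1, t1_done = f1.result()             # block on task1, as in the reproducer
    f2 = pool.submit(total, [r1])
    r2, t2_ran = f2.result()

print(r2, f"gap: {t2_ran - t1_done:.6f}s")
```

In a plain thread pool the gap is typically well under a millisecond, so any larger gap measured in RP or the executor is overhead added by the runtime, not by the dependency itself.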
  1. The runtime system worker side (RP):

Task1 completed and got the result:

1701275858.578 : master.000000.worker.0000 : 14556 : 139647799764736 : DEBUG    : got   result: [{'type': 'task', 'tmgr': 'tmgr.0000', 'uid': 'task.000000'

Task2 was launched on the worker:

1701275858.699 : master.000000.worker.0000 : 14570 : 140609763399424 : DEBUG    : applied: task.000001: 14665: dict_keys([14665])

Time difference:

1701275858.699 - 1701275858.578 = ~0.121s
  2. The executor side (timestamps reported by the Parsl logger):

Task1 completed and got the result:

1701275858.608324 2023-11-29 11:37:38 MainProcess-11909 Thread-22-140435271960320 parsl.dataflow.dflow:552 _complete_task INFO: Task 0 completed (launched -> exec_done)

Task2 was launched on the worker:

1701275858.617444 2023-11-29 11:37:38 MainProcess-11909 MainThread-140437814228800 parsl.dataflow.dflow:728 launch_task INFO: Parsl task 1 try 0 launched on executor RPEX

Time difference:

1701275858.617444 - 1701275858.608324 = ~0.00912s
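Recomputing the two gaps from the epoch timestamps quoted above makes the comparison explicit: the RP worker-side gap is roughly 13x the executor-side gap.

```python
# Epoch timestamps copied from the log lines quoted above
rp_task1_done   = 1701275858.578
rp_task2_start  = 1701275858.699
px_task1_done   = 1701275858.608324
px_task2_launch = 1701275858.617444

rp_gap = rp_task2_start - rp_task1_done    # worker-side gap inside RP
px_gap = px_task2_launch - px_task1_done   # gap seen by the Parsl executor

print(f"RP worker-side gap: {rp_gap:.4f}s")
print(f"Executor-side gap:  {px_gap:.5f}s")
print(f"ratio: {rp_gap / px_gap:.1f}x")
```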

If my analysis is correct, I do not think this gap is the issue, and we need to dig deeper.

AymenFJA commented 12 months ago

@benclifford this PR, #2988, improves performance in general and should address the issue in this ticket.