Parsl / parsl

Parsl - a Python parallel scripting library
http://parsl-project.org
Apache License 2.0

AWS EC2 Task Not Executing #1927

Open knagaitsev opened 3 years ago

knagaitsev commented 3 years ago

Describe the bug
A simple task started on my laptop with a worker on an AWS EC2 instance never completes. The log just repeats endlessly:

2021-01-05 13:23:09.876 parsl.dataflow.strategy:204 [DEBUG]  Executor ec2_single_node has 1 active tasks, 1/0 running/pending blocks, and 0 connected workers
2021-01-05 13:23:14.870 parsl.dataflow.task_status_poller:62 [DEBUG]  Polling

To Reproduce
I originally ran into the problem with Parsl master locally and Parsl 1.0.0 on EC2, so I suspected https://github.com/Parsl/parsl/issues/1137, but switching to Parsl 1.0.0 locally gave the same result. Here is the setup:

import parsl
import os
from parsl.app.app import python_app, bash_app
from parsl.data_provider.files import File
from aws_config import config

parsl.load(config)

@bash_app
def get_wd(outputs=[]):
    return 'pwd > {}'.format(outputs[0])

wd = get_wd(outputs=[File('/tmp/out.txt')])
with open(wd.outputs[0].result(), 'r') as f:
    print(f.read())

aws_config.py:

from parsl.config import Config
from parsl.providers import AWSProvider
from parsl.executors import HighThroughputExecutor

config = Config(
    executors=[
        HighThroughputExecutor(
            label='ec2_single_node',
            provider=AWSProvider(
                # Specify your EC2 AMI id
                'ami-0885b1f6bd170450c',
                region='us-east-1',
                key_name='key2',
                profile="default",
                # state_file='awsproviderstate.json',
                nodes_per_block=1,
                init_blocks=1,
                max_blocks=1,
                min_blocks=0,
                walltime='00:05:00',
                instance_type='t2.nano'
            ),
        )
    ],
)

Expected behavior
I haven't successfully run any remote execution like this yet, but I believe the task should complete on the EC2 worker, the instance should shut down, and the result written to /tmp/out.txt should be sent back. Have I written my bash_app and output retrieval correctly, or do I need to adjust it to work with a remote worker?
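As a sanity check of the bash_app and output handling independent of AWS, the same script can be pointed at a purely local executor. A minimal sketch, assuming everything else above stays the same (the local executor label and settings here are illustrative, not from the original report):

from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider

# Sketch only: run the same get_wd app on the submitting machine, no AWS involved.
config = Config(
    executors=[
        HighThroughputExecutor(
            label='local_htex',
            provider=LocalProvider(init_blocks=1, max_blocks=1),
        )
    ],
)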

I'm willing to work on this issue and also to look into improving https://github.com/Parsl/parsl/issues/1137. So far I've found that everything in the template linked below works, since I've confirmed that Parsl 1.0.0 installs successfully on the EC2 instance:

https://github.com/Parsl/parsl/blob/b947adb55fb310d2e7db14f7a00bfdcb4cb04cdc/parsl/providers/aws/template.py#L1-L6

So it seems that something with process_worker_pool.py is not working.

benclifford commented 3 years ago

The most common problem I've seen with EC2 from a laptop is that the laptop needs an unfirewalled public IP address, which is not how most people's laptops are connected to the network. Is your laptop-local network configured that way?
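One relevant knob is the executor's address setting: by default the interchange advertises a locally detected address, which a remote EC2 worker on a different network generally cannot reach. A hedged sketch of advertising the submit-side machine's public IP via parsl.addresses (this only helps if that IP is actually reachable, i.e. not blocked by NAT or a firewall; the provider settings are copied from aws_config.py above):

from parsl.addresses import address_by_query
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import AWSProvider

# Sketch only: same provider as aws_config.py, plus an explicit public address
# for workers to connect back to.
config = Config(
    executors=[
        HighThroughputExecutor(
            label='ec2_single_node',
            address=address_by_query(),  # query an external service for this machine's public IP
            provider=AWSProvider(
                'ami-0885b1f6bd170450c',
                region='us-east-1',
                key_name='key2',
                profile="default",
                nodes_per_block=1,
                init_blocks=1,
                max_blocks=1,
                min_blocks=0,
                walltime='00:05:00',
                instance_type='t2.nano',
            ),
        )
    ],
)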

knagaitsev commented 3 years ago

It is not, but I figured that might be an issue. Is there any interest in removing this limitation? I'm not sure how difficult that would be given the current implementation.

knagaitsev commented 3 years ago

@benclifford I had a successful test run after running the Parsl script from another EC2 instance with all inbound and outbound ports open :+1:. I'm still unsure exactly which inbound/outbound ports must be open on the machine where the Parsl script runs, and if that isn't configurable, maybe it should be?

kylechard commented 3 years ago

@Loonride you can set the ports to be used in the executor config. The defaults for the HighThroughputExecutor are as follows:

worker_port_range: Optional[Tuple[int, int]] = (54000, 55000)
interchange_port_range: Optional[Tuple[int, int]] = (55000, 56000)

In your case you'll want to open up 55000-56000 on the EC2 instance running the Parsl script.
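For reference, a hedged sketch of setting those ranges explicitly in the config, so the matching security-group / firewall rules can be opened on the submit-side machine (worker_ports can be used instead to pin exactly two ports):

from parsl.executors import HighThroughputExecutor

# Sketch only: provider omitted; the point is the port configuration.
htex = HighThroughputExecutor(
    label='ec2_single_node',
    worker_port_range=(54000, 55000),       # ports remote workers connect back on
    interchange_port_range=(55000, 56000),  # ports for executor <-> interchange traffic
    # worker_ports=(54001, 54002),          # alternative: pin exactly two worker ports
    # provider=AWSProvider(...),            # as in aws_config.py above
)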