Support for node resource hints in provider

yadudoc commented 5 years ago

With IPP we had to explicitly specify the number of workers to be launched per node, since the launch system doesn't necessarily run on the compute node and lacked information about the available resource before hand. As we developed HTEX, we moved to an architecture where we might use a launch system that starts 1 manager per node, and this manager after probing available cpu/mem/ resources on the node launches workers based on per-worker resource limits set by the user. This only helps once we have a node provisioned and doesn't help in the strategy planning phase where resource information is still unavailable and we assume the worst case, only 1 worker can run per node, leading to gross overestimation of resources required. One potential automatic method would be to launch a single block and wait until resource information is available before attempting to scale further, but this might work poorly with slow queues. This would definitely take more effort, and is better done with the plan to revamp the strategy component.

When all else fails, we could fallback to the user to specify resource hints to the provider :

HighThroughputExecutor(
            label="htex",
            worker_debug=True,
            cores_per_worker=2,
            mem_per_worker=2,        # 2GB
            provider=SlurmProvider(
                'debug',
                nodes_per_block=2,
                cpus_per_node=64,         # <-- New
                mem_per_node=128,         # <-- New
            )
)

This is a MolSSI/QCArchive requirement (@Lnaden)

annawoodard commented 5 years ago

Note already have cpus_per_node in the PBSProProvider. Also note we probably want to keep #943 in mind as we implement this because they are closely related.

yadudoc commented 5 years ago

This is high priority for QCArchive.

yadudoc commented 5 years ago

The quickest way to go about this would be to specify exactly the resource slice you need from the scheduler, via Parsl's provider.scheduler_options kwarg. This would be followed by the provider.worker_init kwarg which would export a few global variables that Parsl can use. We need to add support for using these exported options in the Parsl worker code. I'll add a separate issue for that.

Here's a sample config that would work after we've got the bits implemented :

HighThroughputExecutor(
    label="htex",
    cores_per_worker=2,
    mem_per_worker=2,        # 2GB
    provider=SlurmProvider(
          'debug',
          nodes_per_block=2,
          scheduler_options = '#SBATCH --cpus-per-task=2 --mem-per-cpu=1g --ntasks=1',
          worker_init='export PARSL_MAX_MEMORY=2G; export PARSL_MAX_CPUS=2',
     )
)

@Lnaden, @dgasmith Could you take a look please ?

dgasmith commented 5 years ago

It would be good if the scheduler options where taken care of automatically by the general cores_per_worker. Without this things could get out of sync and wind up with weird errors. Something like dask-jobqueue handles it this way, I can dig up the templates if you want.

annawoodard commented 5 years ago

@Lnaden, @dgasmith

This has been implemented for the SlurmProvider, could you let us know if it addresses your concerns? The main change is that 1) you can request less than a node meaningfully now and 2) if you set mem_per_node and/or cores_per_node, then Parsl will use this information to calculate how many workers will fit on a node in order to make a more intelligent guess when scaling up about how many resources it needs (instead of assuming the worst-case scenario, that it will only be able to run one worker per node).

This hasn't made it into a release yet, but you can install the lastest with: pip install git+https://github.com/parsl/parsl

Here's a config I tested with:

from parsl.config import Config
from parsl.providers import SlurmProvider
from parsl.addresses import address_by_hostname
from parsl.executors import HighThroughputExecutor

config = Config(
    executors=[
        HighThroughputExecutor(
            cores_per_worker=1,
            mem_per_worker=3,
            address=address_by_hostname(),
            provider=SlurmProvider(
                'broadwl',
                nodes_per_block=1,
                init_blocks=1,
                min_blocks=1,
                max_blocks=1,
                mem_per_node=12,
                cores_per_node=4,
                exclusive=False
            ),
        )
    ],
)