dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

SpecCluster implementation closes the scheduler unexpectedly #7025

Open BadAsstronaut opened 1 year ago

BadAsstronaut commented 1 year ago

See https://github.com/dask/dask-cloudprovider/issues/375

What happened: I want a scheduler that stays alive while adaptive workers scale out as needed. This works well, except when an adaptive scaling process needs to close. Calling cluster.close() terminates the remote scheduler by sending a terminate command over the scheduler_comm. See the referenced issue for details.
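
For context, here is a minimal sketch of the close path as I understand it. SpecClusterSketch is an illustrative stand-in for distributed.deploy.spec.SpecCluster, paraphrased rather than quoted; the real control flow has more steps and may differ by version:

import asyncio
from contextlib import suppress

class SpecClusterSketch:
    # Illustrative stand-in for SpecCluster, not the real class.
    def __init__(self, scheduler_comm):
        self.scheduler_comm = scheduler_comm  # RPC handle to the scheduler
        self._lock = asyncio.Lock()

    async def _close(self):
        # On close, the cluster sends a terminate RPC to the scheduler it
        # is connected to, whether it started that scheduler itself or
        # merely attached to an external one via scheduler_address.
        if self.scheduler_comm:
            async with self._lock:
                with suppress(OSError):
                    await self.scheduler_comm.terminate()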

What you expected to happen: A 'worker cluster' should be able to close independently of the scheduler cluster.

Minimal Complete Verifiable Example:

import asyncio
import logging

from dask_cloudprovider.aws import FargateCluster

logger = logging.getLogger(__name__)

# image, scheduler_address, cluster_arn, execution_role_arn, task_role_arn,
# cloudwatch_logs_group, vpc, subnets, and security_groups are
# deployment-specific values assumed to be defined elsewhere.


def main():
    async def run():
        logger.info('-' * 47)
        # Attach to an already-running scheduler instead of starting one.
        cluster = FargateCluster(asynchronous=True,
                                 image=image,
                                 fargate_spot=True,
                                 scheduler_address=scheduler_address,
                                 cluster_arn=cluster_arn,
                                 execution_role_arn=execution_role_arn,
                                 task_role_arn=task_role_arn,
                                 cloudwatch_logs_group=cloudwatch_logs_group,
                                 vpc=vpc,
                                 subnets=subnets,
                                 security_groups=security_groups,
                                 fargate_use_private_ip=True)
        cluster.adapt(minimum=1, maximum=5)
        await cluster._start()
        logger.info('-' * 47)
        await asyncio.sleep(120)
        # Expected: shut down only this cluster's workers.
        # Observed: the remote scheduler is terminated as well.
        await cluster.close()

    try:
        asyncio.run(run())
    finally:
        logger.info('Adaptive scaler has ended')


if __name__ == '__main__':
    main()
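
A possible direction, as an untested sketch and not part of the report above: subclass the cluster and override _close so the workers are shut down without the scheduler_comm.terminate() RPC. WorkerOnlyFargateCluster and the override body are hypothetical, not existing dask-cloudprovider API, and they skip the rest of SpecCluster's teardown, so a real fix would need more care:

import asyncio

from dask_cloudprovider.aws import FargateCluster
from distributed.core import Status


class WorkerOnlyFargateCluster(FargateCluster):
    # Hypothetical workaround sketch: close only the workers this cluster
    # object created, and never send terminate to the shared scheduler.
    async def _close(self):
        # In SpecCluster, self.workers maps worker names to objects with
        # an async close(); shut them all down concurrently.
        await asyncio.gather(*(w.close() for w in self.workers.values()))
        self.workers.clear()
        self.status = Status.closed  # mark closed without touching the scheduler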
BadAsstronaut commented 1 year ago

@jacobtomlinson Opened this issue per your suggestion. Thanks for your help.