coiled / feedback

A place to provide Coiled feedback

Cluster has access to private bucket after restarting the python kernel #96

Closed FabioRosado closed 1 year ago

FabioRosado commented 3 years ago

Diederik Greveling reported this issue to me by DM on Slack.

When setting up a cluster with:

import coiled
cluster = coiled.Cluster(name="test1", n_workers=10, credentials="local")

They have access to the private bucket. When they delete the cluster, restart the Python kernel, and create a new cluster with a different name, they still have access to the private bucket.

import coiled
cluster = coiled.Cluster(name="test2", n_workers=10, credentials=None)

Initially, I thought we had to set credentials as a string, but that just throws a ValueError: 'none' is not a valid CredentialsPreferred exception.
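For context, that error message has the shape Python's enum module produces when a value does not match any member. A minimal sketch, assuming CredentialsPreferred is an Enum; the members below are a hypothetical stand-in, not Coiled's actual definition:

```python
from enum import Enum


class CredentialsPreferred(Enum):
    # Hypothetical stand-in for illustration only; the real enum in the
    # Coiled client may define different members.
    LOCAL = "local"
    ACCOUNT = "account"
    NONE = None


def parse_credentials(value):
    # Enum lookup by value raises ValueError for anything that is not
    # exactly one of the member values.
    return CredentialsPreferred(value)


if __name__ == "__main__":
    print(parse_credentials("local"))
    print(parse_credentials(None))
    try:
        parse_credentials("none")
    except ValueError as exc:
        print(f"ValueError: {exc}")
```

Under this assumption, the string "none" fails because no member has that value, while the Python object None maps to its own member.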

This private bucket is on Diederik's personal IAM account; no AWS account managed by us has access to the bucket.

I've asked Diederik to try creating a second cluster without the credentials arg, just to see if that fixes the issue. Diederik might provide some test code for us to try as well.

DPGrev commented 3 years ago

If we run with a new cluster, it seems that the default behaviour is to always try to load the local AWS credentials.

import coiled
cluster = coiled.Cluster(name='aws-bucket-access', n_workers=10)

This still grants access to our private bucket.

In fact, if we remove the AWS credentials in ~/.aws/credentials and run the same code as above but with a different cluster name (a new cluster), we get the following error:

NoCredentialsError: Unable to locate credentials

Thus we think that Coiled always tries to use the local AWS credentials by default.
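To check which local credential source is present on a machine before creating a cluster, something like the following could help. It is a rough sketch that only looks at two common sources (environment variables and the shared credentials file); the function name is hypothetical and this is not the full lookup chain that botocore performs:

```python
import os
from pathlib import Path


def local_aws_credential_source(env=None, home=None):
    """Return a label for the first local credential source found, or None.

    Simplified: checks only the access-key environment variables and the
    shared credentials file, in that order.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"

    # AWS_SHARED_CREDENTIALS_FILE overrides the default file location.
    default_file = home / ".aws" / "credentials"
    cred_file = Path(env.get("AWS_SHARED_CREDENTIALS_FILE", default_file))
    if cred_file.is_file():
        return f"shared credentials file ({cred_file})"

    return None


if __name__ == "__main__":
    print(local_aws_credential_source())
```

If this prints None and the cluster still reaches the bucket, the access is probably not coming from these two local sources.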

DPGrev commented 3 years ago

What we think is happening in the coiled.io client is that, since CredentialsPreferred.ACCOUNT is enabled by default, it enters the following if statement on line 232 in cluster.py:

if self.credentials == CredentialsPreferred.ACCOUNT:
    aws_creds = await self.cloud.get_aws_credentials(self.account)
    # Setup the default session & environment variables so that
    # account creds are used for other AWS things (e.g. local Dask
    # client)
    boto3.setup_default_session(**aws_creds)
    for k, v in aws_creds.items():
        os.environ[k.upper()] = v

However, since we have not set up any AWS credentials in our Coiled account, aws_creds is empty and no AWS environment variables are set. As a result, aiobotocore uses the local credentials stored in ~/.aws/credentials on line 243:

session = aiobotocore.get_session()

aiobotocore then generates temporary credentials, which are distributed to the Coiled Dask scheduler.

As a result, local credentials are always used by default.

A possible fix would be to stop generating the STS credentials at line 240 when:

self.credentials == CredentialsPreferred.ACCOUNT and not aws_creds
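The condition above could be wrapped in a small guard. This is only a sketch of the idea: the enum is a hypothetical stand-in for the client's CredentialsPreferred, and should_generate_sts_token is an invented helper name, not something that exists in cluster.py:

```python
from enum import Enum


class CredentialsPreferred(Enum):
    # Hypothetical stand-in; not Coiled's actual definition.
    LOCAL = "local"
    ACCOUNT = "account"


def should_generate_sts_token(credentials, aws_creds):
    """Decide whether temporary credentials should be generated.

    Skips generation when account credentials were requested but the
    account has none configured, so local credentials are not picked
    up silently.
    """
    if credentials == CredentialsPreferred.ACCOUNT and not aws_creds:
        return False
    return True


if __name__ == "__main__":
    # Account mode with no stored credentials: skip token generation.
    print(should_generate_sts_token(CredentialsPreferred.ACCOUNT, {}))
    # Explicit local mode: generating a token is what the user asked for.
    print(should_generate_sts_token(CredentialsPreferred.LOCAL, {}))
```

With a guard like this, the local-credentials path would only be taken when credentials="local" is passed explicitly.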

shughes-uk commented 1 year ago

Using account-level credentials like this was deprecated.