Access environment variables #4

Closed RichardScottOZ closed 4 months ago

RichardScottOZ commented 8 months ago

"In addition to this setting, AWS credentials with appropriate write access to your target S3 bucket should be available in your environment." So is this saying they have to be environment variables?

jhamman commented 8 months ago

Hi @RichardScottOZ - right now, we’re fairly “hands off” as to how you configure your AWS credentials. This will likely change in the medium term but for now, you have a number of options, such as: AWS environment variables

The arraylake client uses AioBotoCore (similar to boto3) and picks up credentials in the same way.

Is there another way you were hoping to configure access to S3?
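
Since the client resolves credentials the same way boto3 does, one quick check is to see which of the standard AWS environment variables are set in the current process. The sketch below uses a hypothetical helper (`aws_env_summary` is not part of arraylake or botocore). It only reports whether each variable is present, never its value, and it does not cover the other sources in the credential chain, such as `~/.aws/credentials`.

```python
import os
from typing import Mapping, Optional

# Environment variables that botocore-based clients read for credentials and region.
AWS_ENV_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_PROFILE",
    "AWS_DEFAULT_REGION",
)


def aws_env_summary(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return {variable name: True/False} without exposing any secret values."""
    env = os.environ if env is None else env
    return {name: bool(env.get(name)) for name in AWS_ENV_VARS}


if __name__ == "__main__":
    for name, is_set in aws_env_summary().items():
        print(f"{name}: {'set' if is_set else 'not set'}")
```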

RichardScottOZ commented 8 months ago

Just wondering about possibilities

e.g. if I have profile [default] profile [boringothercompayone] profile [arraylake]

that sort of thing

So for notebook-type tests, I would pass it some setup read from a config file like that. Can you do something similar with AioBotoCore?

RichardScottOZ commented 8 months ago

This sort of thing

import configparser
import contextlib

import s3fs
import xarray as xr

def get_aws_credentials():
    parser = configparser.RawConfigParser()
    # parser.read(os.path.expanduser('~/.aws/config'))
    # Raw string so the backslashes in the Windows path are not treated as escapes
    parser.read(r'C:\Users\rscott\.aws\config')
    # config = parser.items('default')
    # parser.read(os.path.expanduser('~/.aws/credentials'))
    credentials = parser.items('default')
    # all_credentials = {key.upper(): value for key, value in [*config, *credentials]}
    all_credentials = {key.upper(): value for key, value in credentials}
    # Rename REGION to AWS_REGION if the profile defines it
    with contextlib.suppress(KeyError):
        all_credentials["AWS_REGION"] = all_credentials.pop("REGION")
    return all_credentials

creds = get_aws_credentials()

# access_key, secret_key, client_kwargs and s3_path are set elsewhere
s3 = s3fs.S3FileSystem(anon=False, key=access_key, secret=secret_key, client_kwargs=client_kwargs)
store = s3fs.S3Map(root=s3_path, s3=s3, check=False)
modeldata = xr.open_zarr(store=store, mask_and_scale=True)
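
For the multi-profile case raised earlier, the same `configparser` approach can select a named profile instead of `default`. This is a minimal sketch, assuming the standard AWS file layout: in `~/.aws/credentials` a profile is a section named `[arraylake]`, while in `~/.aws/config` non-default profiles are named `[profile arraylake]`. The function name `read_aws_profile` and the sample values are hypothetical.

```python
import configparser


def read_aws_profile(text: str, profile: str, is_config_file: bool = False) -> dict:
    """Return the key/value pairs of one profile from AWS-style INI text."""
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    # ~/.aws/config prefixes non-default sections with "profile ".
    section = profile
    if is_config_file and profile != "default":
        section = f"profile {profile}"
    if not parser.has_section(section):
        raise KeyError(f"profile {profile!r} not found")
    return {key.upper(): value for key, value in parser.items(section)}


# Made-up credentials file content, for illustration only.
SAMPLE = """
[default]
aws_access_key_id = DEFAULTKEY
aws_secret_access_key = defaultsecret

[arraylake]
aws_access_key_id = ARRAYLAKEKEY
aws_secret_access_key = arraylakesecret
"""

creds = read_aws_profile(SAMPLE, "arraylake")
print(sorted(creds))
```

The resulting values could then be handed to `s3fs.S3FileSystem` through its `key` and `secret` arguments.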

RichardScottOZ commented 8 months ago

So, having explicitly set the environment variables for access at the start, and using the emailed example:

client = al.Client()

repo_name = "OZ-Minerals/test"

# Open your data using Xarray
# ds = xr.open_dataset(...)
# or use the tutorial dataset (requires the 'pooch' package)
ds = xr.tutorial.open_dataset("air_temperature")
# Open your existing repository
repo = client.get_or_create_repo(repo_name)
# Write your dataset to Arraylake
ds.to_zarr(repo.store, group="mygroup", zarr_version=3, mode="w")
# ds.to_zarr(repo.store, group="mygroup", mode="w")
# Make your first commit.
repo.commit("my first commit!")

ValueError: [{'loc': ['query', 'base_commit'], 'msg': 'field required', 'type': 'value_error.missing'}]

RichardScottOZ commented 8 months ago

note here that test already existed - I made it via the web a week ago

RichardScottOZ commented 8 months ago

same error if I make it 'test2' however

jhamman commented 8 months ago

@RichardScottOZ - can you post the output from arraylake --diagnostics? This looks like a version issue.

To your prior point about setting S3 connection parameters, there are a few things you can do:

  1. Use the arraylake config:

    from arraylake import Client, config
    config.set({"s3": {"aws_secret_access_key": ..., "aws_access_key_id": ..., "region_id": ...}})
    client = Client()
  2. You can also configure AWS to use a different profile, either through environment variables or through boto3. This might look like:

    os.environ['AWS_PROFILE'] = 'default'
    # or
    client = Client()

    See this SO post for more details.
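
The environment-variable route in option 2 can be scoped to a block of code, so that the rest of a notebook keeps its original profile. This is a minimal sketch using only the standard library. `use_aws_profile` is a hypothetical helper, not an arraylake or boto3 function, and it assumes the client is created inside the `with` block so that credential lookup sees the temporary `AWS_PROFILE` value.

```python
import os
from contextlib import contextmanager


@contextmanager
def use_aws_profile(profile: str):
    """Temporarily set AWS_PROFILE, restoring the previous value on exit."""
    previous = os.environ.get("AWS_PROFILE")
    os.environ["AWS_PROFILE"] = profile
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop("AWS_PROFILE", None)
        else:
            os.environ["AWS_PROFILE"] = previous


with use_aws_profile("arraylake"):
    # Create the client here, e.g. client = Client()
    print(os.environ["AWS_PROFILE"])
```

A client or session created before the block may already have resolved its credentials, so it would likely not pick up the change.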

RichardScottOZ commented 8 months ago

jhamman commented 8 months ago

@RichardScottOZ - it looks like conda is not finding a recent version of arraylake for your environment. Would you mind trying pip:

pip install arraylake

If that is not an option, try:

conda install -v arraylake

RichardScottOZ commented 8 months ago

That won't make any difference if there is no uvloop for Windows, will it? Or have you packaged something? Happy to try whatever, though.

RichardScottOZ commented 8 months ago

I likely won't get a chance to try on linux until tomorrow

RichardScottOZ commented 8 months ago

info     libmamba Problem count: 1
Could not solve for environment specs
Encountered problems while solving:
  - nothing provides __unix needed by arraylake-0.7.2-pyhd8ed1ab_0

The environment can't be solved, aborting the operation

info     libmamba Freeing solver.
info     libmamba Freeing pool.

jhamman commented 8 months ago

uvloop is an optional dependency of arraylake. It shouldn't be showing up as a dependency on Windows at all.

It seems like our conda-forge configuration isn't quite right. We can address this tomorrow. In the meantime, I suggest trying out pip.
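
For readers unfamiliar with optional dependencies: a package can check at runtime whether an extra such as uvloop is importable and fall back to the standard library when it is not, which is why its absence on Windows should not block installation. The sketch below illustrates the general pattern only. It is not arraylake's actual code, and `pick_event_loop_backend` is a hypothetical name.

```python
import importlib.util


def pick_event_loop_backend() -> str:
    """Return "uvloop" if that optional package is importable, else "asyncio"."""
    if importlib.util.find_spec("uvloop") is not None:
        return "uvloop"
    return "asyncio"


print(pick_event_loop_backend())
```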

RichardScottOZ commented 8 months ago

when I tried yesterday

I will give the pip suggestion a go now re: arraylake

RichardScottOZ commented 8 months ago

Ok so that installed, anyway.

C:\Users\rnmsc\anaconda3\envs\pyvistaxarray\lib\site-packages\arraylake\ UserWarning: Migrated C:\Users\rnmsc\.config\arraylake_client\config.yaml to C:\Users\rnmsc\.config\arraylake\config.yaml. 


        'python': '3.9.18 | packaged by conda-forge | (main, Aug 30 2023, 03:40:31) [MSC v.1929 64 bit (AMD64)]',
        'python-bits': '64',
        'OS': 'Windows',
        'OS-release': '10',
        'machine': 'AMD64',
        'processor': 'Intel64 Family 6 Model 165 Stepping 2, GenuineIntel',
        'byteorder': 'little',
        'LC_ALL': 'None',
        'LANG': 'None',
        'LOCALE': "('English_Australia', '1252')"
        'arraylake': '0.7.2',
        'aiobotocore': '2.7.0',
        'uvloop': 'none',
        'zarr': '2.16.0',
        'numcodecs': '0.12.1',
        'numpy': '1.26.0',
        'donfig': '0.8.1.post0',
        'pydantic': '1.10.13',
        'httpx': '0.25.0',
        'ruamel.yaml': '0.18.2',
        'typer': '0.9.0',
        'rich': 'installed',
        'fsspec': '2023.10.0',
        'kerchunk': 'none',
        'h5py': 'none',
        's3fs': 'none',
        'cachetools': '5.3.2',
        'structlog': '23.2.0',
        'ipytree': 'none',
        'xarray': '2023.10.1',
        'dateutil': '2.8.2',
        'click': '8.1.3',
        'dask': '2023.5.0',
        'distributed': '2023.5.0'
    config={'chunkstore.hash_method': 'hashlib.sha256'},
        'service_uri': '',
        'service_version': '0.7.1.post23.dev0+f29772d.dirty'

jhamman commented 8 months ago

Huzzah! This all looks very healthy! You should be good to go now.

RichardScottOZ commented 8 months ago

One possible wrinkle - arraylake and arraylake_client both in environment now after the messing around - can that cause a problem?

arraylake                 0.7.2                    pypi_0    pypi
arraylake-client          0.6.0              pyhd8ed1ab_0    conda-forge

jhamman commented 8 months ago

This should be fine. Use arraylake from now on. 0.7.2 is the release we made on Friday and is the package name you should use going forward. If you like, you can uninstall the arraylake-client package but that is not strictly required.

And yes, your import should now be:

import arraylake
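
To confirm which of the two distributions is installed in the active environment, the standard library can report their versions. This is a small sketch: the distribution names `arraylake` and `arraylake-client` are taken from the package listing in this thread, and `installed_version` is a hypothetical helper.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


for name in ("arraylake", "arraylake-client"):
    print(name, installed_version(name))
```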

RichardScottOZ commented 8 months ago


RichardScottOZ commented 8 months ago

some of the previous tests actually created repos too it seemed - hadn't looked until now

RichardScottOZ commented 8 months ago

RichardScottOZ commented 8 months ago

So how would you explain that to the layperson who is not us?

dcherian commented 8 months ago

> So how would you explain that to the layperson who is not us?

What would we be explaining? The object names in your S3 bucket?

dcherian commented 8 months ago

I think we have fixed our Windows conda-forge build problem. Would you mind testing it out please?

RichardScottOZ commented 8 months ago

> > So how would you explain that to the layperson who is not us?
>
> What would we be explaining? The object names in your S3 bucket?

Not mine specifically, but it would be fine as an example, as in: when you do this, you should expect to see X, and this is why. It should be aimed at someone more of a layperson than the people in this conversation.

RichardScottOZ commented 8 months ago

> I think we have fixed our Windows conda-forge build problem. Would you mind testing it out please?

Sure, hopefully get to it shortly.