cositools / cosi-data-challenge-2

COSI's second data challenge
Apache License 2.0

Python AWS commands #4

Closed israelmcmc closed 7 months ago

israelmcmc commented 8 months ago

I’ve been thinking about adding a cell to each Jupyter NB that downloads all the files needed to run the NB. I think it can be done with the awscli library (already installed by cosipy). I’ll look into that, since it’s been a common problem reported by many people.

ckarwin commented 8 months ago

That's a good idea. I think some of the notebooks already have this (e.g. the spectral fit and extended source fit NBs). Specifically, they have the wasabi terminal commands wrapped in an os.system command, but the right paths and files are there. Is this what you had in mind, or something else?

israelmcmc commented 8 months ago

Since I saw I could do import awscli, I thought it would provide an API usable from Python code. But that doesn't seem to be the case. I think we'll have to use a system command like in those notebooks; we just have to add it to the other ones.

Maybe we can add a common wrapper function like this:

import subprocess, os

def download_wasabi_file(file, output = None):

    if output is None:
        output = file.split('/')[-1]

    subprocess.run(['aws', 's3api', 'get-object', '--bucket', 'cosi-pipeline-public', '--key', file, '--endpoint-url=https://s3.us-west-1.wasabisys.com', output],
  env = os.environ.copy() | {'AWS_ACCESS_KEY_ID':'GBAL6XATQZNRV3GFH9Y4', 'AWS_SECRET_ACCESS_KEY':'GToOczY5hGX3sketNO2fUwiq4DJoewzIgvTCHoOv'})

I just tried to test it but Wasabi doesn't seem responsive right now :(

ckarwin commented 8 months ago

That's not a bad idea. Yeah, wasabi can be spotty sometimes.

avalluvan commented 8 months ago

A suggestion relevant to this discussion: could you add the following to the tutorial notebooks:

  1. the size of the file being retrieved as a comment - helps plan out time-consuming downloads vs those on the fly
  2. an if-clause to avoid redownloading a file that already exists at the specified data path (unless manually overridden) - a mis-click can sometimes result in a ~20 min wait time

ckarwin commented 8 months ago

Thanks for the suggestions, @avalluvan.

Yes, we can add the file size.

Sure, the if statement can be included in the wasabi download wrapper function.
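
A minimal sketch of how that check could look, not necessarily the function that ended up in cosipy. The names fetch_wasabi_file, overwrite, and runner are hypothetical and used only for illustration. It assumes the aws CLI is on the PATH and that the access keys are already set in the environment. The runner argument exists only so the skip logic can be exercised without touching the network.

```python
import subprocess
from pathlib import Path

# Bucket and endpoint as quoted earlier in this thread.
BUCKET = "cosi-pipeline-public"
ENDPOINT = "https://s3.us-west-1.wasabisys.com"


def fetch_wasabi_file(key, output=None, overwrite=False, runner=subprocess.run):
    """Download `key` unless the output file already exists.

    Returns True if a download was attempted, False if it was skipped.
    (Hypothetical helper; name and signature are assumptions.)
    """
    # Default to the last path component of the key, as in the wrapper above.
    output = Path(output) if output is not None else Path(key.split("/")[-1])

    # Skip the (possibly very long) download unless the caller forces it.
    if output.exists() and not overwrite:
        print(f"{output} already exists; skipping (pass overwrite=True to force).")
        return False

    cmd = [
        "aws", "s3api", "get-object",
        "--bucket", BUCKET,
        "--key", key,
        f"--endpoint-url={ENDPOINT}",
        str(output),
    ]
    # check=True raises CalledProcessError if the aws command fails.
    runner(cmd, check=True)
    return True
```

With this shape, re-running a notebook cell by accident returns immediately when the file is already on disk, and overwrite=True covers the manual-override case.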

israelmcmc commented 8 months ago

I added the function here: https://github.com/cositools/cosipy/pull/141

@ckarwin Can you please review and merge?

ckarwin commented 8 months ago

Great, thanks @israelmcmc! I tested it and left some comments in the PR.

ckarwin commented 8 months ago

The new utility function has been merged into the main branch. I'll wait until all the notebooks have been updated before closing this issue.

israelmcmc commented 8 months ago

Sounds good. Thanks, @ckarwin

ckarwin commented 7 months ago

This issue can be closed now. All notebooks have been updated.