cositools / cosipy

The COSI high-level data analysis tools
Apache License 2.0
3 stars, 16 forks

Add fetch_wasabi_file() utility function. #141

Closed: israelmcmc closed this 4 months ago

israelmcmc commented 4 months ago

Includes a unit test.

Usage:

from cosipy.util import fetch_wasabi_file

fetch_wasabi_file('test_file.txt', override=True)

test_file.txt is a real file that I added to the public Wasabi bucket so that this can be tested with a small download.
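The `override` flag matters because, as the traceback further down shows (lines 14-15 of `data_fetching.py`), the function refuses to overwrite an existing output file unless `override=True`. Below is a minimal sketch of that guard, written as a standalone helper. The name `resolve_output` is hypothetical. Defaulting `output` to the basename of the Wasabi key is an assumption and is not confirmed in this thread.

```python
import os


def resolve_output(file, output=None, override=False):
    """Return the local path to write `file` to, refusing to clobber existing files.

    Hypothetical helper: the RuntimeError mirrors the check visible in the
    cosipy traceback; the basename default for `output` is an assumption.
    """
    if output is None:
        # Assumed default: save under the last path component of the Wasabi key
        output = os.path.basename(file)
    if os.path.exists(output) and not override:
        raise RuntimeError(f"File {output} already exists.")
    return output
```

For example, `resolve_output('ComptonSphere/mini-DC2/GalacticScan.inc1.id1.crab2hr.extracted.tra.gz')` returns `'GalacticScan.inc1.id1.crab2hr.extracted.tra.gz'`, provided no file with that name already exists in the current directory.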

ckarwin commented 4 months ago

Excellent, thanks @israelmcmc!

The code works well from the command line. However, I still get the following error when I run it in my Jupyter notebook:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In [2], line 1
----> 1 fetch_wasabi_file('ComptonSphere/mini-DC2/GalacticScan.inc1.id1.crab2hr.extracted.tra.gz', override = True)

File /zfs/astrohe/ckarwin/COSI/COSIpy_Development/AWS_PR/cosipy/cosipy/util/data_fetching.py:17, in fetch_wasabi_file(file, output, override, bucket, endpoint, access_key_id, access_key)
     14 if os.path.exists(output) and not override:
     15     raise RuntimeError(f"File {output} already exists.")
---> 17 subprocess.run(['aws', 's3api', 'get-object',
     18                 '--bucket', bucket,
     19                 '--key', file,
     20                 '--endpoint-url', endpoint,
     21                 output], 
     22                env = os.environ.copy() | {'AWS_ACCESS_KEY_ID':access_key_id,
     23                                           'AWS_SECRET_ACCESS_KEY':access_key})

File /zfs/astrohe/Software/COSIMain_u2/lib/python3.9/subprocess.py:505, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
    502     kwargs['stdout'] = PIPE
    503     kwargs['stderr'] = PIPE
--> 505 with Popen(*popenargs, **kwargs) as process:
    506     try:
    507         stdout, stderr = process.communicate(input, timeout=timeout)

File /zfs/astrohe/Software/COSIMain_u2/lib/python3.9/subprocess.py:951, in Popen.__init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, user, group, extra_groups, encoding, errors, text, umask)
    947         if self.text_mode:
    948             self.stderr = io.TextIOWrapper(self.stderr,
    949                     encoding=encoding, errors=errors)
--> 951     self._execute_child(args, executable, preexec_fn, close_fds,
    952                         pass_fds, cwd, env,
    953                         startupinfo, creationflags, shell,
    954                         p2cread, p2cwrite,
    955                         c2pread, c2pwrite,
    956                         errread, errwrite,
    957                         restore_signals,
    958                         gid, gids, uid, umask,
    959                         start_new_session)
    960 except:
    961     # Cleanup if the child failed starting.
    962     for f in filter(None, (self.stdin, self.stdout, self.stderr)):

File /zfs/astrohe/Software/COSIMain_u2/lib/python3.9/subprocess.py:1821, in Popen._execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, gid, gids, uid, umask, start_new_session)
   1819     if errno_num != 0:
   1820         err_msg = os.strerror(errno_num)
-> 1821     raise child_exception_type(errno_num, err_msg, err_filename)
   1822 raise child_exception_type(err_msg)

FileNotFoundError: [Errno 2] No such file or directory: 'aws'
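The `FileNotFoundError: ... 'aws'` means that the `aws` executable is not on the `PATH` seen by the Jupyter kernel. That `PATH` can differ from the one in a login shell. The sketch below fails early with a clearer message, using the standard-library `shutil.which`. The name `require_executable` is hypothetical and is not part of cosipy.

```python
import shutil


def require_executable(name):
    """Return the full path to `name`, or raise with a hint about PATH.

    shutil.which searches the PATH of the *current* process (here, the
    Jupyter kernel), not the terminal the notebook was launched from.
    """
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(
            f"'{name}' was not found on this process's PATH. "
            "Install it in this environment or add its directory to PATH."
        )
    return path
```

Calling `require_executable('aws')` at the top of a notebook cell reports the problem before `subprocess.run` is ever reached.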

I get a similar error when running:

import os
os.system("AWS_ACCESS_KEY_ID=GBAL6XATQZNRV3GFH9Y4 AWS_SECRET_ACCESS_KEY=GToOczY5hGX3sketNO2fUwiq4DJoewzIgvTCHoOv aws s3api get-object  --bucket cosi-pipeline-public --key ComptonSphere/mini-DC2/GalacticScan.inc1.id1.crab2hr.extracted.tra.gz --endpoint-url=https://s3.us-west-1.wasabisys.com GalacticScan.inc1.id1.crab2hr.extracted.tra.gz")

Error: sh: aws: command not found
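Both calls fail for the same reason: the shell started from the notebook cannot find `aws`. If the CLI is installed but its directory is missing from the kernel's `PATH`, one workaround is to extend `PATH` for the current kernel session. Child processes started by `os.system` or `subprocess` inherit `os.environ`, so they will see the change. This is a sketch only. `prepend_to_path` is a hypothetical helper, and the directory you pass should be the one reported by `which aws` in a terminal where the command works.

```python
import os


def prepend_to_path(directory):
    """Put `directory` at the front of PATH for this process and its children."""
    current = os.environ.get("PATH", "")
    parts = current.split(os.pathsep) if current else []
    # Avoid adding the same directory twice if the cell is re-run
    if directory not in parts:
        os.environ["PATH"] = os.pathsep.join([directory] + parts)
    return os.environ["PATH"]
```

For example, call `prepend_to_path(os.path.expanduser('~/.local/bin'))` before the `os.system(...)` call. Here `~/.local/bin` is only an illustrative location.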

I think @fieldrog and @saurabhmittal23 mentioned having a similar issue, and they needed to follow the installation instructions on the AWS page: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html.

One additional comment: can you please add documentation for this new function (i.e., a docstring)? As part of this, please make it clear that the file argument needs to be the full Wasabi path.
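For reference, here is a NumPy-style docstring that covers the point raised here: `file` is the full key inside the bucket, not a bare file name. This is a sketch only. The parameter names come from the traceback above, and the `bucket` and `endpoint` defaults are taken from the `os.system` command. The remaining defaults and the body are placeholders, not cosipy's actual code.

```python
def fetch_wasabi_file(file,
                      output=None,
                      override=False,
                      bucket='cosi-pipeline-public',
                      endpoint='https://s3.us-west-1.wasabisys.com',
                      access_key_id=None,
                      access_key=None):
    """
    Download a file from COSI's Wasabi storage.

    Parameters
    ----------
    file : str
        Full path of the file inside the bucket, e.g.
        'ComptonSphere/mini-DC2/GalacticScan.inc1.id1.crab2hr.extracted.tra.gz'.
        A bare file name only works for objects at the top level of the bucket.
    output : str, optional
        Local path to write to (assumed default: the basename of `file`).
    override : bool, optional
        Overwrite `output` if it already exists; otherwise a RuntimeError is raised.
    bucket : str, optional
        Wasabi bucket name.
    endpoint : str, optional
        S3-compatible endpoint URL.
    access_key_id, access_key : str, optional
        Credentials for the bucket.
    """
    # Placeholder body: this block only illustrates the docstring layout.
    raise NotImplementedError("docstring sketch only")
```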

israelmcmc commented 4 months ago

@ckarwin Can you try again, please? I think the latest change should fix this. I realized that awscli does have an underlying Python API, although it is not documented. I also added the documentation; thanks for noticing that.
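The thread does not show which awscli entry point the fix uses. One widely used but undocumented option in the awscli v1 package is `awscli.clidriver.create_clidriver()`. Its `main()` method takes the same argument list as the `aws` command and returns the exit code, so no executable on `PATH` is needed. The sketch below assumes that entry point. `get_object_args`, `temporary_env`, and `fetch_in_process` are hypothetical names. Because the entry point is undocumented, it may change between awscli releases.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(**overrides):
    """Set environment variables for the duration of the block, then restore them."""
    saved = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for key, value in saved.items():
            if value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = value


def get_object_args(bucket, key, endpoint, output):
    """Arguments equivalent to `aws s3api get-object ...`, without the leading 'aws'."""
    return ['s3api', 'get-object',
            '--bucket', bucket,
            '--key', key,
            '--endpoint-url', endpoint,
            output]


def fetch_in_process(bucket, key, endpoint, output, access_key_id, access_key):
    """Run the CLI inside this Python process and return its exit code."""
    # Assumption: awscli v1 is installed (pip install awscli) and still exposes
    # the undocumented create_clidriver() entry point.
    from awscli.clidriver import create_clidriver
    # Running in-process, credentials must be in os.environ rather than a child env.
    with temporary_env(AWS_ACCESS_KEY_ID=access_key_id,
                       AWS_SECRET_ACCESS_KEY=access_key):
        driver = create_clidriver()
        return driver.main(get_object_args(bucket, key, endpoint, output))
```

A nonzero return value from `fetch_in_process` signals that the download failed.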

ckarwin commented 4 months ago

Awesome @israelmcmc, it works now. Please double-check the docstring format and let me know if it's ready to be merged.

israelmcmc commented 4 months ago

I added the space before the colon, but it doesn't seem to have any effect. However, this made me realize that I hadn't added this function to the Sphinx documentation. That is fixed in the latest commit. I think it's ready to merge.