DistributedScience / Distributed-CellProfiler

Run encapsulated docker containers with CellProfiler in the Amazon Web Services infrastructure.
https://distributedscience.github.io/Distributed-CellProfiler/

correct role handling and S3FS bypass #130

Closed: ErinWeisbart closed this 1 year ago.

ErinWeisbart commented 2 years ago

Corrects credential handling so that an AWS IAM role can be used.
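
One common pattern for role-friendly credential handling, assuming the worker talks to S3 through boto3, is to pass static keys only when both are actually configured. Otherwise no keys are passed, and boto3 falls back to its default credential chain, which includes an instance's IAM role. The helper `client_kwargs` and the idea of reading the keys from a config mapping are hypothetical illustrations, not code from this PR.

```python
import os
from typing import Mapping, Optional


def client_kwargs(config: Optional[Mapping[str, str]] = None) -> dict:
    """Build credential kwargs for a call like boto3.client("s3", **kwargs).

    Hypothetical helper: if both static keys are set (non-blank), pass them
    explicitly; otherwise return no credential arguments so boto3 resolves
    credentials through its default chain, which includes an IAM role.
    """
    config = os.environ if config is None else config
    key_id = config.get("AWS_ACCESS_KEY_ID", "").strip()
    secret = config.get("AWS_SECRET_ACCESS_KEY", "").strip()
    if not (key_id and secret):
        # Missing or blank keys: defer to the default chain / IAM role.
        return {}
    kwargs = {"aws_access_key_id": key_id, "aws_secret_access_key": secret}
    token = config.get("AWS_SESSION_TOKEN", "").strip()
    if token:
        # Temporary credentials also need their session token.
        kwargs["aws_session_token"] = token
    return kwargs
```

Usage would look like `boto3.client("s3", **client_kwargs())`; on an instance with a role attached and no keys configured, the call receives no credential arguments at all.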

ErinWeisbart commented 2 years ago

Closes #132 by changing how load_data_csv is accessed: it is now downloaded rather than read in place. Also closes #114.
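
As a rough sketch of the download step, assuming load_data_csv is given as an s3://bucket/key URI and mirrored under a local directory, the location can be split into bucket, key, and local destination before transferring. `plan_download` and the mirrored local layout are hypothetical, not the PR's implementation.

```python
import posixpath
from urllib.parse import urlparse


def plan_download(s3_uri: str, local_root: str) -> tuple:
    """Split an s3://bucket/key URI and pick a local path mirroring the key.

    Hypothetical helper: returns (bucket, key, local_path).
    """
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {s3_uri!r}")
    key = parsed.path.lstrip("/")
    if not key or key.endswith("/"):
        # A bare bucket or a "folder" prefix does not name a single object.
        raise ValueError(f"URI does not name an object: {s3_uri!r}")
    local_path = posixpath.join(local_root, key)
    return parsed.netloc, key, local_path
```

With boto3, the result could then feed `os.makedirs(os.path.dirname(local_path), exist_ok=True)` followed by `boto3.client("s3").download_file(bucket, key, local_path)`, after which the CSV is read from local disk.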

ErinWeisbart commented 2 years ago

Noting that the S3FS bypass appears to be working fine. In several tests all the files downloaded correctly, but then, apparently stochastically, one run threw:
OSError: [Errno 30] Read-only file system: '/home/ubuntu/local_input/path/to/image/cdp2bioactives_a22_s4_w33236d730-2143-4738-bfac-685f917a717d.tif.2FE98647'

ErinWeisbart commented 2 years ago

The full S3FS bypass is working at this point. I still haven't finished the changes run-worker.sh needs in order to mount the S3 bucket using the new credential handling.
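
For the mounting side, s3fs-fuse can take credentials from an instance's IAM role via its `-o iam_role=auto` option instead of from a password file (`-o passwd_file=...`). The Python sketch below only assembles such a command line; `s3fs_mount_command` is a hypothetical helper, and it does not reflect the actual changes to run-worker.sh, which is a shell script.

```python
from typing import List, Optional


def s3fs_mount_command(bucket: str, mountpoint: str,
                       passwd_file: Optional[str] = None) -> List[str]:
    """Build an s3fs-fuse argv (hypothetical helper).

    With a password file, s3fs uses the static keys stored in it; without
    one, s3fs is asked to discover the instance's IAM role automatically.
    """
    cmd = ["s3fs", bucket, mountpoint]
    if passwd_file:
        cmd += ["-o", f"passwd_file={passwd_file}"]
    else:
        cmd += ["-o", "iam_role=auto"]
    return cmd
```

The list can be passed straight to `subprocess.run(cmd, check=True)`; keeping it as an argv list avoids shell-quoting issues with bucket or path names.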

ErinWeisbart commented 1 year ago

I have this branch running on a dataset now and it works (!): it completely bypasses S3FS and uses a role. (I don't know whether it mounts S3 using a role, but I think it will be more helpful to treat this PR as a complete unit and, if necessary, troubleshoot S3 mounting with a role in a separate PR.)
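
One way to picture a complete S3FS bypass, assuming the LoadData CSV's PathName_* columns point at where the bucket would be mounted, is to rewrite those columns to a local input directory and collect the files that must be downloaded first. `localize_load_data` and both prefixes are hypothetical; this is a sketch, not the approach the branch necessarily takes.

```python
import csv
import io
import posixpath
from typing import List, Tuple


def localize_load_data(csv_text: str, remote_prefix: str,
                       local_prefix: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Point PathName_* columns at local_prefix instead of remote_prefix.

    Returns the rewritten CSV text plus (remote_file, local_file) pairs that
    would need downloading before CellProfiler runs. Hypothetical helper.
    """
    rp, lp = remote_prefix.rstrip("/"), local_prefix.rstrip("/")
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    path_cols = [c for c in fieldnames if c.startswith("PathName_")]
    rows, to_fetch = [], []
    for row in reader:
        for col in path_cols:
            path = row[col]
            # Match whole path components so ".../bucket2" is not treated as ".../bucket".
            if path == rp or path.startswith(rp + "/"):
                new_path = lp + path[len(rp):]
                file_col = "FileName_" + col[len("PathName_"):]
                if row.get(file_col):
                    to_fetch.append((posixpath.join(path, row[file_col]),
                                     posixpath.join(new_path, row[file_col])))
                row[col] = new_path
        rows.append(row)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue(), to_fetch
```

Because every referenced file is fetched to local disk and the CSV only ever points there, CellProfiler never needs to read through a mounted bucket.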

ErinWeisbart commented 1 year ago

Thanks @bethac07! Sorry, this PR kinda turned into a chaos of commits during all the troubleshooting. I'm honestly not sure how so many stylistic changes ended up mixed in, but I will certainly try harder to keep them in their own commits in the future.