csoulette closed this issue 3 years ago
Hi @csoulette are you trying to upload the files as part of your workflow? Tibanna handles file uploads and downloads and it may not work if the workflows themselves try to handle uploads/downloads. Is that the case, or do you mean whether tibanna can take in a parameter to use encrypted uploading?
Hi SooLee,
Thanks for the quick response.
I'm referring to your latter statement. The only files being uploaded to the S3 bucket are the snakemake dependencies (such as the Snakefile itself) and the files created after each step in the workflow. It is my understanding that when snakemake uploads anything to S3, it uses some core function of tibanna to do so, so all the uploading I'm doing should go through a tibanna function (hope that makes sense). I'll include the entire stack trace for the error from which I'm inferring this:
Traceback (most recent call last):
File "/home/csoulette/anaconda3/envs/smk-tib/lib/python3.9/site-packages/snakemake/__init__.py", line 694, in snakemake
success = workflow.execute(
File "/home/csoulette/anaconda3/envs/smk-tib/lib/python3.9/site-packages/snakemake/workflow.py", line 1017, in execute
success = scheduler.schedule()
File "/home/csoulette/anaconda3/envs/smk-tib/lib/python3.9/site-packages/snakemake/scheduler.py", line 488, in schedule
self.run(runjobs)
File "/home/csoulette/anaconda3/envs/smk-tib/lib/python3.9/site-packages/snakemake/scheduler.py", line 499, in run
executor.run_jobs(
File "/home/csoulette/anaconda3/envs/smk-tib/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 136, in run_jobs
self.run(
File "/home/csoulette/anaconda3/envs/smk-tib/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 2142, in run
exec_info = API().run_workflow(
File "/home/csoulette/anaconda3/envs/smk-tib/lib/python3.9/site-packages/tibanna/core.py", line 188, in run_workflow
upload_workflow_to_s3(unicorn_input)
File "/home/csoulette/anaconda3/envs/smk-tib/lib/python3.9/site-packages/tibanna/ec2_utils.py", line 916, in upload_workflow_to_s3
boto3.client('s3').upload_file(source, bucket, target)
File "/home/csoulette/anaconda3/envs/smk-tib/lib/python3.9/site-packages/boto3/s3/inject.py", line 129, in upload_file
return transfer.upload_file(
File "/home/csoulette/anaconda3/envs/smk-tib/lib/python3.9/site-packages/boto3/s3/transfer.py", line 285, in upload_file
raise S3UploadFailedError(
boto3.exceptions.S3UploadFailedError: Failed to upload /home/csoulette/projects/sandbox/Snakefile to aws-test-bucket-cs/M78uCI14xwGJ.workflow/Snakefile: An error occurred (AccessDenied) when calling the PutObject operation: Access Denied.
** So the question is whether tibanna can take in a parameter to use encrypted uploading.
-CMS
I see. It looks like a permission problem. Have you set up the buckets when you deployed tibanna to AWS?
Yes. When deploying the unicorn I used the bucket argument so that tibanna has the correct permission to write to the bucket.
I was actually able to resolve the problem, sorry if my initial message was unclear. ** I've been able to rerun my snakemake workflow and successfully upload my Snakefile.
The buckets were set up so that each file needs to be uploaded using aws:kms encryption. If you try to upload a file without that encryption flag using awscli (--sse aws:kms), then the upload will fail.
I've changed the source code for boto3 so that ALL uploads to the S3 bucket use "aws:kms" encryption, by adding a line to the boto3 script transfer.py.
Changing boto3 source code is not ideal, and it would be better if I can simply pass the encryption argument to tibanna (which is using this boto3 script to handle s3 uploads) instead.
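For reference, boto3's `upload_file` already accepts an `ExtraArgs` parameter, so the encryption setting can be supplied by the caller without patching `transfer.py`. The following is only a minimal sketch of that idea: the helper names `build_sse_extra_args` and `upload_encrypted` are hypothetical and are not part of Tibanna or boto3, and the optional KMS key ID is an assumption about setups that use a customer-managed key.

```python
def build_sse_extra_args(kms_key_id=None):
    """Build the ExtraArgs dict that asks S3 for SSE-KMS on upload.

    Hypothetical helper, not a Tibanna or boto3 function.
    """
    extra_args = {"ServerSideEncryption": "aws:kms"}
    if kms_key_id:
        # Only needed when the bucket requires a specific KMS key (assumption).
        extra_args["SSEKMSKeyId"] = kms_key_id
    return extra_args


def upload_encrypted(s3_client, source, bucket, target, kms_key_id=None):
    """Upload one file with server-side encryption requested by the caller.

    s3_client is expected to behave like boto3.client("s3").
    """
    s3_client.upload_file(
        source, bucket, target, ExtraArgs=build_sse_extra_args(kms_key_id)
    )


# Usage (requires boto3 and valid AWS credentials):
#   import boto3
#   upload_encrypted(boto3.client("s3"), "Snakefile",
#                    "aws-test-bucket-cs", "M78uCI14xwGJ.workflow/Snakefile")
```

Passing the client in as an argument keeps the sketch independent of how credentials are configured, and it mirrors the `boto3.client('s3').upload_file(source, bucket, target)` call visible in the stack trace.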
** I've looked into tibanna configs, but didn't see any json headers/tags that look like they could be used to achieve this.
Ah I see. Thanks for the clarification. Would this work? Setting "use_s3_encryption": true in the input json config would add --sse aws:kms to the s3 upload when uploading any file to S3 (this will add it to every upload regardless of the bucket). Would it make sense to apply it also to downloading files from s3?
I can go ahead and try this and let you know. I actually am not sure about the download - it's a new bucket and haven't downloaded from it yet -- I would assume I would need it for both up&down.
I haven't used config files with tibanna yet, so I just want to clarify what the json would look like. I'm assuming like so:
{
"config": {
"run_name": "upload-test",
"use_s3_encryption": true
}
}
Thanks!!
oh no sorry I just saw the message - sorry for not being clear - I meant I could implement it but wanted to check with you to make sure that's what you wanted. If you're not sure about downloading, I will make the two options separate for now: e.g. "encrypt_s3_upload" instead of "use_s3_encryption".
Hi SooLee,
Thanks for the clarification! This sounds great.
Somewhat related: I saw that users can specify which bucket to write tibanna log files to. I didn't see an option to specify a subdirectory within a bucket to write such files to. If users want to write logs to a specific folder in an S3 bucket, will it work to simply append the folder name to "log_bucket"?
Thanks!!
-CMS
Hi @csoulette Can you try 1.1.0? You'll have to redeploy tibanna unicorn (either clean up and redeploy or deploy a completely new one). I only added upload encryption (not download). Let me know if this works. The folder use in log bucket is something I've been thinking about but it's not there yet.
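As a rough illustration of the option discussed above, the config section of the input could be assembled as below. This is a sketch under assumptions: it uses the "encrypt_s3_upload" name proposed earlier in this thread, it shows only the "config" section (a real Tibanna input also needs its other sections), and `make_tibanna_input` is a hypothetical helper, not a Tibanna function.

```python
import json


def make_tibanna_input(run_name, encrypt_upload=True):
    """Return a partial input dict with upload encryption toggled.

    Hypothetical helper; the key name follows the proposal in this thread.
    """
    return {
        "config": {
            "run_name": run_name,
            "encrypt_s3_upload": encrypt_upload,
        }
    }


# json.dumps turns Python's True into JSON's lowercase true.
print(json.dumps(make_tibanna_input("upload-test"), indent=2))
```

Building the dict in Python and serializing it avoids the easy mistake of writing `True` by hand in a JSON file, where only lowercase `true` is valid.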
Thanks for adding this!
I actually just updated from 0.18.3 to 1.1.2 before testing 1.1.0. I ran into an issue that was already described here -> https://stackoverflow.com/questions/65927246/snakemake-and-tibanna-cant-find-field-snakemake-main-filename and may be related to issue #256. I think I need to overcome this issue before being able to run version 1.1.0. This might be an issue with snakemake creating/passing the json to Tibanna?
...
File "/Users/ernestmordret/opt/anaconda3/envs/snakemake/lib/python3.9/site-packages/tibanna/ec2_utils.py", line 167, in fill_default
raise MissingFieldInInputJsonException(errmsg_template % ('snakemake_main_filename', self.language))
tibanna.exceptions.MissingFieldInInputJsonException: field snakemake_main_filename is required in args for language snakemake
I've tried adding "snakemake_main_filename" as a configfile json for snakemake, and also passing the argument as a --tibanna-config param, but neither seemed to help. Am I missing something?
-CMS
@csoulette This issue should be fixed in Tibanna v1.2.0.
Hello,
I'm writing to figure out if encrypted file upload is supported using tibanna configurations.
My setup: I'm running a snakemake workflow from my local machine. When running the workflow, some of the snakemake files are uploaded directly to my S3 bucket, and others are uploaded after completing certain steps in my workflow. The file uploads must be done using aws:kms encryption. When initially launching my snakemake workflow, I run into the following error:
According to the error, the problem is with the upload function from the boto3 package. I adjusted transfer.py from the boto3 package to include an extra line that adds the encryption as an extra arg. Specifically, boto3 has an S3 bucket class, within the class there is an upload_file function that tibanna is presumably using, and the extra bit of code I added was like so:
extra_args={'ServerSideEncryption': 'aws:kms'} # cam
I went the route of adjusting boto3 first since it was quicker for me to figure out how to hack the upload function rather than figure out whether tibanna has functionality to pass such an argument along (still new to tibanna). This route is not ideal for obvious reasons, and so I'm hoping to figure out the tibann-ic way to achieve this.
Let me know if I can include any additional info. thanks!
-CMS