DataBiosphere / toil

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
http://toil.ucsc-cgl.org/.
Apache License 2.0

s3cmd get returns permission denied #1054

Closed. dyndna closed this issue 8 years ago

dyndna commented 8 years ago

First of all, impressive work and +1 for open-source code!

I tried to download a recomputed TCGA sample from a recent bioRxiv preprint using

s3cmd get --requester-pays \
  s3://cgl-rnaseq-recompute-fixed/tcga/014ee344-2844-4bc4-842e-e6d6e1e8fd9b.tar.gz

It returned a 403 error:

download: 's3://cgl-rnaseq-recompute-fixed/tcga/014ee344-2844-4bc4-842e-e6d6e1e8fd9b.tar.gz' -> './014ee344-2844-4bc4-842e-e6d6e1e8fd9b.tar.gz'  [1 of 1]
download: 's3://cgl-rnaseq-recompute-fixed/tcga/014ee344-2844-4bc4-842e-e6d6e1e8fd9b.tar.gz' -> './014ee344-2844-4bc4-842e-e6d6e1e8fd9b.tar.gz'  [1 of 1]
ERROR: S3 error: 403 (Forbidden)

Using s3cmd version 1.6.1 on OS X 10.11.4 x86_64, Darwin Kernel Version 15.4.0

Thanks, Samir

jvivian commented 8 years ago

Hi @dyndna — Thanks for checking out the paper!

I just tried the sample you're looking at with a few different credentials and I think the reason you're getting 403 forbidden is that s3cmd hasn't been configured. Type s3cmd --configure and supply your AWS credentials in order to download the data. If you don't have an AWS account, you can view / download the data from the Xena browser: https://genome-cancer.soe.ucsc.edu/proj/site/xena/datapages/?host=https://toil.xenahubs.net

Please let me know if this resolves your issue.
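
For anyone scripting the download rather than using s3cmd, here is a minimal sketch of the same requester-pays request with boto3. It is an illustration and not something used in this thread: boto3 is a swapped-in client, `split_s3_url` and `download_requester_pays` are hypothetical helper names, and the sketch assumes AWS credentials are already configured where boto3 looks for them (for example `~/.aws/credentials`).

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_requester_pays(url, dest):
    """Download one object, agreeing to pay the transfer cost as the requester."""
    # Imported lazily so the URL helper above works without boto3 installed.
    import boto3

    bucket, key = split_s3_url(url)
    s3 = boto3.client("s3")
    # RequestPayer="requester" is the boto3 counterpart of s3cmd's --requester-pays.
    s3.download_file(bucket, key, dest, ExtraArgs={"RequestPayer": "requester"})
```

Calling `download_requester_pays("s3://cgl-rnaseq-recompute-fixed/tcga/014ee344-2844-4bc4-842e-e6d6e1e8fd9b.tar.gz", "014ee344-2844-4bc4-842e-e6d6e1e8fd9b.tar.gz")` would attempt the same download as the s3cmd command in the original report. Requester-pays requests must be authenticated, so a missing or under-privileged key fails here too.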

dyndna commented 8 years ago

Not really! I have reconfigured s3cmd and tested it on my own S3 buckets to make sure I can do ls, get and put operations. My ~/.s3cfg is pasted below. Also, how can I get GTEx RSEM and/or Kallisto count matrices from http://xena.ucsc.edu/?

Thanks, Samir

[default]
access_key = XXX
access_token = 
add_encoding_exts = 
add_headers = 
bucket_location = US
ca_certs_file = 
cache_file = 
check_ssl_certificate = True
cloudfront_host = cloudfront.amazonaws.com
default_mime_type = binary/octet-stream
delay_updates = False
delete_after = False
delete_after_fetch = False
delete_removed = False
dry_run = False
enable_multipart = True
encoding = UTF-8
encrypt = False
expiry_date = 
expiry_days = 
expiry_prefix = 
follow_symlinks = False
force = False
get_continue = False
gpg_command = /usr/bin/gpg
gpg_decrypt = %(gpg_command)s -d --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
gpg_encrypt = %(gpg_command)s -c --cipher-algo AES256 --force-mdc --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
gpg_passphrase = XXX
guess_mime_type = True
host_base = s3.amazonaws.com
host_bucket = %(bucket)s.s3.amazonaws.com
human_readable_sizes = False
ignore_failed_copy = False
invalidate_default_index_on_cf = False
invalidate_default_index_root_on_cf = True
invalidate_on_cf = False
list_md5 = False
log_target_prefix = 
max_delete = -1
mime_type = 
multipart_chunk_size_mb = 15
preserve_attrs = True
progress_meter = True
proxy_host = 
proxy_port = 0
put_continue = False
recursive = False
recv_chunk = 4096
reduced_redundancy = False
restore_days = 1
secret_key = XXX
send_chunk = 4096
server_side_encryption = False
signature_v2 = False
simpledb_host = sdb.amazonaws.com
skip_existing = False
socket_timeout = 300
urlencoding_mode = normal
use_https = True
use_mime_magic = True
verbosity = WARNING
website_endpoint = http://%(bucket)s.s3-website-%(location)s.amazonaws.com/
website_error = 
website_index = index.html
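
A config file like the one above can be sanity-checked with a few lines of Python. This is only a sketch under stated assumptions: `missing_credentials` is a hypothetical helper, and it checks only that `access_key` and `secret_key` are present and non-empty in the `[default]` section. It cannot tell whether the key is valid or what it is permitted to do.

```python
import configparser


def missing_credentials(cfg_path, section="default"):
    """Return the credential keys that are absent or empty in an s3cmd config file."""
    # interpolation=None: values such as gpg_decrypt contain %(name)s placeholders
    # that s3cmd expands itself, so configparser must leave them alone.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(cfg_path):
        raise FileNotFoundError(cfg_path)
    if not parser.has_section(section):
        return ["access_key", "secret_key"]
    return [
        key
        for key in ("access_key", "secret_key")
        if not parser.get(section, key, fallback="").strip()
    ]
```

For example, `missing_credentials(os.path.expanduser("~/.s3cfg"))` (with `import os`) returns an empty list when both keys are filled in.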

jvivian commented 8 years ago

Hm, your s3cfg looks fine, so I'm unsure what the issue is. I just helped someone outside our group download the data from S3 and it worked for them. I tried s3cmd get --requester-pays s3://cgl-rnaseq-recompute-fixed/tcga/014ee344-2844-4bc4-842e-e6d6e1e8fd9b.tar.gz with credentials from the account that owns the S3 bucket and with credentials from a different account, and both started the download. I also just checked the bucket policy: it's set to open, with the requester-pays option enabled.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AddPerm",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::cgl-rnaseq-recompute-fixed/*"
        },
        {
            "Sid": "AddPerm",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::cgl-rnaseq-recompute-fixed"
        }
    ]
}
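
To read a policy like this programmatically, the sketch below lists every action and resource it allows for all principals. `public_grants` is a hypothetical helper, not part of any AWS tool, and it deliberately ignores `Deny` statements and `Condition` blocks. Note that a bucket policy covers only the bucket's side of the decision: an IAM user in another account also needs its own account's policy to allow the same action.

```python
import json


def public_grants(policy_text):
    """Return (action, resource) pairs that the policy allows for every principal."""
    policy = json.loads(policy_text)
    grants = []
    for stmt in policy.get("Statement", []):
        # Keep only Allow statements whose principal is the wildcard.
        if stmt.get("Effect") != "Allow":
            continue
        if stmt.get("Principal") not in ("*", {"AWS": "*"}):
            continue
        actions = stmt["Action"]
        resources = stmt["Resource"]
        # Action and Resource may each be a single string or a list of strings.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        grants.extend((a, r) for a in actions for r in resources)
    return grants
```

Passing the JSON text above to `public_grants` yields two pairs: `s3:GetObject` on the bucket's objects and `s3:ListBucket` on the bucket itself.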

Here's the GTEx Xena page: https://genome-cancer.soe.ucsc.edu/proj/site/xena/datapages/?cohort=GTEX

You can download the raw data from Xena or visualize it directly in the Xena browser.

If you'd like, you can provide me with limited credentials (in an email) from your account and I can try using those to troubleshoot.

dyndna commented 8 years ago

limited credentials - got it!

I was using a secondary, limited access key from my AWS IAM panel, which was authorized to access the billing part. The download works with the primary key, which has full access to billing.

Cheers, and I appreciate the reply over the weekend :-)

Samir
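
The thread does not show the policy attached to the limited key. Assuming that key simply lacked S3 read permission for this bucket, an identity policy along the following lines would be one way to grant it without using the primary key. The function name `read_only_policy` and the `Sid` values are made up for illustration.

```python
import json

BUCKET = "cgl-rnaseq-recompute-fixed"


def read_only_policy(bucket):
    """Build an IAM identity policy allowing listing and object download for one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                # Listing applies to the bucket ARN itself.
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Sid": "GetObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                # Downloads apply to the objects inside the bucket.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# Print the policy as JSON, ready to paste into the IAM console.
print(json.dumps(read_only_policy(BUCKET), indent=4))
```

The printed JSON would be attached to the limited IAM user in the downloader's own account. Because the bucket is requester-pays, that account is still billed for the transfer.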