basho / riak_cs

Riak CS is simple, available cloud storage built on Riak.
http://docs.basho.com/riakcs/latest/
Apache License 2.0

477 in multipart upload from aws-sdk-go [JIRA: RCS-363] #1314

Open cbuben opened 8 years ago

cbuben commented 8 years ago

Summary

Riak CS version 2.1.1.

This is a new variant of #490.

Multipart uploads to Riak CS from aws-sdk-go based clients fail with a 477 HTTP response. It appears that Riak CS cannot handle aws-sdk-go's chosen method of quoting etags in the CompleteMultipartUpload body.

#490 involves a client using aws-sdk-ruby, which uses &quot; to quote the etag values in the CompleteMultipartUpload body; this blew up the CompleteMultipartUpload processing.

In #490 a fix was made in 5cba5e6 to handle &quot; specifically.

This new issue involves a client using aws-sdk-go, which uses &#34; to quote the etag value and causes a 477 failure similar to the one originally seen in #490.

Example CompleteMultipartUpload POST from aws-sdk-go:

POST /some-bucket/foo?uploadId=z0D6DrTJSC2DjrF9Xe5W2w%3D%3D HTTP/1.1
Host: x.x.x.x:8080
User-Agent: aws-sdk-go/1.0.2 (go1.6; linux; amd64) S3Manager
Content-Length: 427
Authorization: AWS xxxxxxxxxxxxxxxxxxxx:xxxxxxxxxxxxxxxxxxxxxxxxxxxx
x-amz-date: Tue, 24 May 2016 19:28:25 UTC
Accept-Encoding: gzip

<CompleteMultipartUpload><Part><ETag>&#34;5f363e0e58a95f06cbe9bbc662c5dfb6&#34;</ETag><PartNumber>1</PartNumber></Part><Part><ETag>&#34;5f363e0e58a95f06cbe9bbc662c5dfb6&#34;</ETag><PartNumber>2</PartNumber></Part><Part><ETag>&#34;5f363e0e58a95f06cbe9bbc662c5dfb6&#34;</ETag><PartNumber>3</PartNumber></Part><Part><ETag>&#34;b6d81b360a5672d80c27430f39153e2c&#34;</ETag><PartNumber>4</PartNumber></Part></CompleteMultipartUpload>
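For context on where the &#34; form probably comes from: Go's standard encoding/xml package escapes a double quote in character data as &#34; rather than &quot;. The sketch below shows only that standard-library behavior. It assumes aws-sdk-go serializes the request body through encoding/xml (I have not verified that against the SDK source), and escapeETag is a hypothetical helper name, not an SDK function.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// escapeETag (hypothetical helper) wraps an ETag in double quotes and
// returns the character data Go's encoding/xml would write for it.
func escapeETag(etag string) (string, error) {
	var sb strings.Builder
	// xml.EscapeText writes the properly escaped XML equivalent of its input.
	if err := xml.EscapeText(&sb, []byte(`"`+etag+`"`)); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	s, err := escapeETag("5f363e0e58a95f06cbe9bbc662c5dfb6")
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // prints: &#34;5f363e0e58a95f06cbe9bbc662c5dfb6&#34;
}
```

Both &quot; and &#34; are valid XML spellings of the same character, so a server that special-cases only one of them will miss the other.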

Reproduction

My context for this problem: CloudFoundry BOSH director using https://github.com/pivotal-golang/s3cli to upload large files to Riak CS.

Build https://github.com/pivotal-golang/s3cli and use it to upload a file larger than 5 MB to Riak CS.

$ cat s3cli-riak 
{
  "signature_version": "2",
  "bucket_name": "some-bucket",
  "use_ssl": false,
  "host": "x.x.x.x",
  "port": 8080,
  "ssl_verify_peer": true,
  "credentials_source": "static",
  "access_key_id": "xxxxxxxx",
  "secret_access_key": "xxxxxxxx"
}

$ du -h foo
16M foo

$ s3cli -c s3cli-riak put foo foo
2016/05/25 17:03:12 performing operation put: 477InternalServerError: 477 Internal Server Error
    upload id: bG-UxormTqOM2VSFBZUESg==

cbuben commented 8 years ago

FWIW - I'm not even suggesting this is a proper fix (hence no PR), but the following hack does work around the problem:

diff --git a/src/riak_cs_wm_object_upload_part.erl b/src/riak_cs_wm_object_upload_part.erl
index 174733c..332c4d9 100644
--- a/src/riak_cs_wm_object_upload_part.erl
+++ b/src/riak_cs_wm_object_upload_part.erl
@@ -195,7 +195,7 @@ content_types_accepted(RD, Ctx) ->

 parse_body(Body0) ->
     try
-        Body = re:replace(Body0, "&quot;", "", [global, {return, list}]),
+        Body = re:replace(Body0, "&quot;|&#34;", "", [global, {return, list}]),
         {ok, ParsedData} = riak_cs_xml:scan(Body),
         #xmlElement{name='CompleteMultipartUpload'} = ParsedData,
         Nums = [list_to_integer(T#xmlText.value) ||

I am NO erlang/riak/riak-cs/xmerl wizard by any means, but some random observations:

The approach of stripping these quotes before XML parsing seems really strange and brittle (this issue is itself an example of the brittleness). I'm not sure whether the failure happens because 1) the XML parser can't deal with the escaped quotes, or 2) XML parsing is fine but the mere presence of the quotes in the etag values breaks later processing. The preprocessing masks both 1 and 2, so it doesn't tell me which one is actually causing the failure, and I'm not 100% clear on its intent.

If the issue is 1, then the question is "why isn't xmerl handling this valid XML?" If the issue is 2, postprocessing the etag values after XML parsing seems more appropriate.
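To make option 2 concrete, here is a minimal sketch of "parse first, then clean up the etag values". It is written in Go (to match the client side of this report), not Erlang, and it is not Riak CS code: the struct and function names are hypothetical, and it only illustrates the general idea. Because a conforming XML parser decodes &quot; and &#34; to the same character, stripping the quotes after parsing handles both encodings without enumerating them.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// completeMultipartUpload mirrors the request body shown earlier in this issue.
type completeMultipartUpload struct {
	XMLName xml.Name `xml:"CompleteMultipartUpload"`
	Parts   []part   `xml:"Part"`
}

type part struct {
	ETag       string `xml:"ETag"`
	PartNumber int    `xml:"PartNumber"`
}

// parseParts (hypothetical) decodes the XML body first and only then
// trims double quotes from both ends of each ETag value.
func parseParts(body string) ([]part, error) {
	var req completeMultipartUpload
	if err := xml.Unmarshal([]byte(body), &req); err != nil {
		return nil, err
	}
	for i := range req.Parts {
		req.Parts[i].ETag = strings.Trim(req.Parts[i].ETag, `"`)
	}
	return req.Parts, nil
}

func main() {
	// One part quoted the aws-sdk-ruby way, one the aws-sdk-go way.
	body := `<CompleteMultipartUpload>` +
		`<Part><ETag>&quot;5f363e0e58a95f06cbe9bbc662c5dfb6&quot;</ETag><PartNumber>1</PartNumber></Part>` +
		`<Part><ETag>&#34;b6d81b360a5672d80c27430f39153e2c&#34;</ETag><PartNumber>2</PartNumber></Part>` +
		`</CompleteMultipartUpload>`

	parts, err := parseParts(body)
	if err != nil {
		panic(err)
	}
	for _, p := range parts {
		fmt.Printf("%d %s\n", p.PartNumber, p.ETag)
	}
}
```

Whether the same shape is practical with xmerl depends on which of 1 or 2 is the real cause, which I still don't know.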

Thanks!