aws / aws-sdk-go-v2

AWS SDK for the Go programming language.
https://aws.github.io/aws-sdk-go-v2/docs/
Apache License 2.0

S3 client returns an incorrect file when using GetObject #1403

Closed: lorenzo-desantis-imagicle closed this issue 3 years ago

lorenzo-desantis-imagicle commented 3 years ago


Describe the bug: When reading bytes from the io.ReadCloser returned in the Body field of the GetObjectOutput from a GetObject call, the bytes differ from the file actually stored on S3. Performing the same call for the same file with other clients (AWS CLI, boto3) returns the correct file. In particular, the first 27 bytes of the original file are missing.

Version of AWS SDK for Go? v1.11.1

Version of Go (go version)? 1.15

To Reproduce (observed behavior)

  1. Upload a file to an S3 bucket.
  2. Make a GetObject call for that file's key on that bucket.
  3. Read the bytes from the io.ReadCloser returned in the Body field of the GetObjectOutput.
  4. Compare those bytes with the original file's bytes.

Expected behavior: The bytes should be identical.

Additional context: The file is an MP3 file with ID3v2.3 tags.
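A minimal sketch of step 4 of the reproduction, assuming the original file and the downloaded body have both been read fully into memory (for example with os.ReadFile and io.ReadAll). The names `compareDownload` and `diffReport` are hypothetical, not part of the SDK. Besides the plain byte-by-byte check, it reports whether the download equals the original with some leading bytes dropped, which is the 27-byte symptom described above.

```go
package main

import "bytes"

// diffReport summarizes how a downloaded object differs from the original file.
type diffReport struct {
	Identical bool
	// FirstMismatch is the index of the first differing byte. If one input is
	// a prefix of the other, it is the length of the shorter one; -1 if identical.
	FirstMismatch int
	// MissingLeading is how many leading bytes are missing from the download,
	// or -1 if the download is not a shorter suffix of the original.
	MissingLeading int
}

// compareDownload compares original and downloaded byte by byte and checks
// whether the download is the original with a prefix cut off.
func compareDownload(original, downloaded []byte) diffReport {
	r := diffReport{FirstMismatch: -1, MissingLeading: -1}
	if bytes.Equal(original, downloaded) {
		r.Identical = true
		r.MissingLeading = 0
		return r
	}
	n := len(original)
	if len(downloaded) < n {
		n = len(downloaded)
	}
	r.FirstMismatch = n // default when the common prefix matches but lengths differ
	for i := 0; i < n; i++ {
		if original[i] != downloaded[i] {
			r.FirstMismatch = i
			break
		}
	}
	if len(downloaded) < len(original) && bytes.HasSuffix(original, downloaded) {
		r.MissingLeading = len(original) - len(downloaded)
	}
	return r
}
```

For the file in this report, a result with MissingLeading equal to 27 would confirm that the download is exactly the original minus its first 27 bytes, rather than a corruption somewhere in the middle.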

skmcgrail commented 3 years ago

Hey @lorenzo-desantis-imagicle, can you give a code example of how you are uploading and downloading the file? In addition, can you explain how you know the files are different: are you performing a byte-by-byte comparison or comparing a hash such as SHA-256?

Additionally, are you uploading the object to the bucket using the Go SDK (S3 Client or S3 Transfer Manager), or are you using another SDK or CLI?

I have not been able to reproduce the issue you are seeing. I attempted to reproduce it with the following setup:

go.mod

```
module temp

go 1.17

require (
    github.com/aws/aws-sdk-go-v2 v1.9.0
    github.com/aws/aws-sdk-go-v2/config v1.8.0
    github.com/aws/aws-sdk-go-v2/service/s3 v1.15.0
)

require (
    github.com/aws/aws-sdk-go-v2/credentials v1.4.0 // indirect
    github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.5.0 // indirect
    github.com/aws/aws-sdk-go-v2/internal/ini v1.2.2 // indirect
    github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.3.0 // indirect
    github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.3.0 // indirect
    github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.7.0 // indirect
    github.com/aws/aws-sdk-go-v2/service/sso v1.4.0 // indirect
    github.com/aws/aws-sdk-go-v2/service/sts v1.7.0 // indirect
    github.com/aws/smithy-go v1.8.0 // indirect
)
```

lorenzo-desantis-imagicle commented 3 years ago

I used a byte-by-byte comparison. I uploaded the file with a PUT to a presigned URL and downloaded it using the Go SDK. (Downloading the file from the S3 console after the PUT yields a file that is byte-for-byte identical to the original.)

go.mod

```
go 1.15

require (
    github.com/aws/aws-lambda-go v1.23.0
    github.com/aws/aws-sdk-go-v2 v1.7.1
    github.com/aws/aws-sdk-go-v2/config v1.2.0
    github.com/aws/aws-sdk-go-v2/internal/ini v1.0.0 // indirect
    github.com/aws/aws-sdk-go-v2/service/s3 v1.11.1
)
```

code

```go
objectInput := s3.GetObjectInput{
	Bucket: aws.String(bucketName),
	Key:    aws.String(key),
}
s3Obj, _ := client.GetObject(s.Context, &objectInput)
bytesRead, _ := io.ReadAll(s3Obj.Body)
```

(Error handling has been omitted here; all errors were nil.)

skmcgrail commented 3 years ago

I am still not able to reproduce this behavior, including with the module versions from the go.mod you provided. Can you share the exact size in bytes of the file you are encountering this issue with? Also, in which environment are you seeing this behavior: Windows, Linux, macOS, AWS Lambda, etc.? Finally, are you using a standard bucket, or are you accessing it via an S3 Access Point, a VPC endpoint, etc.?

lorenzo-desantis-imagicle commented 3 years ago
  1. The file is an .mp3 file with ID3v2.3-encoded tags, 1,222,547 bytes in total. (Because of this behavior, some of the tags cannot be read, since the very beginning of the file contains the tag version identifier.)
  2. The code runs inside a Lambda function in the AWS Lambda environment.
  3. The bucket is accessed directly from the Lambda function (which dynamically assumes read permissions on it through STS), without any Access Point or VPC endpoint in between.
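Since the missing bytes include the start of the file, where the ID3v2 header lives, one quick check on a download is to parse that header. The sketch follows the ID3v2.3 header layout: the ASCII marker "ID3", two version bytes (3 and 0 for v2.3.0), one flags byte, and a 4-byte "syncsafe" size in which only the low 7 bits of each byte are used. `parseID3v2Header` is a hypothetical helper; with 27 leading bytes missing, it would report that no header is present.

```go
package main

import (
	"errors"
	"fmt"
)

// id3v2Header holds the fields of the 10-byte ID3v2 header.
type id3v2Header struct {
	Major, Revision byte // e.g. 3 and 0 for ID3v2.3.0
	Flags           byte
	TagSize         int // size of the tag, excluding the 10-byte header
}

// parseID3v2Header checks that b starts with an ID3v2 header and decodes it.
func parseID3v2Header(b []byte) (id3v2Header, error) {
	if len(b) < 10 || string(b[:3]) != "ID3" {
		return id3v2Header{}, errors.New("no ID3v2 header at start of data")
	}
	size := 0
	for _, c := range b[6:10] {
		// Each syncsafe byte must have its most significant bit cleared.
		if c&0x80 != 0 {
			return id3v2Header{}, fmt.Errorf("invalid syncsafe size byte %#x", c)
		}
		size = size<<7 | int(c)
	}
	return id3v2Header{Major: b[3], Revision: b[4], Flags: b[5], TagSize: size}, nil
}
```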
skmcgrail commented 3 years ago
  1. Are you able to reproduce this behavior with other files that are not MP3?
  2. Are the files you are uploading also gzip-compressed? Note that the V2 SDK does not decompress gzip objects by default. A screenshot of the object's S3 metadata from the console would be helpful here.
  3. Would you be able to test if this behavior is happening in the V1 SDK as well?