aws / aws-sdk-go

AWS SDK for the Go programming language.
http://aws.amazon.com/sdk-for-go/
Apache License 2.0
8.62k stars 2.07k forks source link

s3 upload of jpeg results in empty file of 0 bytes ( not an obvious user issue as works on prior commit hash ) #1962

Closed peterabarry closed 5 years ago

peterabarry commented 6 years ago

Please fill out the sections below to help us address your issue.

Version of AWS SDK for Go?

Latest go get github.com/aws/aws-sdk-go/aws at time of posting.

WORKS IF I FIX VERSION TO git checkout 15a0ee49045ffdc43b742968e4046f9e788263c5 I have not had chance to binary search the versions since but pinned that version manually for now.

Version of Go (go version)?

go version go1.10.2 linux/amd64

What issue did you see?

0 bytes on s3 bucket for jpeg after upload.

Steps to reproduce

sess := session.New(&aws.Config{})
    uploader := s3manager.NewUploader(sess)
    _, err = uploader.Upload(&s3manager.UploadInput{
        ACL:                  aws.String("private"),
        ServerSideEncryption: aws.String("AES256"),
        Bucket:               aws.String(settings["s3Bucket"].(string)),
        Key:                  aws.String(vars["DocumentId"]),
        ContentType:          aws.String(mimetype),
        Body:                 file,
    })

If you have an runnable example, please include it.

jasdel commented 6 years ago

Thanks for reaching out to us @peterabarry could you clarify the Go type for the file variable? is this an os.File or some other type? When the file is uploaded is the ContentType of the file set correctly?

peterabarry commented 6 years ago
file, header, err := r.FormFile("file")
    .......

    validMimes := map[string]bool{
        "image/png":                                                               true,
        "image/jpeg":                                                              true,
        "image/pjpeg":                                                             true,
        "application/pdf":                                                         true,
        "application/msword":                                                      true,
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document": true,
    }

    imageMimes := map[string]bool{
        "image/png":   true,
        "image/jpeg":  true,
        "image/pjpeg": true,
    }

    defer file.Close()

    fileMime := header.Header.Get("Content-Type")
    documentIsValid := false

    if validMimes[fileMime] {
        documentIsValid = true

        //Check that the image meets minimum resolution requirement
        if imageMimes[fileMime] {

            resolution, err := Helpers.GetImageResolution(file, fileMime)
........................
    if documentIsValid {

        settings := Helpers.GetSettings()

        sess := session.New(&aws.Config{})
        uploader := s3manager.NewUploader(sess)
        _, err = uploader.Upload(&s3manager.UploadInput{
            ACL:                  aws.String("private"),
            ServerSideEncryption: aws.String("AES256"),
            Bucket:               aws.String(settings["s3Bucket"].(string)),
            Key:                  aws.String(vars["DocumentId"]),
            ContentType:          aws.String(mimetype),
            Body:                 file,
        })

Yes MIME type likely to be correct and works on that old commit hash.

peterabarry commented 6 years ago

Also tested it with this borrowed code prior to posting:

func Sample(w http.ResponseWriter, r *http.Request) (string, error) {
    if err := r.ParseMultipartForm(5 * MB); err != nil {
        return "", err
    }

    // Limit upload size
    r.Body = http.MaxBytesReader(w, r.Body, 100*MB) // 5 Mb

    //
    file, multipartFileHeader, err := r.FormFile("file")

    // Create a buffer to store the header of the file in
    fileHeader := make([]byte, 512)

    // Copy the headers into the FileHeader buffer
    if _, err := file.Read(fileHeader); err != nil {
        return "", err
    }

    // set position back to start.
    if _, err := file.Seek(0, 0); err != nil {
        return "", err
    }
    mimetype := http.DetectContentType(fileHeader)
    log.Printf("Name: %#v\n", multipartFileHeader.Filename)
    log.Printf("Size: %#v\n", file.(Sizer).Size())
    log.Printf("MIME: %#v\n", mimetype)
    return mimetype, err
}
jasdel commented 6 years ago

Thanks for the update @peterabarry. I'm having difficulty reproducing this issue. I'm using the following code to upload an object to S3.

package main

import (
    "bytes"
    "flag"
    "fmt"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
    "github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
    sess := session.Must(session.NewSession(&aws.Config{
        LogLevel: aws.LogLevel(aws.LogDebug),
    }))
    if sess.Config.Region == nil {
        sess.Config.Region = new(string)
    }

    var bucket, key string
    flag.StringVar(sess.Config.Region, "region", *sess.Config.Region, "The `region` to get the object from.")
    flag.StringVar(&bucket, "bucket", "", "The name of the `bucket` to get the object from.")
    flag.StringVar(&key, "key", "", "The S3 object `key` name to get.")
    flag.Parse()

    uploader := s3manager.NewUploader(sess)

    body := bytes.NewBuffer(make([]byte, 0, 1024*1024*20))
    bodySize := body.Cap()
    for i := 0; i < bodySize/len("hello"); i++ {
        body.WriteString("hello")
    }

    resp, err := uploader.Upload(&s3manager.UploadInput{
        Bucket:               &bucket,
        Key:                  &key,
        ContentType:          aws.String("image/jpeg"),
        ServerSideEncryption: aws.String("AES256"),
        Body:                 body,
    })
    if err != nil {
        panic(err)
    }

    fmt.Println("Upload Response", resp)

    svc := s3.New(sess)
    objResp, err := svc.HeadObject(&s3.HeadObjectInput{
        Bucket: &bucket,
        Key:    &key,
    })
    if err != nil {
        panic(err)
    }

    if v := *objResp.ContentType; v != "image/jpeg" {
        panic("expect content type " + v)
    }
    if v := *objResp.ContentLength; v != int64(bodySize) {
        panic(fmt.Sprintf("expect content length %d", v))
    }
}

I can verify that the parts are being uploaded with the logged statements, and verifying the object's content is S3.

2018/06/07 14:57:04 DEBUG: Request s3/UploadPart Details:
---[ REQUEST POST-SIGN ]-----------------------------
PUT /myKey?partNumber=3&uploadId=EYrMnDXu4K1xI.jNzMl1KbM.12KmoKeLX1s7w8ICBxbs9S7rRYBDxSviWDEShcmZCMbruXkFzCnz2tt331GpVFoy4KpFY1w2T_y2qj0VmVqd_WzjZwwSdMUvWc8uOmob HTTP/1.1
Host: myBucket.s3.us-west-2.amazonaws.com
User-Agent: aws-sdk-go/1.14.1 (go1.10.2; darwin; amd64) S3Manager
Content-Length: 5242880
Authorization: 
Content-Md5: IrDWri0GeI7atmW0vCwROQ==
Expect: 100-Continue
X-Amz-Content-Sha256: 50c5711f72196fb29755b590d503d7772c8e7953b37f30ff5d7baf7445091490
X-Amz-Date: 20180607T215704Z
Accept-Encoding: gzip
peterabarry commented 6 years ago

I have moved on from the issue as the roll-back fixed our immediate problem. I do apologise but I am just too busy to try to create a clean isolated repro given the fact I had a workaround. I do feel bad but genuinely cannot spare the time at the moment. Probably we should keep the ticket open for a while and assume that if it is genuinely an issue you will see repeat occurrences? If no on one reports in a few months then just close it?

Reviewing your code it looks similar except you do not reuse a file from a form, and you do not specify the ACL, and also do not defer close the file not that I see any reason for these influencing the behaviour ( my go experience is only basic - few months).

jasdel commented 6 years ago

Thanks for the update @peterabarry. We can leave open for a little bit sure.

The ACL shouldn't impact the body as that is a header value. And defer close won't have any impact because that is just a clean up after the PutObject API is finished.

What I think may actually be the fault here is reusing the file. In general it is invalid to reuse and io.Reader value between operation. Especially in this case the first use of the file will upload the content, but all other probably are uploading a zero object to S3. This would be my first bet for why you are seeing zero byte object.

I'd take a closer look at how that file value is being used. Reusing the same file between requests without special logic to seek to the beginning in of the file before the next usage (file.Seek(0,0)). If this is the case there is a good chance this could break in the future due to some other usage.

kks32 commented 6 years ago

Hi @jasdel I had a similar issue, when the uploader was uploading files of 0Bytes to S3 using the latest version of the SDK. However, when I used this 15a0ee4 version of SDK it was able to upload the files properly.

I noticed that there was an unused file.Read(buffer) line in the code, which was never removed, which was causing this 0Byte upload issue using bc3f534 SDK version. I can confirm using file.Seek(0,0), to seek to the beginning of the file fixed the uploader issue and upload files of the right size. file.Seek(0,0) was not needed in 15a0ee4. I just wanted to confirm that the issue was because of incorrect reuse of a file, which works in the old version 15a0ee4 (this should have been a bug in that version, which is now fixed). Hope this helps!

jasdel commented 6 years ago

Thanks for the update @kks32. In your use case were you intentionally uploading files with zero bytes, or was that an unexpected error that was occurring? If you have a code example that would be very helpful to help us investigate this issue further. Is the content that your application uploaded, offset somewhere within a source io.Reader?

The SDK's Upload manager should of always started from the io.Reader's current offset. If the SDK was seeking back to the head of a file this would of been a bug, and unexpected behavior. If the io.Reader is also an io.ReadSeeker the SDK will attempt to seek within its content to determine the content's size and compute the AWS request signature. Seeking within the io.ReaderSeeker's content should always go back to the original offset the io.ReadSeeker was at when provided to the SDK.

I think this was a bug fixed in v1.12.77 e926b11. Regretfully it looks like this change wasn't added to the SDK's change log. I've gone back and fixed this.

kks32 commented 6 years ago

Hi @jasdel Thanks. It was an unexpected error that was occurring. Here is a code snippet. Some lines have been removed for brevity.

    file, err := os.Open(filename)
    buffer := make([]byte, size)
    fileType := http.DetectContentType(buffer)
        file.Read(buffer) // This line was the problem. 
        // Upload `file` to S3 using S3 manager

In v1.12.77 it uploaded the file of the right size, in the latest version it resulted in 0 Bytes. I don't need file.Read(buffer) , so I have since removed that line. However to test if it is indeed the seek that was causing the issue of 0 byes on the latest version, I added file.Seek(0,0) after file.Read(buffer) and then it uploaded the file correctly with the right size and contents. Thanks for fixing the change log.

jasdel commented 5 years ago

It looks like seeking the buffer after it was read prior to uploading with the SDK resolved the issue you were experiencing. Let us know if you run into future issues with the SDK or have questions.

tonsV2 commented 1 year ago

I'm facing this problem using Go 1.20 and aws-sdk-go v1.44.266