aws-amplify / aws-sdk-ios

AWS SDK for iOS. For more information, see our web site:
https://aws-amplify.github.io/docs
Other
1.67k stars 877 forks source link

S3 Transfer Utility multi-part upload hangs after two or more suspend / resumes #924

Closed jeffjvick closed 6 years ago

jeffjvick commented 6 years ago

To help us solve your problem better, please answer the following list of questions.

I modified the S3BackgroundTrasnferSampleSwift to do a multipart upload (please add this to the sample, BTW) and changed it to upload videos so the file size is bigger to actually do a multipart upload. I also added in the Readability pod to suspend and resume the upload task when the wifi drops out.

I'm testing loss of wifi by swiping up and turning on/off airplane mode. Cellular data is turned off to take that out of the picture.

If I turn on / off wifi once it seems to resume the upload fine but if I do it a second time it sometimes doesn't properly resume. I don't know if 2 times is the magic number as it may be related to the file size, number of parts, part you are currently in, etc. Basically if you just try suspend / resuming a few times it's going to hang. The video I'm testing with is 208 MB.

Here is the multipart upload portion:

    func uploadImage(with data: Data) {
        let expression = AWSS3TransferUtilityMultiPartUploadExpression()
        expression.progressBlock = progressBlock
        let transferUtility = AWSS3TransferUtility.default()
        transferUtility.uploadUsingMultiPart(
            data: data,
            bucket: S3BucketName,
            key: S3UploadKeyName,
            contentType: "video/mp4",
            expression: expression,
            completionHandler: completionHandler).continueWith { (task) -> AnyObject! in
                if let error = task.error {
                    print("Error: \(error.localizedDescription)")
                    DispatchQueue.main.async {
                        self.statusLabel.text = "Failed"
                    }
                }

                if let uploadTask = task.result {
                    self.refUploadTask = uploadTask
                    DispatchQueue.main.async {
                        self.statusLabel.text = "Generating Upload File"
                        print("Upload Starting!")
                    }
                }

                return nil;
        }
    }

And here is the part that suspends / resumes the refUploadTask based on wifi status:

    @objc func reachabilityChanged(note: Notification) {
        let reachability = note.object as! Reachability

        switch reachability.connection {
        case .wifi:
            refUploadTask?.resume()
            print("wifi")
            break
        case .none:
            refUploadTask?.suspend()
            print("none")
            break
        }

    }

I've tried this without pausing and resuming myself on network loss and it actually seems to work better but I was under the impression Transfer Utility did not do this on its own due to what @kvasukib said in #891. Has this been changed since 4/11?

scb01 commented 6 years ago

hello @jeffjvick

The transfer utility uses NSURLSession under the hood and that takes care of handling intermittent network connectivity issues. If the network drop is long enough ( typically 20 seconds +), the transfer is timed out on the server side and when NSURLSession tries to resume the transfer, it fails.

We added logic in the TransferUtility to detect and retry when this occurs. I am working on fine-tuning the retry mechanism to facilitate the exact approach that you have taken i.e., in the app, detect network failure using reachability and proactively pause/resume the transfer. I will have an update soon on this thread.

jeffjvick commented 6 years ago

Thanks @cbommas for looking into this. Yes, before I knew Transfer Utility was doing this I was probably messing things up by also doing it myself. After letting it handle pause/resume on it's own it is significantly more stable.

Please let me know when you have an update on any improvements you have made to this logic.

scb01 commented 6 years ago

hello @jeffjvick

Confirming that I am still working through this logic. One of the areas I am looking to adjust is to add an exponential back-off strategy for the retries. This should give some more control over the retries.

One thing you can try, in the meantime to make it more stable is to adjust the retryLimit. It defaults to 0, which is basically no retries. You can set it to best suit your needs. I'd reccomend a value of 5 if you are in a particularly choppy network. See the TransferUtilityOptions section of the https://docs.aws.amazon.com/aws-mobile/latest/developerguide/how-to-transfer-files-with-transfer-utility.html page for details on how to do this.

scb01 commented 6 years ago

hello @jeffjvick Just a minor correction, the retrylimit defaults to 0, not 5 as I'd originally stated. I've edited my post above to reflect that, but just wanted you to know in case you were interested in adjusting that parameter.

jeffjvick commented 6 years ago

thanks @cbommas I will try setting the retry limit to 5 to see if it is more resilient to the network tests I'm am conducting.

mutablealligator commented 6 years ago

@jeffjvick Do you have any update after setting the retry limit?

jeffjvick commented 6 years ago

@kvasukib Hi, after letting TU handle pause/resume my resume issues basically went away. I tried again with @cbommas suggestion of changing to 5 retries and it was similar behavior.

This is just using a modified version of the example code but for now it seems generally stable with manual on/off of the network so I would say you can close this issue. I can post a new issue if I have new problems. Thanks.

scb01 commented 6 years ago

Closing this issue.