box / box-java-sdk

The Box SDK for Java.
http://opensource.box.com/box-java-sdk/
Apache License 2.0
156 stars 186 forks source link

Uploading 50 simultaneous files problem #163

Closed luyseyal closed 8 years ago

luyseyal commented 8 years ago

When spawning 50 processes that use the new (and awesome!) AppAuth/AppUser code, I encounter untrapped 429 and 503 exceptions which BoxMultipartRequest.send() appears to try to trap but fails to.

This is in a Groovy command line program that is executed 50 times. The "WHY AM I HERE" shows up about 30 times. I am doing setMaxRequestAttempts(1000).

for (BoxItem.Info itemInfo : folder) {
   if (itemInfo instanceof BoxFile.Info && itemInfo.getName() == fileName) {
     itemInfo.getResource().canUploadVersion(fileName, fileSize, folder.getID())

     int i = 0
     while (i < 10) {
     try {
        itemInfo.getResource().uploadVersion(fileStream)
        return
     } catch (BoxAPIException e) {
        if (e.responseCode == 429 || e.responseCode >= 500) {
          println 'WHY AM I HERE??? Skipping this error:' + boxErrorMessage(e)
          sleep 1000
          i++
          continue
        }
        throw e
     }
...
iDeepesh commented 8 years ago

429 (rate limiting) looks explainable because you are making large number of requests.

https://box-content.readme.io/reference#rate-limiting

gcurtis commented 8 years ago

The SDK will do exponential back off, but it's limited to 3 retries by default. If you're making a lot of requests really quickly, then things may still get rate limited. You can increase the maximum number of attempts with BoxAPIConnection.setMaxRequestAttempts(int) to see if that helps.

Edit: never mind! Just saw that you are setting the max requests. We'll look into this a bit more.

I'm not sure about the 503 errors though. That may be an issue on the API side.

luyseyal commented 8 years ago

I am using setMaxRequestAttempts(1000) and I do get the handy log warnings (yay!). But, if you look at BoxAPIRequest.send() it appears to try to trap 429 and 503 errors. Only BoxMultipartRequest seems to have the problem. Not getting an error with BoxJSONRequest-type calls.

WARNING: Backing off for 2 seconds before retrying 999 more times.
Oct 30, 2015 1:31:58 PM com.box.sdk.BackoffCounter waitBackoff
WARNING: Backing off for 2 seconds before retrying 999 more times.
Oct 30, 2015 1:31:58 PM com.box.sdk.BackoffCounter waitBackoff
WARNING: Backing off for 2 seconds before retrying 999 more times.
WHY AM I HERE??? Skipping this error:com.box.sdk.BoxAPIException: The API returned an error code: 429
{
    "type": "error",
    "status": 429,
    "code": "rate_limit_exceeded",
    "help_url": "http://developers.box.com/docs/#errors",
    "message": "Request rate limit exceeded, please try again later",
    "request_id": "16782012285633b79d7c540"
}
gcurtis commented 8 years ago

Hmm, that's odd. Do you have a stack trace of the exception?

luyseyal commented 8 years ago
com.box.sdk.BoxAPIException: The API returned an error code: 429
{
    "type": "error",
    "status": 429,
    "code": "rate_limit_exceeded",
    "help_url": "http://developers.box.com/docs/#errors",
    "message": "Request rate limit exceeded, please try again later",
    "request_id": "8026280455633c651c6a44"
}
com.box.sdk.BoxAPIException: The API returned an error code: 429
    at com.box.sdk.BoxAPIResponse.<init>(BoxAPIResponse.java:70)
    at com.box.sdk.BoxJSONResponse.<init>(BoxJSONResponse.java:30)
    at com.box.sdk.BoxAPIRequest.trySend(BoxAPIRequest.java:414)
    at com.box.sdk.BoxAPIRequest.send(BoxAPIRequest.java:200)
    at com.box.sdk.BoxAPIRequest.send(BoxAPIRequest.java:175)
    at com.box.sdk.BoxFile.uploadVersion(BoxFile.java:390)
    at com.box.sdk.BoxFile.uploadVersion(BoxFile.java:363)
    at com.box.sdk.BoxFile.uploadVersion(BoxFile.java:353)
    at com.box.sdk.BoxFile$uploadVersion$0.call(Unknown Source)
    at BoxHelper.uploadFileToFolder(BoxHelper.groovy:266)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:384)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1019)
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:69)
    at BoxHelper$_uploadFiles_closure2.doCall(BoxHelper.groovy:209)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1019)
    at groovy.lang.Closure.call(Closure.java:426)
    at org.codehaus.groovy.runtime.DefaultGroovyMethods.callClosureForMapEntry(DefaultGroovyMethods.java:5226)
    at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2107)
    at org.codehaus.groovy.runtime.dgm$163.invoke(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:274)
    at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:56)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
    at BoxHelper.uploadFiles(BoxHelper.groovy:205)
    at BoxHelper$uploadFiles$5.call(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:141)
    at PrintToBox.main(PrintToBox.groovy:112)
gcurtis commented 8 years ago

How are you reading the files? Is fileStream a FileInputStream?

The exception will immediately get thrown and not retry if BoxMultipartRequest cannot reset the stream. The exception might just be misleading.

luyseyal commented 8 years ago

Yes, it is. Are you worried the request is reading the whole file, autoclosing the stream, and then complaining later that the file is bad?

gcurtis commented 8 years ago

The problem is that FileInputStream doesn't support the reset() method. The SDK isn't able to automatically retry the request because it can't re-read the InputStream. Therefore, it immediately fails and rethrows the 429 exception.

The exception should definitely be clearer because that error message was pretty unhelpful. If you upload using an InputStream that supports reset() then things should work. Otherwise you'll need to handle the 429 manually and recreate the FileInputStream for each try.

luyseyal commented 8 years ago

After you said that, I went to my Box account and checked and some of the erroneous versions are null files. Totally makes sense. Thanks for the help!

luyseyal commented 8 years ago

Crud. Well, I switched it to BufferedInputStream(new FileInputStream) and I'm still getting the 429 errors. My guess is that since these are tiny files, it is reading the entire file into the buffer, autoclosing the FileInputStream, the SDK retries to send, and lastly the SDK is uploading a null file and returning the 429 exception.

Have any magical InputStream-types that don't autoclose? Really rather not write a wrapper for RandomAccessFile.

gcurtis commented 8 years ago

Unfortunately, I don't know of a FileInputStream in the standard library that allows for seeking. There are some Stackoverflow answers that suggest creating a thin wrapper around the stream: http://stackoverflow.com/a/18665678/108

luyseyal commented 8 years ago

I've also tried RandomAccessFile -> getChannel() -> Channel.newInputStream -> BufferedInputStream. But I get the exact same output.

I could set maxRequestAttempts = 0, but how would that work if I had too many connections during a large file upload? Would it abort or just upload a partial file? How could I continue the upload properly?

I think these cases are equivalent: 1) a small file that fits in the buffer and 2) the very last buffer of a large file.

It's frustrating to think that a 2GB upload might fail because the very last buffer of upload cannot be repeated because the InputStream autocloses, defeating mark()/reset().

Suggestion for Box Java SDK: If fileSize is provided to uploadVersion()/uploadFile(), track the byte count successfully uploaded and make a copy of the final buffer so that it will resend it, provided maxRequestAttempts is still happy.

This will prevent null versions of small files from being uploaded and prevent large files from being corrupted by a failed final byte... I.e., just a workaround for autoclose (sigh).

Thoughts? Any brilliant patterns I'm missing here? Thank you for your help!

luyseyal commented 8 years ago

@gcurtis I am willing to attempt to add this feature to the SDK if you are willing to greenlight it. Since I'm a newbie, there would be some handholding involved.

gcurtis commented 8 years ago

Thanks for the offer. We can definitely help you out if you're looking to contribute a change.

The problem with this solution is that if the request fails, the entire file needs to be resent because the API doesn't support partial/ranged uploads. That's why a BufferedInputStream doesn't really work. I think there are two solutions:

  1. Catch the 429 exception and manually call uploadFile() again with a new FileInputStream.
  2. Create a wrapper around FileInputStream that supports reset() so that the SDK can handle the 429 automatically.

I think that option 2 is the ideal route to take, however there doesn't seem to be anything in the Java standard library that offers this. I wouldn't be opposed to adding a helper class to the SDK that provides mark/reset behavior for file streams. Does that sound like something that would help?

luyseyal commented 8 years ago

FWIW, I just tried the MarkableFileInputStream class from that StackOverflow and it worked! Because it's ambiguously licensed and this is an MIT-licensed project, I grabbed this one and groovified it, instead.

To make Box Java SDK more Groovy-friendly, I highly recommend incorporating a similar class since your average devops or integration architect isn't going to want to mess with fixing FileInputStream.

Thanks for all your help and encouragement! As I stated before, @bgoldman 's AppAuth stuff is soooooo much better for the command line world than token management. Very happy!