IBM / ibm-cos-sdk-js

ibm-cos-sdk-js
Apache License 2.0
38 stars 20 forks source link

TimeoutError with putObject (intermittent network issues) #92

Closed mriedem closed 2 years ago

mriedem commented 2 years ago

We're using ibm-cos-sdk-js 1.10.0 and we sometimes hit a 60 second TimeoutError here https://github.com/IBM/ibm-cos-sdk-js/blob/1.10.0/lib/http/node.js#L86 when doing a putObject operation. I'm assuming this is a network hiccup and we should just retry. While looking at the code I noticed this handeRequestWithRetries utility function which looks perfect https://github.com/IBM/ibm-cos-sdk-js/blob/1.10.0/lib/util.js#L899 but that only seems to be used for metadata service and IAM token requests https://github.com/IBM/ibm-cos-sdk-js/search?q=handleRequestWithRetries - is there some reason it wouldn't also be used when calling putObject? Or is it and I'm just not following where these things are connected?

https://docs.aws.amazon.com/general/latest/gr/api-retries.html seems to suggest everything is retried but I'm not sure about that given what I'm seeing above.

I'm wondering if a retry is happening, how would I know since there is nothing logged that I can see. It seems like this would be a good place to log something about doing a retry: https://github.com/IBM/ibm-cos-sdk-js/blob/1.10.0/lib/util.js#L911


There is an example stacktrace:

TimeoutError: Connection timed out after 60000ms
    at ClientRequest.<anonymous> (/opt/app-root/src/node_modules/ibm-cos-sdk/lib/http/node.js:86:34)
    at Object.onceWrapper (node:events:509:28)
    at ClientRequest.emit (node:events:390:28)
    at ClientRequest.emit (node:domain:475:12)
    at TLSSocket.emitRequestTimeout (node:_http_client:763:9)
    at Object.onceWrapper (node:events:509:28)
    at TLSSocket.emit (node:events:402:35)
    at TLSSocket.emit (node:domain:475:12)
    at TLSSocket.Socket._onTimeout (node:net:501:8)
    at listOnTimeout (node:internal/timers:557:17)
mriedem commented 2 years ago

This is a response I got in Slack:

As you mention that AWS docs says that every request is retried, yes this is correct and we can configure the retry count when creating a client and if there is a connection timeout then it will retry. So every client call(Ex: createBucket,listBuckets…) comes under this.

Note that we don't set the maxRetries config option but I'm assuming it should be coming through the default config:

https://github.com/IBM/ibm-cos-sdk-js/blob/1.10.0/ts/config.ts#L59

mriedem commented 2 years ago

Here is an example of the COS object after it's constructed from our config (in this case a local dev/test config):

{
   "config":{
      "credentials":{
         "expired":true,
         "expireTime":null,
         "refreshCallbacks":[

         ],
         "serviceInstanceId":"<snip>",
         "accessKeyId":"<snip>",
         "httpOptions":{
            "timeout":60000
         }
      },
      "credentialProvider":{
         "providers":[
            null,
            null,
            null,
            null
         ],
         "resolveCallbacks":[

         ]
      },
      "region":"us-south",
      "logger":{
         "_events":{

         },
         "_eventsCount":0,
         "_level":20,
         "streams":[
            {
               "type":"stream",
               "stream":{
                  "connecting":false,
                  "_hadError":false,
                  "_parent":null,
                  "_host":null,
                  "_readableState":{
                     "objectMode":false,
                     "highWaterMark":16384,
                     "buffer":{
                        "head":null,
                        "tail":null,
                        "length":0
                     },
                     "length":0,
                     "pipes":[

                     ],
                     "flowing":null,
                     "ended":false,
                     "endEmitted":false,
                     "reading":false,
                     "constructed":true,
                     "sync":true,
                     "needReadable":false,
                     "emittedReadable":false,
                     "readableListening":false,
                     "resumeScheduled":false,
                     "errorEmitted":false,
                     "emitClose":false,
                     "autoDestroy":true,
                     "destroyed":false,
                     "errored":null,
                     "closed":false,
                     "closeEmitted":false,
                     "defaultEncoding":"utf8",
                     "awaitDrainWriters":null,
                     "multiAwaitDrain":false,
                     "readingMore":false,
                     "dataEmitted":false,
                     "decoder":null,
                     "encoding":null,
                     "readable":false
                  },
                  "_events":{

                  },
                  "_eventsCount":1,
                  "_writableState":{
                     "objectMode":false,
                     "highWaterMark":16384,
                     "finalCalled":false,
                     "needDrain":false,
                     "ending":false,
                     "ended":false,
                     "finished":false,
                     "destroyed":false,
                     "decodeStrings":false,
                     "defaultEncoding":"utf8",
                     "length":0,
                     "writing":false,
                     "corked":0,
                     "sync":true,
                     "bufferProcessing":false,
                     "writecb":null,
                     "writelen":0,
                     "afterWriteTickInfo":null,
                     "buffered":[

                     ],
                     "bufferedIndex":0,
                     "allBuffers":true,
                     "allNoop":true,
                     "pendingcb":0,
                     "constructed":true,
                     "prefinished":false,
                     "errorEmitted":false,
                     "emitClose":false,
                     "autoDestroy":true,
                     "errored":null,
                     "closed":false,
                     "closeEmitted":false
                  },
                  "allowHalfOpen":false,
                  "_sockname":null,
                  "_pendingData":null,
                  "_pendingEncoding":"",
                  "server":null,
                  "_server":null,
                  "columns":189,
                  "rows":37,
                  "_type":"tty",
                  "fd":1,
                  "_isStdio":true
               },
               "closeOnExit":false,
               "level":20,
               "raw":false
            }
         ],
         "serializers":{

         },
         "src":false,
         "fields":{
            "name":"s3",
            "hostname":"osboxes",
            "pid":2721
         },
         "haveNonRawStreams":true
      },
      "apiVersions":{

      },
      "apiVersion":null,
      "endpoint":"https://s3.us.cloud-object-storage.appdomain.cloud",
      "httpOptions":{
         "timeout":60000
      },
      "maxRedirects":10,
      "paramValidation":true,
      "sslEnabled":true,
      "s3ForcePathStyle":false,
      "s3BucketEndpoint":false,
      "s3DisableBodySigning":true,
      "computeChecksums":true,
      "convertResponseTypes":true,
      "correctClockSkew":false,
      "customUserAgent":null,
      "dynamoDbCrc32":true,
      "systemClockOffset":0,
      "signatureVersion":"v4",
      "signatureCache":true,
      "retryDelayOptions":{
         "base":100
      },
      "useAccelerateEndpoint":false,
      "publicEndpoint":"https://s3.us.cloud-object-storage.appdomain.cloud",
      "serviceInstanceId":"<snip>",
      "accessKeyId":"<snip>",
      "secretAccessKey":"<snip>",
      "params":{
         "Bucket":"ibmq-resultsdb-experiments-staging"
      }
   },
   "endpoint":{
      "protocol":"https:",
      "host":"s3.us.cloud-object-storage.appdomain.cloud",
      "port":443,
      "hostname":"s3.us.cloud-object-storage.appdomain.cloud",
      "pathname":"/",
      "path":"/",
      "href":"https://s3.us.cloud-object-storage.appdomain.cloud/"
   },
   "_events":{
      "apiCallAttempt":[
         null
      ],
      "apiCall":[
         null
      ]
   },
   "_clientId":1
}

Based on that it doesn't look like maxRetries is being set. I guess the default config is coming from this then:

https://github.com/IBM/ibm-cos-sdk-js/blob/1.10.0/lib/config.js#L491-L492

We do override the timeout in config like this:

  s3: { // https://ibm.github.io/ibm-cos-sdk-js/AWS/S3.html#constructor-property
    endpoint,
    publicEndpoint,
    serviceInstanceId: process.env.S3_SERVICE_INSTANCE_ID,
    accessKeyId: process.env.S3_ACCESS_KEY_ID,
    secretAccessKey: process.env.S3_SECRET_ACCESS_KEY,
    region: 'us-south',
    params: {
      // bind requests to a specific bucket in the service instance
      Bucket: process.env.BUCKET_NAME || 'ibmq-resultsdb-experiments-staging'
    },
    httpOptions: {
      // The upstream default is 2 minutes but the Cloudflare 524 timeout is 100 seconds
      // so we want a default below those values but not so low that it will timeout when
      // listing 1000 objects out of the bucket.
      timeout: parseInt(process.env.S3_HTTP_TIMEOUT_MS, 10) || 60000
    }
  }

Regarding maxRetries the default for putObject isn't quite clear to me though from reading this:

https://github.com/IBM/ibm-cos-sdk-js/blob/5d776bcb86f5b715cf0422107fdfc467eb0eb39e/lib/config.js#L32

I guess we'll just maxRetries=3 in our config to be sure it's set and we know what the value is.

avinash1IBM commented 2 years ago

Hello, Every request is retried in case of any timeout errors that you specified above. You can also configure how many times it needs to retry by configuring the maxErrorRetry parameter along with other parameters that you pass while creating the client. Coming to your second question on how to know whether retry is happening, you can get it by configuring logger parameter when creating the client.

avinash1IBM commented 2 years ago

PutObject also honors the maxRetries option. Along with that if we want to whether retry is happening or not, you can configure logger parameter to get the log of retries.

config =
      endpoint: ""
      accessKeyId: ""
      secretAccessKey: ""
      maxRetries=5
      logger:console

I hope this clarifies your query. If your issue is resolved, please close the ticket.

avinash1IBM commented 2 years ago

Below, you can see the 4 different ways of creating the client and their output. without maxRetries and to hit a connection timeout error, i used a fake endpoint along with logger parameter.

config =
      endpoint: "10.0.0.0" // this is fake
      accessKeyId: ""
      secretAccessKey: ""
      logger:console

Output: 
[AWS s3 undefined 4.486s 3 retries] putObject({ Bucket: '<Bucket name i used>', Key: 'foo', Body: 'bar' })
  1. with maxRetries=5, and to hit a connection timeout error, i used a fake endpoint along with logger parameter.
    
    config =
      endpoint: "10.0.0.0" // this is fake
      accessKeyId: ""
      secretAccessKey: ""
      maxRetries:5
      logger:console

Output: [AWS s3 undefined 6.983s 5 retries] putObject({ Bucket: '', Key: 'foo', Body: 'bar' })

 3. without maxRetries and correct endpoint with logger enabled

config = endpoint: "" // I provided correct endpoint here accessKeyId: "" secretAccessKey: "" logger:console

Output: [AWS s3 200 0.098s 0 retries] putObject({ Bucket: 'cos-presignedUrl--571550', Key: 'foo', Body: 'bar' })

 4. with maxRetries and correct endpoint with logger enabled

config = endpoint: "" // I provided correct endpoint here accessKeyId: "" secretAccessKey: "" maxRetries:5 logger:console

Output: [AWS s3 200 0.1s 0 retries] putObject({ Bucket: 'cos-presignedUrl--571550', Key: 'foo', Body: 'bar' })