MONEI / Shopify-api-node

Node Shopify connector sponsored by MONEI
https://monei.com/shopify-payment-gateway/
MIT License

Adding retry option #251

Closed. asyncopation closed this issue 2 years ago

asyncopation commented 5 years ago

First, thanks for your work on this module!

I found a related issue that was immediately closed by its author for some reason: #238

More and more frequently, the Shopify API is returning ETIMEDOUT errors.

I noticed that a timeout option to pass through to the got request was added in PR #98. Are there plans to add an option that leverages got's retry option in the same way? Would you accept a pull request for that?

lpinca commented 5 years ago

It is explicitly set to zero

https://github.com/MONEI/Shopify-api-node/blob/69142e45fb5df3e79fbbb71ed66d5b72f39854eb/index.js#L105

but I don't remember why. I think it is because the feature was a bit buggy in got@8. I'm not sure how hard it would be to migrate to got@9. It would require a major version bump, but I'm fine with that.

chaddjohnson commented 5 years ago

Would be great to have the option to set retries. Thanks for the great library.

lpinca commented 5 years ago

I now remember why it was disabled. There are two problems:

  1. It prevents shopify.callLimits from being updated, because the request is retried before we have a chance to read the response headers of the previous attempt.
  2. It doesn't play well with the autoLimit option, because our rate limiter is bypassed.

tim-phillips commented 5 years ago

If we did upgrade to got@9, we could use hooks.beforeRetry and/or hooks.afterResponse to update the limits.
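For illustration, a minimal sketch of that idea, assuming got@9 or later and its hooks API. The callLimits object and updateCallLimits helper below are stand-ins for the library's internal bookkeeping, not its actual code; only the X-Shopify-Shop-Api-Call-Limit header and got's retry/hooks options are real.

const got = require('got');

// Illustrative stand-in for the library's call-limit bookkeeping.
const callLimits = {};

function updateCallLimits(headers) {
  // Shopify reports usage as "current/max", e.g. "32/40".
  const header = headers['x-shopify-shop-api-call-limit'];
  if (!header) return;
  const [current, max] = header.split('/').map(Number);
  callLimits.current = current;
  callLimits.max = max;
  callLimits.remaining = max - current;
}

const request = got.extend({
  retry: 2, // let got retry transient errors such as ETIMEDOUT
  hooks: {
    afterResponse: [
      response => {
        // Refresh the limits from every response that is actually received,
        // so retried requests no longer leave callLimits stale.
        updateCallLimits(response.headers);
        return response;
      }
    ]
  }
});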

jbgrunewald commented 4 years ago

Regarding the ETIMEDOUT error:

We faced a similar error in a system that makes a batch of requests for many stores from behind the same static IP. I'm fairly confident this is some sort of defensive firewall mechanism dropping packets (probably early DDoS detection, since the APIs we call are good amplification targets). In our case we would quickly max out the bucket limits for ~40 stores, then hit a wall shortly after: once the initial wave of requests had succeeded, all subsequent requests came back as ETIMEDOUT.

I think it's related to all requests coming from the same static IP: when we tested in another environment that doesn't force requests through a NAT, we didn't experience the timeouts.

I'm wondering if anyone else has experienced something similar?

jbgrunewald commented 4 years ago

The latest versions of the client use got >= 10, so we could move the request-limit tracking inside the got hooks.

jpk0727 commented 4 years ago

@jbgrunewald did you find out whether the problem was Shopify blacklisting your static IP? I'm getting ETIMEDOUT errors from our production server, yet on another server and on my local machine the same code making the same requests works completely fine.

jbgrunewald commented 4 years ago

@jpk0727 I did have an informal discussion with an engineer at Shopify, and they confirmed that they thought the behavior was related to a WAF setting. Some API endpoints are protected more heavily than others because of the possibility of amplification-style DDoS attacks: list requests in particular are very small from the client side but return large amounts of data from Shopify relative to the size of the request. These rates and protections are obviously not documented publicly, so it's difficult to say with any certainty what the limits are. I have seen no indication that the blacklisting is permanent; our system has experienced these timeouts and has been able to make requests again later.

The timeouts may also be related to bugs in the later versions of this client. If you are using any of the 3.x versions: I've experienced timeouts related to the got library in our production setup, while similar requests made from a standalone script succeed, and I don't have an explanation for that at the moment.

Assuming your issue is similar to ours, we managed this in our system by better controlling the rate at which we make certain types of batch requests. Previously we ran a scheduled job that ingested product data for all of our Shopify brands at the same time. When running them all at once, we would start experiencing timeouts after ~40 brands had started; in an environment where the requests were not made from behind a NAT and static IP address, there were no timeouts. So we concluded that making ~40-50 product list requests in a small timeframe (probably less than 5 seconds) resulted in their firewall flagging us and our requests being dropped.

We now have a queue that runs brands one at a time, and we no longer experience timeouts from behind the NAT. For some of our brands we make the initial list request, then iterate through each product to fetch additional data from other APIs. Those additional requests are made for many brands at once, but we have yet to see timeouts from them; I suspect this is because they are not list requests returning large amounts of data.

Again, there's a bit of guesswork here, and we did not definitively root-cause the timeouts.
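As an illustration only (not the poster's actual code), the one-brand-at-a-time queue described above could look roughly like this; fetchProductList and the two-second pause are made-up placeholders.

// Hypothetical sketch: ingest product lists one brand at a time so a burst of
// list requests from a single static IP does not trip Shopify's edge protections.
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

async function ingestAllBrands(brands, fetchProductList) {
  for (const brand of brands) {
    // Only one product-list request is in flight at any moment.
    const products = await fetchProductList(brand);
    // ...process `products` here...

    // Short gap between brands; the value is a guess, tune it for your own traffic.
    await delay(2000);
  }
}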

jpk0727 commented 4 years ago

@jbgrunewald thanks so much for your detailed reply. This makes a lot of sense. We have some reporting software that makes list requests on a schedule, though we are not making quite as many requests as it sounds like you are, and the blacklist has persisted for two days. Do you know how long the blacklist stayed in place once it was triggered?

We are using version 3.3.1, so maybe the problem is related to the got library, though making a list request directly with curl also times out from the production server.

Again, I really appreciate your help!

mete89 commented 3 years ago

We also need this feature. In our case we would like to be able to retry the request when we get random 502/503 errors from Shopify. They are rare but critical:

HTTPError: Response code 503 (Service Temporarily Unavailable)
HTTPError: Response code 502 (Bad Gateway)

anthlubic commented 3 years ago

Is there any harm in polling whenever the Retry-After header is present? Inside Shopify.prototype.request, instead of checking statusCode === 202, just check whether the Retry-After header exists and begin polling.

At the very least, should the statusCode be checked against 429 as well? 429 + Retry-After is Shopify's way of indicating a retry scenario.

lpinca commented 3 years ago

At the very least, should the statusCode be checked against 429 as well? 429 + Retry-After is Shopify's way of indicating a retry scenario.

No, because 429 is a user error: it means that the rate limit is not being respected. There is a long discussion about this in https://github.com/MONEI/Shopify-api-node/pull/377, https://github.com/MONEI/Shopify-api-node/issues/324, and https://github.com/MONEI/Shopify-api-node/issues/199.

You can retry on 429 errors yourself in your code.

jbgrunewald commented 3 years ago

@mete89 @anthlubic

I have solved this in my code by overriding the request method at client-creation time, as shown below. This example uses a relatively old version of the client, so I can't guarantee it will work with later changes.

// Reassign the original request function to another field.
client.originalRequest = client.request;

// Override the request function to handle exceptional cases as desired.
client.request = async (uri, method, key, params) => {
  try {
    return await client.originalRequest(uri, method, key, params);
  } catch (e) {
    if (e.response && e.response.statusCode === 429) {
      logger.info('[shopify-api-node][requestWrapper] retrying request after 429 response');

      // Retry-After is given in seconds; multiplying by 2000 converts it to
      // milliseconds with a 2x safety margin. Fall back to retrying
      // immediately if the header is missing.
      const retryAfter = e.response.headers['retry-after']
        ? e.response.headers['retry-after'] * 2000
        : 0;

      return delay(retryAfter).then(() => client.request(uri, method, key, params));
    }

    return Promise.reject(e);
  }
};

delay function:

const delay = duration => new Promise(res => setTimeout(res, duration));
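For the intermittent 502/503 responses mentioned earlier in the thread, the same wrapper pattern could branch on those status codes as well. A rough, untested sketch; the one-second pause and single-retry behavior are arbitrary illustrative choices, not anything the library provides.

// Hypothetical extra branch for the catch block above: retry once on 502/503,
// which carry no Retry-After header, after a short fixed pause.
if (e.response && [502, 503].includes(e.response.statusCode)) {
  logger.info(`[shopify-api-node][requestWrapper] retrying after ${e.response.statusCode} response`);
  // Calling originalRequest (not the wrapper) limits this to a single retry.
  return delay(1000).then(() => client.originalRequest(uri, method, key, params));
}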