Shopify / slate

Slate is a toolkit for developing Shopify themes. It's designed to assist your workflow and speed up the process of developing, testing, and deploying themes.

Theme deploying unreliable with 500 errors; won't retry failed asset(s) #1064

Open dgpokl opened 4 years ago

When running slate-tools deploy, asset uploads frequently fail with errors. The sequence of events during a deploy appears to be:

  1. For each asset, upload the asset:
    • Capture the output text of that upload command into a buffer but don't display it yet
  2. If any asset fails to upload, silently continue, i.e. a failed upload triggers neither a retry nor an abort.
  3. Print out that buffer from step 1 showing all the failures.
  4. Display a success message (even in the case where failures occurred) like this:
    Files overwritten successfully!

✨ Done in 204.89s.

The problem with this approach should be apparent, but I'll spell it out: this API frequently returns transient, unexplained **HTTP 500** errors (or, today during the Shopify outage, 503). Example of one:

13:05:55 [default]Asset Perform Update to snippets/icon-minus-mobile.liquid at host Status: 500 Internal Server Error Errors: Internal Server Error

Simply deploying again always produces a different result: whether that's complete success or a different set of failed assets depends on...the weather? Anyway, I ~can't~ _really don't want to have to_ build this into a scripted deploy, because I'd have to wait for all several hundred files to be tried, then grep through the STDOUT/STDERR of the deploy command looking for errors, and if even one file failed, deploy the whole thing again (or drop down to themekit and re-upload each file myself). That raises the question: what is this slate-deploy command even for if one must write a whole program to clean up after it?
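For illustration, the "grep through the output" workaround could look roughly like the sketch below. It assumes every failure line has the same shape as the example log line above (`... Perform Update to <path> at <host> Status: 5xx ...`). `failed_assets` and `deploy.log` are hypothetical names, not part of Slate or themekit.

```shell
#!/usr/bin/env bash
# Sketch: list assets whose upload reported a 5xx status in captured deploy output.
# Assumption: failure lines resemble the example in this issue, e.g.
#   "... Asset Perform Update to <path> at <host> Status: 500 Internal Server Error ..."

failed_assets() {
  # Reads deploy output on stdin and prints each failed asset path once.
  # Lines that do not match the assumed failure shape are ignored.
  sed -nE 's/.*Perform [A-Za-z]+ to ([^ ]+) at .*Status: 5[0-9][0-9].*/\1/p' \
    | sort -u
}

# Hypothetical usage: capture stdout and stderr, then inspect the log.
#   yarn deploy 2>&1 | tee deploy.log
#   failed_assets < deploy.log
```

If the real output contains color codes or a different line layout, the pattern would need adjusting. That fragility is part of why parsing the log is a poor substitute for a meaningful exit status.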

### Replication steps

1. Have a bunch of files, maybe 300? (For all I know, though, this may be optional; maybe it can hit anyone.)
2. Run yarn deploy
3. It might succeed or the above scenario might happen, listing a random set of assets and HTTP 500 errors for each.

### What I would consider reasonable (just my opinion)

If this project weren't in deprecated/limbo status, this is what I would propose:

##### Option A (lazy but decent): Exit with status 1 IMMEDIATELY anytime an asset upload is NOT successful

I could script that something like this:

    while true; do
      yarn deploy
      [[ $? == "0" ]] && break
    done

This would ensure success on an infinite timescale, but still waste a lot of time re-uploading files.

##### Option B (sane)
While deploying, check whether each file upload succeeds. If one doesn't:
  * check whether the response has a Retry-After header or something similar and, if so, heed it
  * otherwise just delay 1s or so and retry the file, up to a fixed number of times
  * if the file exceeds that number of retries, exit 1
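The retry-and-give-up part of Option B can be sketched in shell as below. This is only an illustration wrapped around an arbitrary command, not how Slate or themekit is implemented. `retry_with_delay` is a hypothetical helper. It does not honor Retry-After, because a wrapper like this never sees the HTTP response headers.

```shell
#!/usr/bin/env bash
# Sketch of Option B: retry one command a bounded number of times with a fixed delay.
# retry_with_delay is a hypothetical helper, not part of Slate or themekit.

retry_with_delay() {
  local max_attempts=$1 delay_seconds=$2
  shift 2
  local attempt=1
  while true; do
    "$@" && return 0                 # success: stop retrying
    if (( attempt >= max_attempts )); then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1                       # the caller can turn this into exit 1
    fi
    attempt=$(( attempt + 1 ))
    sleep "$delay_seconds"           # fixed delay; Retry-After is not visible here
  done
}

# Hypothetical usage, assuming some per-file upload command exists:
#   retry_with_delay 3 1 upload_one_asset snippets/icon-minus-mobile.liquid || exit 1
```

Retrying per file rather than per deploy avoids re-uploading the several hundred assets that already succeeded, which is the main cost of the Option A loop.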

### Speculation

- Perhaps the 500 errors only affect people with more than a small number of files
- Maybe there's a rate-limiting regime in place and Slate is ignorant of what that limit is
- The 500 error seems to provide no explanation of what's wrong, and/or Slate or themekit is eating the headers that would tell it when it can retry, and instead just blindly continuing.