Shopify / slate

Slate is a toolkit for developing Shopify themes. It's designed to assist your workflow and speed up the process of developing, testing, and deploying themes.
https://shopify.github.io/slate
MIT License

Theme deploying unreliable with 500 errors; won't retry failed asset(s) #1064

Open dgpokl opened 4 years ago

dgpokl commented 4 years ago

Problem

When running slate-tools deploy, asset uploads frequently fail with errors. The sequence of events during a deploy appears to be:

  1. For each asset, upload the asset:
    • Capture the output text of that upload command into a buffer but don't display it yet
  2. If any asset fails to upload, silently continue, i.e. a failed upload triggers neither a retry nor an abort.
  3. Print out that buffer from step 1 showing all the failures.
  4. Display a success message (even in the case where failures occurred) like this:
    
    Files overwritten successfully!

    ✨ Done in 204.89s.

The problem with this approach should be apparent, but I'll spell it out: this API frequently returns transient, unexplained **HTTP 500** errors (or, today during the Shopify outage, 503). Example of one:

13:05:55 [default]Asset Perform Update to snippets/icon-minus-mobile.liquid at host xxxxxx.myshopify.com Status: 500 Internal Server Error Errors: Internal Server Error

Simply deploying again always gives a different result: whether that's complete success or a different set of failed assets depends on...the weather? Anyway, I ~can't~ _really don't want to have to_ build this into a scriptable deploy, because I'd have to wait for all several hundred files to be tried, then grep through the STDOUT/STDERR of the deploy command looking for errors, and if even one file failed, deploy the whole thing again (or drop down to themekit and reupload each failed file myself, which raises the question of what this slate-deploy command is even for if one must write a whole program to clean up after it).
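For anyone who does end up scripting the grep step, here is a rough sketch of pulling the failed asset paths out of a captured deploy log. It assumes every failure line has the same shape as the example above ("... to <path> at host <host> Status: 5xx ...") and that asset paths contain no spaces; `failed_assets` is a name I made up, not something Slate or themekit provides.

```shell
#!/usr/bin/env bash
# failed_assets: read deploy output on stdin and print, one per line, the
# asset path from every line that reports a 5xx status.
# ASSUMPTION: failure lines look like the example in this issue:
#   "... Asset Perform Update to <path> at host <host> Status: 500 ..."
failed_assets() {
  sed -n -E 's/.* to ([^ ]+) at host [^ ]+ Status: 5[0-9][0-9].*/\1/p' | sort -u
}

# Hypothetical usage (deploy.log is just an example file name):
#   yarn deploy 2>&1 | tee deploy.log
#   failed_assets < deploy.log
```

An empty result would mean no 5xx lines were found, which is only as trustworthy as the assumed log format.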

### Replication steps

1. Have a bunch of files, maybe 300? (For all I know, though, this may be optional; maybe it can hit anyone.)
2. Run yarn deploy
3. It might succeed or the above scenario might happen, listing a random set of assets and HTTP 500 errors for each.

### What I would consider reasonable (just my opinion)

If this project weren't in deprecated/limbo status, this is what I would propose:

##### Option A (lazy but decent): Exit with status 1 IMMEDIATELY anytime an asset upload is NOT successful

I could script that something like this:

while true; do yarn deploy && break; done


Thus ensuring success on an infinite timescale but still wasting a lot of time reuploading files.

##### Option B (sane)
While deploying, check to see if file uploads succeed. If they don't succeed,
  * check to see if there's a retry-after header or something and if so, heed it
  * otherwise just delay 1s or something and retry the file a fixed number of times
  * if it exceeds a certain number of retries, exit 1
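Until something like Option B exists inside the tool, the bounded-retry part can be roughly approximated from the outside. The sketch below is a generic wrapper, not Slate's behavior: `retry` is a hypothetical helper, and it only works if the wrapped command exits non-zero on failure (which, per this issue, the deploy command currently does not). A wrapper like this also cannot see response headers, so the retry-after bullet is not covered.

```shell
#!/usr/bin/env bash
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it exits 0, sleeping DELAY_SECONDS between attempts.
# Return 1 once MAX_ATTEMPTS attempts have all failed.
retry() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Hypothetical usage, assuming the per-file upload command reports failure
# through its exit status:
#   retry 3 1 theme deploy snippets/icon-minus-mobile.liquid || exit 1
```

Retrying per file rather than per deploy is the point here: a single flaky asset costs a few extra requests instead of a full reupload.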

### Speculation

- Perhaps the 500 errors are something that only affects people with more than a small number of files
- Maybe there's a rate-limiting regime in place and Slate is ignorant of what that limit is
- The 500 error seems to provide no explanation of what's wrong, and/or Slate or themekit is eating the headers that would tell it when it can retry and instead just blindly continuing.