On the last deploy we found that, because of gcsfuse slowness, nginx can fail to start on the first attempt. When that happened, Ansible moved on to the next server in the list and stopped nginx there, so we effectively had no servers running nginx. The site stayed down until the first server finally got nginx started again.
This adds three retries to the nginx start, waiting up to a minute between attempts. If all three fail, the whole playbook should stop, so we at least don't end up with no servers available.
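
For reviewers less familiar with Ansible retries, the change is roughly of the following shape. This is only a sketch, not the actual diff: the task name, the use of the `service` module, the `nginx_start` variable, the fixed 60-second delay, and `any_errors_fatal` as the way to halt the run are all assumptions made for illustration.

```yaml
# Hypothetical task; names and module choice are assumptions, not the real playbook.
- name: Start nginx, retrying if gcsfuse is slow
  ansible.builtin.service:
    name: nginx
    state: started
  register: nginx_start            # keep the result so `until` can inspect it
  until: nginx_start is succeeded  # retry while the start attempt fails
  retries: 3
  delay: 60                        # seconds to wait between attempts
  any_errors_fatal: true           # if this still fails, abort the run for all hosts
```

The point of failing hard here is that a stopped playbook leaves the remaining servers untouched and still serving traffic.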