Open schneems opened 4 days ago
This behavior is actually by design. ci-queue
is designed so that all workers are expendable and soft failing except for actual test failures. This include not being able to reach Redis.
As such, the reporter step isn't optional, it's 100% required as it's the only thing that ensure all queued tests were ran.
This is definitely not documented enough, and you are not the first one to get bit by this.
What we could probably do to is to make this opt-in with some sort of --soft-failure-code 0
or something, as this was done at a time where our CI system didn't support registering some error code for soft failure.
if an flag or env var could be added to set the verification mode to none
That's a different feature request.
What we could probably do to is to make this opt-in with some sort of
--soft-failure-code 0
or something
@ChrisBr any opinion?
Thanks for the extra context. In my case such a flag would meet my needs for increasing confidence. I'm running a small cluster, 16 nodes, and I don't expect any of them to fail or be unavailable.
all workers are expendable and soft failing except for actual test failures
CAP strikes again I guess. It makes sense that you would want your distributed job to allow for failure of nodes.
This include not being able to reach Redis.
Short of adding a new feature, lower hanging fruit could be to emit a warning saying when the worker is going into a soft-failure mode, ideally along with why to differentiate failure modes. Possibly to link to some docs explaining why it's not exiting 0
i.e. highlighting that it's designed with expendable test workers in mind. Basically, raise awareness of the issue.
That's a different feature request.
👍 I'm going to poke at it a bit. I'll be at RubyConf this week, so probably won't have a PR until after that. Thanks for taking a look at the issue.
The issue is described in my debugging notes here https://github.com/heroku/heroku-buildpack-ruby/issues/1505#issuecomment-2469218290
Context
Heroku rolled out a SSL connections by default https://devcenter.heroku.com/changelog-items/2992. However all certificates are self-signed. (as the servers do not "own" the domain
compute-1.amazonaws.com
). Therefore we recommend people disable peer verification https://devcenter.heroku.com/articles/connecting-heroku-redis#connecting-in-ruby.When this change happened, my CI was reporting that my runs were successful however I never noticed that they were executing 0 tests:
While debugging, I tried connecting to a legacy non-ssl url (REDIS_TEMPORARY_URL) and when I made that change everything worked fine.
Expected
When I try to connect to a redis instance and the connection fails for some reason that it will exit non-zero.
Actual
It reports 0 tests were executed and then does not fail
Suggestion
Ideally if the runner cannot connect to redis (for any reason) it would hard fail and ideally give more connection information. I understand that rspec-queue is marked deprecated in the readme, however I'm guessing that this behavior might also persist in other shared connection code-paths.
Beyond improving the connection failure message, if an flag or env var could be added to set the verification mode to none for use with hosted/self-signed certs, that would be helpful as I have no other way of working around this problem at this time.