emacs-eldev / eldev

Elisp development tool
https://emacs-eldev.github.io/eldev/
GNU General Public License v3.0

Add special CI mode #90

Open doublep opened 1 year ago

doublep commented 1 year ago

I have long wanted to improve CI stability with Eldev, but never got around to doing it.

The problem: CI tests (including those for Eldev itself) often fail for reasons completely external to the project being tested, e.g. because of networking problems, MELPA bugs, or maybe something else. MELPA apparently doesn't have a transactional upgrade mode, so your CI can fail because you just happen to run it in the "hiccup" moment when the package archive itself is in an inconsistent state (or that's how it looked to me). In other words, if you just manually restart the CI run, it will often succeed. This is very annoying and reduces trust in CI testing overall.

Idea: add a global option to Eldev, called --ci or --robust-mode or something like that. When in that mode, Eldev should retry on certain failures instead of immediately giving up. This mode should be automatically active on various common CI servers, starting with GitHub test servers. I.e. default value should be "auto", and then Eldev would use some heuristics to determine if it is executed in a CI setup (where "auto" would resolve to "yes", i.e. robust mode) or just locally (results in "no").
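
To make the "auto" resolution concrete, here is a minimal sketch in Python (not Eldev code; Eldev itself is written in Elisp). The function name `resolve_robust_mode` and the list of variables to check are assumptions for illustration only. `CI` and `GITHUB_ACTIONS` are environment variables that GitHub Actions sets to `true`; other CI services would need their own entries.

```python
import os

# Environment variables that suggest a CI setup. GitHub Actions sets both
# of these to "true"; which other variables to check is an open question.
CI_ENV_VARS = ("CI", "GITHUB_ACTIONS")


def resolve_robust_mode(setting="auto", environ=None):
    """Resolve a hypothetical robust-mode setting to a boolean.

    `setting` is "yes", "no" or "auto". With "auto", robust mode is
    enabled only if the environment looks like a CI server.
    """
    env = os.environ if environ is None else environ
    if setting == "yes":
        return True
    if setting == "no":
        return False
    if setting != "auto":
        raise ValueError(f"unknown robust-mode setting: {setting!r}")
    # Heuristic: any known CI variable set to a truthy value.
    return any(env.get(name, "").lower() in ("true", "1") for name in CI_ENV_VARS)
```

An explicit "yes" or "no" always wins over the heuristic, so a user can still force either behavior locally or on a CI server that the heuristic doesn't recognize.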

The hardest part is figuring out which specific errors Eldev should retry on instead of giving up. This is, of course, made particularly difficult by the fact that such errors are not reproducible and happen only from time to time.

@ikappaki, @bbatsov, @sirikid, @LaurenceWarne, @juergenhoetzel, @DarwinAwardWinner: Sorry for batch-pinging, but if you are interested in this, please link (or duplicate here, especially if you restart the CI run) stacktraces that look like intermittent errors where Eldev should retry, and I'll try to figure it out. If not, just unsubscribe from this thread, and sorry again.

DarwinAwardWinner commented 1 year ago

Unfortunately, I haven't done many CI runs recently, and it looks like GitHub has cleaned up my older CI history. So while I'm sure I've seen transient errors, I'm not sure I have a way to find and provide any stack traces at this time.

doublep commented 1 year ago

OK, looks like I got the first example, with Eldev itself. A CI run failed during integration tests with the error "End of file during parsing", reported in the context "When updating contents of package archive ‘melpa-stable’". I'll see if I can somehow make Eldev robust against such stuff.

MELPA-INTERMITTENT-FAILURE.log
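
For illustration, retrying on a failure like the one above might look roughly like the following Python sketch. It is not how Eldev does or will do it. The names `TRANSIENT_PATTERNS`, `is_transient` and `run_with_retries` are hypothetical, matching on the error message text is just one possible heuristic, and the pattern list contains only the single message reported in this thread.

```python
import time

# Messages that look like intermittent, external failures. Only the one
# error reported in this thread is listed; the real list is unknown.
TRANSIENT_PATTERNS = ("End of file during parsing",)


def is_transient(message):
    """Return True if the error message matches a known transient pattern."""
    return any(pattern in message for pattern in TRANSIENT_PATTERNS)


def run_with_retries(action, robust, attempts=3, delay=5.0, sleep=time.sleep):
    """Call `action`; in robust mode, retry when it fails transiently.

    `attempts` is the total number of tries and must be at least 1.
    Errors that don't match a transient pattern are re-raised at once.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:
            out_of_attempts = attempt == attempts
            if not robust or out_of_attempts or not is_transient(str(error)):
                raise
            sleep(delay)  # give the package archive time to settle
```

Outside robust mode the first failure propagates unchanged, so local runs would behave exactly as they do now. Errors that are not recognized as transient are never retried, which keeps genuine test failures from being slowed down or masked.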