emacs-eldev / eldev

Elisp development tool
https://emacs-eldev.github.io/eldev/
GNU General Public License v3.0

My CI tests for amx package work with Cask but fail bizarrely with Eldev #6

Closed DarwinAwardWinner closed 4 years ago

DarwinAwardWinner commented 4 years ago

I am experimenting with switching my amx package from Cask to Eldev. However, I am hitting a road block: I'm getting weird test failures in Travis with Eldev that don't occur with Cask.

Not only are several tests failing with Eldev, but the test command is also hanging instead of exiting after running the tests, and then the run gets killed by Travis after 10 minutes of inactivity. The only differences between these 3 commits are the addition of Eldev, addition of recommended eldev lines to .gitignore, and conversion from Cask to Eldev in .travis.yml:

3 files changed, 39 insertions(+), 37 deletions(-)
.gitignore  |  4 ++++
.travis.yml | 58 +++++++++++++++++++++-------------------------------------
Eldev       | 14 ++++++++++++++

There's a couple of different things going on here. First, the baseline: the EVM+Cask build all passes, as expected, since that's what I've been using.

When I switch to EVM+Eldev, Emacs 25 and lower fail 12 out of 39 tests, and then the eldev test command hangs forever until Travis kills it after 10 minutes (example: job 105.5). The tracebacks on all the tests appear to point to an error deep in the cl-loop macro expansion, which doesn't make any sense. Emacs 26 and up (including snapshot) instead fail quickly with a GnuTLS error during package installation (example: job 105.7).

When I switch to nix-emacs-ci+Eldev, all the tests pass, but the test process still hangs forever and gets killed (e.g. job 111.6), except on Emacs 26.2 and 26.3, where it exits properly and everything works (e.g. job 111.9).

So, obviously there's a lot going on, but any guidance or help in figuring out these errors would be welcome.

Note that eldev runs my tests just fine locally.

doublep commented 4 years ago

any guidance or help in figuring out these errors would be welcome

This, unfortunately, might not be so easy. I see no problems in file Eldev and the tests also run fine locally. I'm sure interested in having everything fixed and making Eldev work for you, but without help from your side I will not manage to do it.

(eldev-add-extra-dependencies 'test 'undercover)

I feel this might explain at least some of the errors you get. E.g. job #111.6 hung after loading the project as source code, not as a package. AFAIK undercover triggers some code specifically when run on Travis CI and when the code is not byte-compiled. Just to see if this might be the cause, can you temporarily remove undercover from the list of extra dependencies?

Emacs 26 and up (including snapshot) instead fail quickly with a GnuTLS error during package installation

That's the reason I recently switched one of my packages to nix-emacs-ci: I also had similar errors on Travis CI (but not locally).

Unintentionally, Eldev bootstraps itself from http://..., but later it uses https to access the package archives. Bootstrapping (http) works, but the later https access fails. I will look into why it works for Cask; it might be that Cask just always uses the http protocol.

DarwinAwardWinner commented 4 years ago

Ok, I disabled undercover and I'm getting the same result: https://travis-ci.org/DarwinAwardWinner/amx/builds/650646783

... almost. Now emacs-snapshot is also passing. However, looking through the logs, it seems that the problem might have to do with a function that smex adds to kill-emacs-hook that is throwing an error. I guess this interrupts the quitting process and somehow causes Emacs to never exit, even though it was run with --batch? I'm just guessing here. And maybe the handling of errors during quitting was changed slightly in Emacs 26.2, because it still gets the same error, but manages to quit anyway. That hook shouldn't be running anyway, so I'll try finding a way to disable it and then re-run.

DarwinAwardWinner commented 4 years ago

Ok, that seems to have fixed things: https://travis-ci.org/DarwinAwardWinner/amx/builds/650715934

Time to re-enable undercover and see what happens.

DarwinAwardWinner commented 4 years ago

Ok, so undercover is causing the individual test failures via cl-loop macro weirdness: https://travis-ci.org/DarwinAwardWinner/amx/builds/650716736.

At this point, I think I can summarize conclusions:

Given that I do think the coverage reports are useful, I think I'll have to stick with my existing CI setup using Cask for now, even though I like Eldev a lot. Actually, it turns out my EVM+Cask setup wasn't even running the coverage reports, and when I enabled them, the same errors cropped up. So this part has nothing to do with Eldev, it's just an undercover issue.

DarwinAwardWinner commented 4 years ago

So, I just made a revision to my previous comment based on my further investigation: in short, the undercover issues are unrelated to Eldev. That means I can switch to Eldev after all. However, you may still want to look into handling errors in kill-emacs-hook more gracefully. I don't think it's intended that such errors should cause Eldev to hang indefinitely.

doublep commented 4 years ago

Thank you for the investigations. I'm leaving this open for the remaining issues.

It seems that the problem might have to do with a function that smex adds to kill-emacs-hook that is throwing an error. I guess this interrupts the quitting process and somehow causes Emacs to never exit, even though it was run with --batch?

I see it e.g. in the logs of build #113.6, but not in the current logs, hm. I don't understand, why this problem is gone. Or was it also indirectly because of undercover?

However, if you have this or similar problems in the future again, you can take advantage of file Eldev being a program. E.g. simply adding this to Eldev should be enough of a workaround (advice-add seems to work even for functions that are not defined yet):

(advice-add 'smex-save-to-file :override #'ignore)

You can also silence "Warning (amx): Not saving amx state from "emacs -Q"." in the same way, or by setting a variable that makes the relevant function not run etc.

Eldev can fail to exit and instead hang forever if an error is signaled from kill-emacs (usually by a function in kill-emacs-hook)

Sounds strange, because Eldev doesn't alter anything about Emacs as far as I understand. I will try to reproduce it locally.

By the way, after these issues I think Eldev should change variable user-emacs-directory to point somewhere inside .eldev/EMACS-VER: if it isolates the project from your normal Emacs otherwise, it looks incorrect that locate-emacs-user-file still gives you access to ~/.emacs.d.

EVM+Eldev doesn't work well. (You already knew this)

Yes, but now that I know it still somehow works with Cask, I sure need to investigate this.

Eldev and undercover.el together result in weird errors?

I'll look into integrating undercover in the future, even if there are currently some problems with it unrelated to Eldev.

That means I can switch to Eldev after all

You must be the first user then ;) By the way, do you find Eldev useful for normal development, or was it just for continuous integration?

DarwinAwardWinner commented 4 years ago

I see it e.g. in the logs of build #113.6, but not in the current logs, hm. I don't understand, why this problem is gone. Or was it also indirectly because of undercover?

I had already fixed it by doing as you suggested and disabling the offending hook: https://github.com/DarwinAwardWinner/amx/commit/b6c6673fc7ca46e7e805b243196731b37b305e02. However, doing this via advice, and inside the Eldev file, is a good idea that I will adopt.

As for EVM+Cask working "better", that might well be because of the various hacks and workarounds I had accumulated, e.g. here: https://github.com/DarwinAwardWinner/amx/blob/570162fe9d772fe7493c8d281586b746b1c5c5db/.travis.yml#L21-L41. I'm not even sure how much of that is still required. Part of the motivation for switching to Eldev was to drop all that cruft.

do you find Eldev useful for normal development, or was it just for continuous integration?

I actually find it quite useful. With Cask, I was using a slightly janky but functional Makefile to automate my common development steps: https://github.com/DarwinAwardWinner/amx/blob/570162fe9d772fe7493c8d281586b746b1c5c5db/Makefile. But Eldev already automates all those things, which means I can also drop that Makefile entirely.

One other thing I've noticed: another one of my packages, ido-completing-read+, actually has 2 test suites that need to be run in separate commands, because one of the test suites loads the flx-ido package and the other one needs to run without that package loaded. This is a bit awkward but doable with Eldev using 2 eldev test commands, with the second one adding -S '(load-file "Eldev-additional-config")'. It would be slightly less awkward if I could use Emacs -l option, something like eldev -l Eldev-additional-config test.

doublep commented 4 years ago

As for EVM+Cask working "better", that might well be because of the various hacks and workarounds I had accumulated

No, I was able to reproduce it on Travis CI with a clean Cask installation. In fact, even just running package-refresh-contents from the command line (with a little setup) works fine, but it doesn't work under Eldev. I tried a hundred things, but couldn't pinpoint the problem (of course it has to be unreproducible locally). The most annoying thing is that it used to work just a month ago, but then they apparently changed something on the Travis machines and now it fails, but only for some Emacs versions, not for all...

This is a bit awkward but doable with Eldev using 2 eldev test commands, with the second one adding -S '(load-file "Eldev-additional-config")'. It would be slightly less awkward if I could use Emacs -l option, something like eldev -l Eldev-additional-config test.

This doesn't sound like a generally-useful feature to me. How about adding an option for your project's Eldev that just loads this file? E.g. (untested):

(eldev-defoption my-project-use-flx-ido ()
  "Use `flx-ido'"
  :options        (--flx --flx-ido)
  ;; Do whatever, call `load-file' or maybe just inline it here.
  )

And then you can both eldev test and eldev --flx test.
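With the body filled in, a minimal sketch of that option might look like the following. It assumes the extra settings live in a file named Eldev-additional-config in the project root (the name used earlier in this thread) and that eldev is invoked from the project directory, so the relative path resolves the same way as in the -S form. The option name is only illustrative, and like the snippet above this is untested.

```elisp
;; Sketch for the project's `Eldev' file (untested).
;; `my-project-use-flx-ido' is an illustrative name, not part of Eldev.
(eldev-defoption my-project-use-flx-ido ()
  "Use `flx-ido'"
  :options        (--flx --flx-ido)
  ;; Same form that was previously passed on the command line via -S.
  ;; Assumption: the current directory is the project root.
  (load-file "Eldev-additional-config"))
```

The plain eldev test run then covers the suite that must not see flx-ido, while eldev --flx test covers the other one, which keeps the two CI commands symmetrical.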

doublep commented 4 years ago

I released 0.2.1 with a fix for the GnuTLS-related problem. So, it was a problem in Eldev, but indirectly caused by a bug in older Emacs versions. I added the original workaround because otherwise self-compiled Emacs 25.3 wouldn't work locally.

Now Eldev works again with EVM on Travis CI with all Emacs versions, not only with nix-emacs-ci.

doublep commented 4 years ago

However, you may still want to look into handling errors in kill-emacs-hook more gracefully. I don't think it's intended that such errors should cause Eldev to hang indefinitely.

Absolutely, but I cannot reproduce such errors locally. I have a feeling that they also were indirectly caused by undercover. E.g. I added this in Eldev-local of a project:

(eldev-add-extra-dependencies 'test '(:package smex :archive melpa-unstable))

(add-hook 'eldev-load-dependencies-hook (lambda (&rest ...)
                                          (when (require 'smex nil t)
                                            (smex-initialize)
                                            ;(setf smex-save-file "~/lol/kek/foo")
                                            )))

(advice-add 'smex-save-to-file :before (lambda (&rest ...) (eldev-warn "smex-save-to-file %s" smex-save-file)))

After it eldev test either works fine (both with 0.2 and 0.2.1) or prints this message at the end:

Error in kill-emacs-hook (smex-save-to-file): (file-missing "Opening output file" "No such file or directory" "/home/paul/lol/kek/foo")

if I uncomment that line changing smex-save-file (also both in 0.2 and 0.2.1). I also tried just adding an erroneous hook to kill-emacs-hook, but again I only receive a message from Emacs like the one above.
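For anyone who wants to repeat the second experiment, a deliberately failing exit hook can be as small as the sketch below. The function name is hypothetical and exists only for this test; the form is meant for Eldev-local so that it never ends up in the committed project configuration.

```elisp
;; Sketch for `Eldev-local': register a hook that always signals an
;; error when Emacs exits.  `my-failing-exit-hook' is a made-up name.
(defun my-failing-exit-hook ()
  "Signal an error on purpose, to see how the exit is handled."
  (error "Deliberate failure in kill-emacs-hook"))

(add-hook 'kill-emacs-hook #'my-failing-exit-hook)
```

Running eldev test with this in place shows whether the process still exits after the hook fails or hangs as in the Travis logs.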

doublep commented 4 years ago

Yesterday I created a pull request for undercover to fix the library on Emacs 27+ (i.e. snapshot in your examples above).

Overview of solved or non-Eldev issues for now:

What remains are the hangs on stable Emacs versions. They might also be triggered by undercover, but at least this wouldn't be fixed by the PR.

By the way, what I found useful for investigating hangs is adding the form (add-hook 'kill-emacs-hook 'backtrace) to Eldev-local (or passing it with -S). With it, a backtrace is also printed if you kill the process with C-c.

doublep commented 4 years ago

Eldev 0.3 features an undercover integration plugin. The tool itself hangs on Emacs 27 and up, nothing I can do about it. Closing this issue as all problems seem to be either solved or non-Eldev.