emacs-ess / ESS

Emacs Speaks Statistics: ESS
https://ess.r-project.org/
GNU General Public License v3.0

Resource exhaustion with emacs 28's native compilation. #1222

Open HinTak opened 2 years ago

HinTak commented 2 years ago

Somewhat disappointed with the lack of response to #1207, so I gave the current head ecd8865bbbdf6664b66be5ffd5d4e62d5af78240 a go against emacs 28.1.

commit ecd8865bbbdf6664b66be5ffd5d4e62d5af78240 (HEAD -> master, origin/master, origin/HEAD)
Author: Vitalie Spinu <spinuvit@gmail.com>
Date:   Fri Sep 2 10:18:55 2022 +0200

    [Fix #1220] Make ess-r-initialize-on-start interactive

It still spawns multiple emacs processes, which can potentially lead to resource exhaustion and lock-ups/crashes.

There is one /tmp/emacs-async-comp-ess-custom-*.el file and 123 /tmp/emacs-int-comp-subr--trampoline-*_delete_char_0-*.el files. Each corresponds to a spawned process, so I had 123 new emacs instances running the latest ess before I killed them.

Looking at the content of the single /tmp/emacs-async-comp-ess-custom-*.el, it says (with minor formatting for readability):

 (require 'comp) ...
... (message "Compiling %s..." "/usr/share/emacs/site-lisp/ess/lisp/ess-custom.el")

(comp--native-compile "/usr/share/emacs/site-lisp/ess/lisp/ess-custom.el" t)

According to https://koji.fedoraproject.org/koji/buildinfo?buildID=2002348 , fedora's emacs 28 is built with the new native compilation support, in the change log:

Build with Native Compilation support and natively compile all .el files

So there you are, the problem seems to be "(comp--native-compile "/usr/share/emacs/site-lisp/ess/lisp/ess-custom.el" t)"

HinTak commented 2 years ago

I believe the resource exhaustion comes from emacs trying to do (comp--native-compile "/usr/share/emacs/site-lisp/ess/lisp/ess-custom.el" t). You need a build of emacs 28 with native compilation enabled to try this.

lionel- commented 2 years ago

Is this a different issue than in #1207?

lionel- commented 2 years ago

Sorry I can't reproduce.

HinTak commented 2 years ago

Native compilation is on by default for packages located under "/usr/share/emacs/site-lisp/". So you probably won't see it if you put ess under MELPA or your own ~/.emacs .

I did it by doing "git archive ...", generating a tarball on top of fedora's older rpm and rebuilding it. The rebuild process doesn't seem to do much other than trying to byte-compile (successful and uneventful, so this confirms most of the warnings reported in #1207 as red herrings, as there are no warnings in the byte-compile). There are two things I haven't looked into: what little stubs the fedora maintainer adds, and how the system copes with a non-root user trying to native compile to a system location. Maybe one of these two is the problem.

HinTak commented 2 years ago

The stub I talked about is just (require 'ess-site), I think, besides putting ess under /usr/share/emacs/site-lisp

https://src.fedoraproject.org/rpms/emacs-common-ess/blob/rawhide/f/emacs-common-ess.spec#_88

lionel- commented 2 years ago

So you probably won't see it if you put ess under MELPA or your own ~/.emacs .

I've launched native compilation manually on these files and it succeeded.

HinTak commented 2 years ago

Are you sure your emacs is built with it enabled? ./configure ... --with-native-compilation, I think.

lionel- commented 2 years ago

yes and I see the compiled .eln files in my cache.
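For anyone comparing setups, here is a minimal sketch of how native compilation support could be checked from inside Emacs instead of from the configure line. `my-demo-native-comp-p` is a hypothetical helper name introduced for illustration; `native-comp-available-p` and `native-comp-eln-load-path` are the Emacs 28 names.

```elisp
;; Hypothetical helper: t if this Emacs can native-compile, nil otherwise.
;; The `fboundp' guard keeps it safe on Emacs 27 and older, where
;; `native-comp-available-p' does not exist.
(defun my-demo-native-comp-p ()
  (and (fboundp 'native-comp-available-p)
       (native-comp-available-p)
       t))

(if (my-demo-native-comp-p)
    ;; Directories searched for compiled .eln files.
    (message "native-comp available; eln dirs: %S" native-comp-eln-load-path)
  (message "this Emacs was built without native compilation"))
```

Evaluating this with M-: or in the *scratch* buffer prints the result in the echo area.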

HinTak commented 2 years ago

I wonder how much of it is fedora-specific - there are 3 factors: how emacs is built, where ess is located (in /usr/share/emacs/site-lisp), and what little additions the fedora packager makes to ess. Hope somebody answers in the old thread about non-fedora systems.

I haven't used R for a number of years but use emacs on a daily basis. The logical choice for me is just to uninstall ess until I need R again, if ever. So I'd rather not spend too much time on this... I guess I could tar up my /usr/share/emacs/site-lisp/ess for comparison, and try a few of the "disable native compilation" directives to see if they work around things and give more clues to the problem.

One thing I thought of, but prefer not to try, is to launch emacs as root - if the cause is a permission problem of non-root user trying to native compile and write to system location and failing over and over somehow, this might work around it. But I'd rather not do that :-(.

cgorac commented 1 year ago

I have no problem running emacs as root, at least not on my personal laptop, so I've tried and the problem is still there. Fedora 36, Emacs 28.1, ESS 18.10.2.

HinTak commented 1 year ago

@cgorac do you mean emacs works correctly, with ess installed, when run as root?

juhp commented 1 year ago

Dunno if it helps, but I heard that uim-1.8.9 included some fixes for newer Emacs which supposedly/hopefully fix uim elisp installation for Emacs 28.

(I dunno if there has been any emacs upstream discussion about this general issue?)

cgorac commented 1 year ago

@cgorac do you mean emacs works correctly, with ess installed, when run as root?

No, it hangs, just as when run under regular user account.

juhp commented 1 year ago

Anyone tried setting native-comp-async-jobs-number?

cgorac commented 1 year ago

I've tried with (setq native-comp-async-jobs-number 4) in my .emacs, and it doesn't help.

juhp commented 1 year ago

I've tried with (setq native-comp-async-jobs-number 4) in my .emacs, and it doesn't help.

If you have 8 vcpus then 4 is already the default.

cgorac commented 1 year ago

It doesn't matter, tried with 2 too, the same thing happens - Emacs GUI is stuck, in the background loads of Emacs processes are launched, until either interrupted or machine hangs.
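For reference, a sketch of the setting being discussed, assuming it is placed in the user's init file. As documented for Emacs 28, a value of 0 (the default) means half of the CPU's execution units, which is why 4 is already the effective value on an 8-vCPU machine. As reported above, lowering it did not stop the runaway processes here, so this is not a workaround for this issue.

```elisp
;; Assumption: this goes in the user's init file (~/.emacs).
;; 0 (the default) means "half of the CPU execution units";
;; a positive number caps the async native-compile subprocesses.
(setq native-comp-async-jobs-number 1)
```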

maitra commented 1 year ago

Dunno if it helps, but I heard that uim-1.8.9 included some fixes for newer Emacs which supposedly/hopefully fix uim elisp installation for Emacs 28.

(I dunno if there has been any emacs upstream discussion about this general issue?)

I do not have uim installed, and this is the first time I heard about it. But I was wondering if that would help with fixing the problem here.

maitra commented 1 year ago

I wonder how much of it is fedora-specific - there are 3 factors: how emacs is built, where ess is located (in /usr/share/emacs/site-lisp), and what little additions the fedora packager makes to ess. Hope somebody answers in the old thread about non-fedora systems.

I haven't used R for a number of years but use emacs on a daily basis. The logical choice for me is just to uninstall ess until I need R again, if ever. So I'd rather not spend too much time on this... I guess I could tar up my /usr/share/emacs/site-lisp/ess for comparison, and try a few of the "disable native compilation" directives to see if they work around things and give more clues to the problem.

One thing I thought of, but prefer not to try, is to launch emacs as root - if the cause is a permission problem of non-root user trying to native compile and write to system location and failing over and over somehow, this might work around it. But I'd rather not do that :-(.

Excellent point. If it is Fedora-specific, then we can have them fix it, though it does appear that Fedora has this issue only (?) with emacs-ess. Btw, here is the emacs spec file on Fedora (emacs.zip), which is how emacs is packaged there. Because GitHub does not support .spec files, I have put it in a zip archive. Perhaps that might help in reproducing the issue? Emacs-ESS is essentially useless in Emacs 28.1. I have downgraded to Emacs 27.2, but with Fedora 37 coming out soon, that may not be an option for those of us who want to upgrade. In any case, Fedora 36 will expire in May 2023, and then all users will need to upgrade.

juhp commented 1 year ago

Excellent point. If it is Fedora-specific, then we can have them fix it, though it does appear that Fedora has this issue only (?) with emacs-ess.

No, some other Fedora elisp packages are also affected (which is why I mentioned uim as an example).

This is the location of Fedora's emacs.spec

juhp commented 1 year ago

Okay, I found that the Fedora emacs-vm package added the following to its vm-init.el file to disable native-comp for itself:

+ ;; For some reason, native compilation breaks VM. As a workaround until the
+ ;; problem is understood and fixed, disable native compilation of all VM lisp files.
+ (eval-after-load "comp"
+     '(if (boundp 'native-comp-deferred-compilation-deny-list)
+         (add-to-list 'native-comp-deferred-compilation-deny-list "/vm.*\.el"))) 

https://src.fedoraproject.org/rpms/emacs-vm/c/909b0bc357976252c51502bf17ed1efc6aeb7b97?branch=rawhide

I suppose similar could be done for ess if you are suffering from this issue.

HinTak commented 1 year ago

That's a useful tip - I'll give it a try at some point.

maitra commented 1 year ago

Okay, I found that the Fedora emacs-vm package added the following to its vm-init.el file to disable native-comp for itself:

+ ;; For some reason, native compilation breaks VM. As a workaround until the
+ ;; problem is understood and fixed, disable native compilation of all VM lisp files.
+ (eval-after-load "comp"
+     '(if (boundp 'native-comp-deferred-compilation-deny-list)
+         (add-to-list 'native-comp-deferred-compilation-deny-list "/vm.*\.el"))) 

https://src.fedoraproject.org/rpms/emacs-vm/c/909b0bc357976252c51502bf17ed1efc6aeb7b97?branch=rawhide

I suppose similar could be done for ess if you are suffering from this issue.

Thanks very much for this lead! Can this be done locally by the user? In that case, I guess I put it in my local .emacs file?

HinTak commented 1 year ago

About that compilation deny list being in .emacs, I believe so. I intend to give it a try at some point...

maitra commented 1 year ago

I tried entering the following at the beginning (and separately, the end) of my .emacs file:


;; Startup settings for ESS (this is borrowed from VM)
;; 
;; For some reason, native compilation breaks VM. As a workaround until the
;; problem is understood and fixed, disable native compilation of all VM
;; lisp files.
(eval-after-load "comp"
    '(if (boundp 'native-comp-deferred-compilation-deny-list)
        (add-to-list 'native-comp-deferred-compilation-deny-list "/ess.*\.el")))

Unless I am making a mistake here in making the change from vm to ess, I got no different results than before, and the system became unusable.

HinTak commented 1 year ago

Pretty sure you did it wrong. The "/" at the beginning probably has a special meaning: if the directive is not in a file in the same directory, it probably needs the full path or something there. You need to consult the actual documentation on using the deny list from a config file located elsewhere from the native-compiled file.

HinTak commented 1 year ago

Also for ess, you need to get rid of the "." after - ess files are named "ess.el" and "ess-*.el", with a "-".

maitra commented 1 year ago

Thanks! Sorry, but including the entire path had not much effect. I do get stuck at Loading /usr/share/emacs/site-lisp/site-start.d/ess-init.el

I tried:

;; Startup settings for ESS (this is borrowed from VM)
;;
;; For some reason, native compilation breaks VM. As a workaround until the
;; problem is understood and fixed, disable native compilation of all VM
;; lisp files.
(eval-after-load "comp"
    '(if (boundp 'native-comp-deferred-compilation-deny-list)
        (add-to-list 'native-comp-deferred-compilation-deny-list "/usr/share/emacs/site-lisp/site-start.d/ess*\.el")))

Even explicitly including the entire path. Perhaps still not doing something correctly here.
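A note on the patterns tried so far, as a sketch. The deny list is documented as a list of regexps, not shell globs, so `ess*` means "es" followed by any number of "s", and `\.` inside a Lisp string reads as a plain `.` unless the backslash is doubled. The path above also points at site-start.d, while the file being compiled in the original report lives under /usr/share/emacs/site-lisp/ess/lisp/. `my-demo-denied-p` is a hypothetical helper and the second regexp is an illustrative assumption. Whether a matching entry would actually stop the runaway processes is not established here.

```elisp
(defun my-demo-denied-p (file deny-list)
  "Return non-nil if FILE matches any regexp in DENY-LIST."
  (let ((hit nil))
    (dolist (re deny-list hit)
      (when (string-match-p re file)
        (setq hit t)))))

(let ((file "/usr/share/emacs/site-lisp/ess/lisp/ess-custom.el"))
  (list
   ;; Pattern as tried above: wrong directory and glob-style `ess*'.
   (my-demo-denied-p
    file '("/usr/share/emacs/site-lisp/site-start.d/ess*\.el"))
   ;; Illustrative regexp: any .el file below a site-lisp/ess/ directory.
   (my-demo-denied-p
    file '("/site-lisp/ess/.*\\.el\\'"))))
;; The first element is nil (no match), the second is non-nil.
```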

HinTak commented 1 year ago

If it is still getting stuck at loading ess-init.el, the obvious thing to try is to insert that fragment (still need to look up the syntax etc for those bits) into the very beginning of that file.

maitra commented 1 year ago

If it is still getting stuck at loading ess-init.el, the obvious thing to try is to insert that fragment (still need to look up the syntax etc for those bits) into the very beginning of that file.

Which file? I would like to try and see if this can be resolved because as far as I am concerned, emacs has become unusable with 28.2 and emacs-ess.

HinTak commented 1 year ago

ess-init.el, of course.

maitra commented 1 year ago

ess-init.el, of course.

Currently, /usr/share/emacs/site-lisp/site-start.d/ess-init.el only has the following:

;;; Set up emacs-common-ess for Emacs.
;;;
;;; This file is automatically loaded by emacs's site-start.el
;;; when you start a new emacs session.

(require 'ess-site)

So, I put that text in here? I am confused.

HinTak commented 1 year ago

Why is that confusing? Put some "native-compile-deny..." stuff at the top of ess-init.el, before the "(require..." line, seems the obvious thing to try.

maitra commented 1 year ago

Why is that confusing? Put some "native-compile-deny..." stuff at the top of ess-init.el, before the "(require..." line, seems the obvious thing to try.

Honestly, I don't quite know what this means, and therefore I am flying blind here. So the file should be:

;;; Set up emacs-common-ess for Emacs.
;;;
;;; This file is automatically loaded by emacs's site-start.el
;;; when you start a new emacs session.
native-comp-deferred-compilation-deny-list "/usr/share/emacs/site-lisp/site-start.d/ess*\.el"
(require 'ess-site)

Is this correct, or should it be something else? Thanks!

HinTak commented 1 year ago

Hmm, I have over-estimated other people's knowledge of lisp. In a nutshell, ";" are comments and ignored, but "()" are meaningful.

So you need to do the whole:

(eval-after-load "comp"
    '(if (boundp 'native-comp-deferred-compilation-deny-list)
        (add-to-list 'native-comp-deferred-compilation-deny-list "/usr/share/emacs/site-lisp/site-start.d/ess*\.el")))

If you are not sure.

In fact you only need this part,

(add-to-list 'native-comp-deferred-compilation-deny-list "/usr/share/emacs/site-lisp/site-start.d/ess*\.el")

Since the other two lines are conditionals, and we already know they will be true. The "eval-after-load" part means "insert this and do it when". (Removing it means "do it now".) The "if boundp" part is a typical version check in emacs: instead of doing version checks for emacs, the emacs people recommend that you check for the actual features you want to use. Thus, that section means "if it is possible to disable native compile by a deny list, please add to the deny list...". There is no need to have the "if it is possible to disable native compile by a deny list," part.
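The two wrappers can be sketched as follows, with each layer commented. One caveat, offered as an assumption rather than a confirmed diagnosis: `add-to-list` signals a void-variable error when its variable is not yet bound, and the error log further down suggests this variable is still unbound while site-start files are loading, so dropping the wrappers may not be safe at that point. The regexp is illustrative and not taken from Fedora.

```elisp
;; Run the body only once comp.el has been loaded, so the variable exists.
(with-eval-after-load 'comp
  ;; Feature check instead of a version check: Emacs 28 defines this
  ;; variable; other versions may not.
  (when (boundp 'native-comp-deferred-compilation-deny-list)
    (add-to-list 'native-comp-deferred-compilation-deny-list
                 ;; Illustrative regexp (an assumption): note the doubled
                 ;; backslashes needed inside a Lisp string.
                 "/site-lisp/ess/.*\\.el\\'")))
```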

maitra commented 1 year ago

Thanks, yes, you are over-estimating my knowledge of lisp. However, I tried as suggested which appears to be what I wrote above: my new ess-init.el file reads:


;;; Set up emacs-common-ess for Emacs.
;;;
;;; This file is automatically loaded by emacs's site-start.el
;;; when you start a new emacs session.

native-comp-deferred-compilation-deny-list "/usr/share/emacs/site-lisp/site-start.d/ess*\.el"
(require 'ess-site)

However, I get:

Loading /usr/share/emacs/site-lisp/site-start.d/auctex.el (source)...done
Loading /usr/share/emacs/site-lisp/site-start.d/auto-complete-init.el (source)...done
Loading /usr/share/emacs/site-lisp/site-start.d/autoconf-init.el (source)...done
Loading /usr/share/emacs/site-lisp/site-start.d/clang-format.el (source)...done
Loading /usr/share/emacs/site-lisp/site-start.d/clang-include-fixer.el (source)...done
Loading /usr/share/emacs/site-lisp/site-start.d/clang-rename.el (source)...done
Loading /usr/share/emacs/site-lisp/site-start.d/cmake-init.el (source)...done
Loading /usr/share/emacs/site-lisp/site-start.d/desktop-entry-mode-init.el (source)...done
Loading /usr/share/emacs/site-lisp/site-start.d/emacs-goodies-loaddefs.el (source)...done
Loading /usr/share/emacs/site-lisp/site-start.d/ess-init.el (source)...
load: Symbol’s value as variable is void: native-comp-deferred-compilation-deny-list

Is this last bit what I should be getting? Not sure. Also, ess does not seem to be loading anymore. Thanks!
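A sketch of what that error message means. In the file as edited, the variable name and the string are two separate top-level forms with no parentheses around them, so Emacs evaluates the bare symbol as a variable. That variable has no value at that point, the load aborts with the error shown, and the following `(require 'ess-site)` line is never reached, which would explain why ESS no longer loads. The symbol `my-demo-unbound-list` below is a hypothetical, deliberately undefined name.

```elisp
;; `my-demo-unbound-list' is a hypothetical name that is never defined.
(defun my-demo-eval-bare-symbol ()
  "Evaluate an unbound symbol and return the error symbol signalled."
  (condition-case err
      (progn (eval 'my-demo-unbound-list t) 'no-error)
    (void-variable (car err))))

(message "%S" (my-demo-eval-bare-symbol)) ; prints void-variable
```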

HinTak commented 1 year ago

I tried everything we discussed so far, and nothing seems to stop native compilation. So I think I'll try one very tedious thing next: there is one line you can insert into a .el file to say "don't native compile me". The entire ess file set only has 50-60 .el files, so it is just a matter of inserting that 60 times. (Unfortunately it needs to be on the first line.) Some of it (40+ or so) can be done programmatically. So it will probably take 30 minutes to do it all: the automatic part plus some 15 manual edits.

HinTak commented 1 year ago

My barbarism (inserting 40+ "don't compile me" into 40+ *.el files) - seems to work.

HinTak commented 1 year ago

I did it by programmatically manipulating those lines that already have "lexical-binding: t". It is a bit disgusting that "make all" actually goes online and fetches two extra *.el files???
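For readers unfamiliar with the directive, a sketch of what such a first line could look like. The file name and description are hypothetical; `no-native-compile` is the file-local variable Emacs 28 checks, here appended to an existing `lexical-binding: t` cookie.

```elisp
;;; ess-example.el --- Hypothetical file header  -*- lexical-binding: t; no-native-compile: t -*-

;; The rest of the file is unchanged.
(provide 'ess-example)
```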

HinTak commented 1 year ago

Native compilation still does not like the remaining *.el and tries to native compile them each launch without success, but at least it seems to stop after a short while, instead of going out of control.

I see a few ess files actually do "(require 'compile)", which is probably the source of this problem.

HinTak commented 1 year ago

Hurray, I think I understand the bug now, and it is generic to emacs. I have a very simple work-around. The workaround is this:

When you launch emacs, and it starts to eat resources and spawn a lot of process of the form:

/usr/bin/emacs --batch -l /tmp/emacs-int-comp-subr--trampoline-64656c6574652d63686172_delete_char_0-QeNLBe.el

Do a "killall emacs" to kill them all. You should have lots of "emacs-int-comp-subr--trampoline*.el" left overs in /tmp. Pick one, run this:

/usr/bin/emacs -Q --batch -l /tmp/one-of-those-files

Note the -Q there, that's important!!! That's it. Now you launch emacs, it should smoothly native-compile ess (i.e. it would spawn one or two new processes for a while, until it has done about 50 of them, quite gradually).

I think it is some kind of race condition: to do any native compilation at all, a natively compiled trampoline must first be built. Without the "-Q", when emacs tries to build the trampoline, it loads ESS before the build, and thus the problem escalates.

I found this out by scattering a lot of "no-native-compile" into ess's el files. That slows down the self-multiplying native compilation of ESS itself enough that once in a while the trampoline gets built, and one of my accounts worked afterwards. Deleting the native cache gets me back to the old situation. Some accounts (I was trying things out with both root and user) still had a copy of ~/.emacs.d/eln-cache/28.1-b1f2d84a/subr--trampoline-64656c6574652d63686172_delete_char_0.eln; copying it over makes emacs work with another account again. I experimented with a zero-sized file for that, and it stopped native compilation altogether (I don't have R installed, so I don't know if ESS works that way or not). Then I figured out how to make it by hand with -Q.
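A small aside on the trampoline file names, as a sketch. The long hexadecimal run in them decodes to the name of the primitive the trampoline is for, which matches the `_delete_char_0` suffix. That comp.el builds the name this way is an inference from the file names quoted in this thread, and `my-demo-decode-hex` is a hypothetical helper.

```elisp
(defun my-demo-decode-hex (hex)
  "Decode HEX, a string of two-digit hexadecimal byte values, to text."
  (let ((chars '()))
    (dotimes (i (/ (length hex) 2))
      ;; Read each pair of hex digits as one character code.
      (push (string-to-number (substring hex (* 2 i) (+ 2 (* 2 i))) 16)
            chars))
    (apply #'string (nreverse chars))))

(my-demo-decode-hex "64656c6574652d63686172") ; => "delete-char"
```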

HinTak commented 1 year ago

Filed upstream as https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60208 , '28.1; Resource exhaustion with emacs 28's native compilation; need "-Q" for trampoline'

HinTak commented 1 year ago

To save people from reading the upstream gnu.org exchange: it turns out that the fact that Fedora has a "/usr/share/emacs/site-lisp/site-start.d/ess-init.el", which contains the single line "(require 'ess-site)", is important. Emacs doesn't do recursive loading via user config (~/.emacs) when native-compiling. But site-wide auto-loading via /usr/share/emacs/site-lisp/site-start.d/ess-init.el is not currently catered for.

At the moment the fix looks likely to be a new emacs release.

HinTak commented 1 year ago

Fix pushed to emacs 29 branch https://git.savannah.gnu.org/cgit/emacs.git/commit/?h=emacs-29

juhp commented 1 year ago

I think you mean this commit

maitra commented 1 year ago

Thanks, Emacs 29.1 is expected to be released in spring 2023: perhaps this means a wait of six months. Hopefully the patch is small enough to also make it into Emacs 28.2.

HinTak commented 1 year ago

Yes, it is just a two-line code change - should be easy enough for backport either way (patching by distro packager, or emacs 28.2). I'll ask if they could do it in 28.x too.

HinTak commented 1 year ago

Knowing where to stick the "-Q" in is the hard part. The change is very small. I filed at redhat bugzilla to get it backported anyway. Other affected distributions might want to do that too, before 28.2 includes the diff, if it ever does.

HinTak commented 1 year ago

Upstream says there is no plan for a 28.x. I already filed with redhat to get it backported at the distro packaging level. Any non-redhat people affected by this here?