Interlisp / medley

The main repo for the Medley Interlisp project. Wiki, Issues are here. Other repositories include maiko (the VM implementation) and Interlisp.github.io (web site sources)
https://Interlisp.org
MIT License

maiko always pinning CPU #33

Closed masinter closed 3 years ago

masinter commented 4 years ago

As many have noted (including, most recently, @pmcjones):

the MacBook fan was running full-blast because of the busy-waiting emulator

There isn't a straightforward solution because Lisp is always running, if only to track the mouse, blink the cursor, handle incoming network packets, and do other periodic polling many times a second. But it might be possible to reduce the load when all the processes are hung in I/O or timer wait.

masinter commented 4 years ago

John Cowan wrote

From the Smalltalk: Bits of History, Words of Advice book, in the chapter "The Design and Implementation of VAX/Smalltalk-80" by Stoney Ballard (Three Rivers) and Stephen Shirron (DEC), pp. 146-47:

The Smalltalk-80 system as distributed is not designed to either run background processes or co-exist on a timesharing system. This is due to the large number of places where the code loops waiting for a mouse button. The system can be converted to one which is entirely event driven by inserting wait messages to an "any event" semaphore into the loops. We found these loops by noticing whenever the idle process was not running, yet nothing else seemed to be happening. We would then type control-C to interrupt the Smalltalk-80 system and find out who was responsible. The debugger was then used to edit and recompile the offending methods. Converting all the interaction to an event-driven style allowed background Smalltalk-80 processes to run whenever the user was not actively interacting with the Smalltalk-80 system.

It is generally considered uncivil to run programs that are not doing anything worthwhile on a timesharing system. To fix this, we replaced the Smalltalk-80 idle process with one that called two special primitives. The Smalltalk-80 code for this is as follows.

    IdleLoop
        [true] whileTrue: [[Smalltalk collectGarbage] whileTrue. Smalltalk hibernate]

The collectGarbage primitive performed an incremental activation of the garbage collector, returning false if there was nothing left to do. The hibernate primitive suspended the Smalltalk-80 VMS process, letting other users run. The hibernate primitive returned whenever an external event happened. Since this loop runs at the lowest priority, it is preempted by any Smalltalk-80 process with something to do.

This made us more popular with the other users of the VAX, and also reduced the overhead of the garbage collector when interacting with the Smalltalk-80 system in a bursty manner (which is usually the case). The Smalltalk-80 process itself also benefited from this because the VMS scheduler assigns a lower priority to compute-bound processes. By hibernating often enough, the Smalltalk-80 process would preempt other users running compilers and the like, leading to a snappier response when browsing or editing.

blakemcbride commented 4 years ago

There must be a more general solution to this. For example, how does Virtual Box solve this problem? Many systems run continuously but don't peg the CPU when there is no real activity.

Kirtai commented 4 years ago

I think the best way is to modify it to sleep when idle, rather than busy waiting. Like the Smalltalk example above.

moon-chilled commented 4 years ago

There must be a more general solution to this. For example, how does Virtual Box solve this problem? Many systems run continuously but don't peg the CPU when there is no real activity.

Your host operating system will put the CPU to sleep when no processes are running (all are waiting for I/O), and an I/O event (which can include a timer) will wake it up automatically. The guest operating system (running in virtualbox) will do essentially the same thing to its virtual CPU. That's not something that virtualbox does; nominally, it's something the guest OS does.

Kirtai commented 4 years ago

Reading the Medley Release 1.0 notes suggests that the variables "BACKGROUNDFNS" and "TTYBACKGROUNDFNS" may pose a problem since they contain tasks to run when idle.

yasuhiko-kiuchi commented 3 years ago

Hi, I have not touched or read any of the code, so this might not work at all, but simply adding a new subr that calls sleep from IL:BLOCK, and adjusting the length of the sleep, might work.

blakemcbride commented 3 years ago

I think stopping Medley from pegging the CPU is critically important. However, I do not think adding regular sleeps is the way. I wouldn't want to make Medley less responsive or slower.

Medley should somehow know the difference between running user code (which should be unimpeded) and its normal house-keeping loop. The house-keeping loop should either have timed blocks that immediately break on user input or sleeps that are interrupted by user input.

Lastly, many systems have the same issue that had to be solved. How did they do it?

Blake McBride


masinter commented 3 years ago

Interlisp already has a way of deciding the machine is idle and firing up a screen saver. My memory is a little fuzzy, but I think it checks whether all of the threads in the scheduler are waiting for user input or for another thread that is. The idle parameters include options for periodically doing a SaveVM, which idle screensaver to run, etc. I think just blocking for 100 ms, or until an X server input arrives, when the idler thinks the system isn't busy might be sufficient.

Some of my favorite hacks were screensavers. There's one I want to find that took a copy of the original screen bitmap and slowly displaced little segments down. I thought it kind of looked like the screen was melting. One April 1st I installed it in the PARC site-init file.

Anzus commented 3 years ago

The DOS emulation community uses a tool called DOSIDLE that seems to do exactly what we're looking for. Here's the description on VMWare's site: https://www.vmware.com/support/ws3/doc/ws32_guestos11.html

masinter commented 3 years ago

I looked at DOSIDLE and it seems very DOS-specific, and might not even work for Medley on DOS. I think it comes down to, as @yasuhiko-kiuchi-fx suggested, adding a subr if there isn't one already for lowering and raising the lde priority.

blakemcbride commented 3 years ago

I haven't read @yasuhiko-kiuchi-fx's suggestion, but having written process schedulers, I can tell you that lowering the process's priority will not solve the problem. If there are no higher-priority processes, it will still peg the CPU. Entering a wait state is the only way to do it.

--blake
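The difference between spinning and entering a wait state can be measured directly. The following is a minimal sketch, assuming a POSIX system; the function names are made up for illustration and nothing here comes from the maiko sources. It compares the CPU time consumed while busy-waiting for an interval with the CPU time consumed while sleeping for the same interval.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Current reading of the given clock, in nanoseconds. */
static long long now_ns(clockid_t clock) {
    struct timespec ts;
    clock_gettime(clock, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Spin for wall_ns of wall-clock time; return the CPU time consumed. */
long long cpu_ns_busy_wait(long long wall_ns) {
    long long cpu0 = now_ns(CLOCK_PROCESS_CPUTIME_ID);
    long long end = now_ns(CLOCK_MONOTONIC) + wall_ns;
    while (now_ns(CLOCK_MONOTONIC) < end) {
        /* busy-wait: the process stays runnable the whole time */
    }
    return now_ns(CLOCK_PROCESS_CPUTIME_ID) - cpu0;
}

/* Sleep for wall_ns of wall-clock time; return the CPU time consumed. */
long long cpu_ns_sleeping(long long wall_ns) {
    long long cpu0 = now_ns(CLOCK_PROCESS_CPUTIME_ID);
    struct timespec req = { wall_ns / 1000000000LL, wall_ns % 1000000000LL };
    nanosleep(&req, NULL);   /* wait state: the kernel deschedules us */
    return now_ns(CLOCK_PROCESS_CPUTIME_ID) - cpu0;
}
```

Calling both with the same interval (say 50 ms) should show the busy-wait charging roughly the whole interval to the process, while the sleep charges close to nothing, regardless of the priority the process runs at.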


masinter commented 3 years ago

I'd like to do an experiment. Add a subr called SLEEPSOME which calls Linux nanosleep. The amount of time to sleep is 1/20th of the timer interrupt period (1/60th of a second). Add a call to SLEEPSOME to \BACKGROUNDFNS. If every process is waiting, it will get called often. If one process is busy, it will get called less.
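A rough sketch of what the C side of such a subr might look like, assuming POSIX nanosleep. The names sleep_some and sleep_some_ns and the two constants are hypothetical and are not taken from maiko; only the 1/60 s tick and the 1/20 fraction come from the proposal above.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

#define TIMER_TICK_HZ   60   /* periodic timer rate named in the proposal */
#define SLEEPS_PER_TICK 20   /* sleep for 1/20th of one tick */

/* Length of one sleep in nanoseconds: 1e9 / 60 / 20, truncated to 833333. */
long sleep_some_ns(void) {
    return 1000000000L / TIMER_TICK_HZ / SLEEPS_PER_TICK;
}

/* Hypothetical body of a SLEEPSOME-style subr.
   Returns 0 if the full interval elapsed, -1 if the sleep was cut short
   (for example by a signal) or failed. */
int sleep_some(void) {
    struct timespec req = { 0, sleep_some_ns() };
    return nanosleep(&req, NULL);
}
```

Because the interval is well under one timer tick, a call made while some process is busy adds little delay, while an otherwise idle system spends most of each tick asleep.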

blakemcbride commented 3 years ago

I'm afraid any calls to sleep will just slow the system down. I think the only real way to do it is:

  1. Somehow know the difference between when the system is really doing something and when it's not.
  2. When the system is not doing anything, sleep until an event.

blakemcbride commented 3 years ago

I just checked Squeak and Pharo. Neither pegs the CPU. I'm happy to talk to them for suggestions if you're interested.

masinter commented 3 years ago

I'm willing to help you develop a solution. But I'm not sure what's wrong with my proposal

  1. Somehow know the difference between when the system is really doing something and when it's not

The system uses a round-robin scheduler. A 'process' (thread) runs until it is waiting for a timer to expire, or it is waiting for input (network, keyboard, mouse), or it gets a periodic interrupt. The system is not "really doing something" if all of the threads are waiting. The system is "really doing something" if at least one thread in the list of threads got interrupted.

  2. When the system is not doing anything, sleep until an event.

if we're moving away from signals to polling, we still need the periodic timers.
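The idle test described in this comment can be sketched as a scan over the thread list. This is only an illustration: the enum, its members, and system_is_idle are hypothetical and do not correspond to anything in the Medley or maiko sources.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reasons a thread gave up the processor. */
enum thread_state {
    WAITING_TIMER,   /* blocked until a timer expires */
    WAITING_INPUT,   /* blocked on network, keyboard, or mouse input */
    PREEMPTED        /* was still running when the periodic interrupt fired */
};

/* The system counts as idle only if every thread is waiting, that is,
   if no thread was cut off mid-computation by the periodic interrupt. */
bool system_is_idle(const enum thread_state *threads, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (threads[i] == PREEMPTED) {
            return false;
        }
    }
    return true;
}
```

A scheduler pass that finds the system idle by this test is the natural point at which to sleep or block for input.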

nbriggs commented 3 years ago

OK, I've done that experiment. With a nanosleep of 833333 ns (a twentieth of a sixtieth of a second) pushed onto BACKGROUNDFNS, the CPU utilization goes to 5%, which is what you'd expect if nothing is going on. If you put it into an infinite compute loop it gets up to 98% (according to "top"). It's also quite non-responsive to keyboard input, regardless of whether it is otherwise busy.

nbriggs commented 3 years ago

The branch is "experiment-yield-subr" -- it's just for experiments, it should NOT be merged into the master branch.

blakemcbride commented 3 years ago

It does sound promising though. What problems do you see with it?

nbriggs commented 3 years ago

If we ensure that we have interrupt driven I/O then it will probably work reasonably well -- the nanosleep() will be interrupted when a key is pressed or the mouse is moved or file I/O completes. Putting it on the BACKGROUNDFNS is a reasonable experimental hack, it might be even better to look at the implementation of the process world in PROC and see if there's a tighter place to integrate.

If we can't rely on interrupt-driven I/O, then perhaps instead of sleeping it should be waiting a measured amount of time for I/O to happen.
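The claim that nanosleep() returns early when a signal arrives can be checked with a small test. This is a sketch under stated assumptions: it uses SIGALRM from setitimer as a stand-in for whatever signal interrupt-driven I/O would deliver, and the function names are invented for the example.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Empty handler: its only job is to make the signal interrupt the sleep. */
static void on_signal(int signo) { (void)signo; }

/* Ask for a 2 s sleep, with a one-shot timer raising SIGALRM after 10 ms.
   Returns 1 if the sleep was cut short by the signal, 0 otherwise. */
int sleep_is_interrupted_by_signal(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0) return 0;

    struct itimerval it;
    memset(&it, 0, sizeof it);
    it.it_value.tv_usec = 10000;              /* fire once, 10 ms from now */
    if (setitimer(ITIMER_REAL, &it, NULL) != 0) return 0;

    struct timespec req = { 2, 0 };
    struct timespec rem = { 0, 0 };
    int rc = nanosleep(&req, &rem);

    /* On interruption nanosleep returns -1 with errno set to EINTR and
       stores the time not slept in rem. */
    return rc == -1 && errno == EINTR && (rem.tv_sec > 0 || rem.tv_nsec > 0);
}
```

If the emulator's input arrives without any signal, the sleep runs to completion, which is the case the second paragraph above is concerned with.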

blakemcbride commented 3 years ago

Sounds very exciting to me!

masinter commented 3 years ago

after the sleep returns, set Irq_Stk_Check to 0 (sign to check interrupts) then follow by an opcode that starts with CHECK_INTERRUPT; (maybe RETURN?) That will let the keyboard stuff run

nbriggs commented 3 years ago

Just toss in a (SUBRCALL CAUSE-INTERRUPT) [which already exists] in the function pushed onto BACKGROUNDFNS -- that feels better.

nbriggs commented 3 years ago

@masinter -- let me know if it works well for you.

masinter commented 3 years ago

i don't remember how to make a subr call. let me see if this will fix my cygwin problem too.

waywardmonkeys commented 3 years ago

If we can't rely on interrupt-driven I/O, then perhaps instead of sleeping it should be waiting a measured amount of time for I/O to happen.

This is "just" a matter of doing a sleep via select or poll or equivalent mechanisms (kqueue, Solaris event ports, etc.). We'd need a list of all file handles. We might want to, at that point, also look at using things like timerfd on Linux. This is in line with some ideas that I've been looking into to fix some other bugs, so that'd be cool.
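A minimal sketch of "sleeping via poll", assuming POSIX poll() and a small fixed list of descriptors. The function names and the 16-descriptor limit are arbitrary choices for the example; a pipe stands in for a keyboard, network, or X connection descriptor.

```c
#include <poll.h>
#include <unistd.h>

#define MAX_FDS 16

/* Wait up to timeout_ms for any of fds[0..n-1] to become readable.
   Returns the index of the first readable descriptor, or -1 on timeout
   or error. */
int wait_for_input(const int *fds, int n, int timeout_ms) {
    struct pollfd pfds[MAX_FDS];
    if (n < 0 || n > MAX_FDS) return -1;
    for (int i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)n, timeout_ms) <= 0) return -1;
    for (int i = 0; i < n; i++) {
        if (pfds[i].revents & POLLIN) return i;
    }
    return -1;
}

/* Demo: returns 1 if an empty pipe times out and a written pipe wakes us. */
int demo_wait_for_input(void) {
    int p[2];
    if (pipe(p) != 0) return 0;
    int idle = wait_for_input(p, 1, 20);      /* nothing to read: times out */
    if (write(p[1], "x", 1) != 1) return 0;
    int woke = wait_for_input(p, 1, 1000);    /* data pending: returns early */
    close(p[0]);
    close(p[1]);
    return idle == -1 && woke == 0;
}
```

Unlike a plain sleep, this wait ends as soon as input is pending, so the timeout bounds the latency of periodic work rather than the latency of keystrokes.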

nbriggs commented 3 years ago

@waywardmonkeys -- we have the FD_SET, and it already does a select() but it's no-wait and it's not coordinated with Lisp's conception of when it's idle -- thus the thing I put into the BACKGROUNDFNS at Larry's suggestion.

@masinter -- exactly the form (SUBRCALL CAUSE-INTERRUPT) and compile the code where that is.

nbriggs commented 3 years ago

Here's the function I pushed onto my BACKGROUNDFNS --

[Screenshot: Screen Shot 2021-01-06 at 9 31 37 PM]

masinter commented 3 years ago

Works pretty well! Doesn't pin CPU. Even running on CYGWIN (#103)!!

betcha it will work in WSL1 even

masinter commented 3 years ago

The experiment was a success. It works because, if every Lisp process is waiting for something to happen (in a BLOCK or waiting for keyboard input), then the sleep time is >> the time to poll each process. If one process is actually doing something, then the sleep time is << the combined process work time. I think there's a good argument for leaving it in BACKGROUNDFNS. It isn't enough to make Cygwin responsive to keyboard interrupts (control-T, control-B, control-D), though; we need to check for keyboard input in the 60 Hz interrupt.
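The ">>" and "<<" argument amounts to a duty-cycle calculation. The sketch below is only arithmetic; the work figures in the note after it are illustrative assumptions chosen to land near the percentages reported earlier in the thread, not measurements.

```c
/* Fraction of wall-clock time spent on the CPU when each pass through the
   process list does work_ns of work and then sleeps for sleep_ns. */
double cpu_fraction(double work_ns, double sleep_ns) {
    return work_ns / (work_ns + sleep_ns);
}
```

With the 833333 ns sleep, a pass that only polls waiting processes for about 44 microseconds (assumed) gives a fraction of about 0.05, while a pass in which some process computes for 50 ms (assumed) before the next sleep gives about 0.98.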

masinter commented 3 years ago

For those impatient for the "real" fix, here's a workaround. Please report your experience!

https://github.com/Interlisp/medley/wiki/How-to-Run-Medley-without-pinning-the-CPU

sethm commented 3 years ago

For those impatient for the "real" fix, here's a workaround. Please report your experience!

https://github.com/Interlisp/medley/wiki/How-to-Run-Medley-without-pinning-the-CPU

This seems to work very well for me. CPU use has dropped from 100% (one full core) down to ~8-9% when idle. Currently running on: Intel Core i7, Linux Mint 20.1, Linux kernel 5.4.0, compiled with clang 10.0.0

sethm commented 3 years ago

I'd like to bump this issue, if I may, because I've been spending a lot of time experimenting in Medley lately, and the constant CPU fan noise is starting to bug me :) The experiment-yield-subr branch of Maiko does seem to be working well for me, but I like to keep as up to date with the head of master as I can.

Of course, I realize there are probably more important things to be worked on, but I just wanted to keep this in people's minds. Thanks for all the hard work!

blakemcbride commented 3 years ago

I agree with sethm. This is a valuable feature, and the fix seems to work well. There is, however, a possible difference of preference with respect to the amount of time in the sleep. I, therefore, make two recommendations as follows:

  1. Make the change part of the main line of the system
  2. Although the system will default to some reasonable sleep amount, add a simple way for the user to change this value.

masinter commented 3 years ago

I think it's just waiting for a PR that resolves https://github.com/Interlisp/maiko/pull/343#issuecomment-777053191

nbriggs commented 3 years ago

The current experimental code already allows for the caller of the yield subr, in Lisp, to provide the number of nanoseconds (< 1 s) to sleep. That feature will certainly remain to provide flexibility.

I haven't received any input on a default value other than Larry's proposed 1/1200s but I'd like to hear about any experiments that have been done.

I'd also like to hear how it interacts with other timer based features such as the SPY profiling, or async I/O.

PR Interlisp/maiko#343 is open for this change but we had a request to hold off on merging.

nbriggs commented 3 years ago

@sethm -- I've rebased this branch onto current master (and force pushed back to github), so you're not missing anything if you use it. There haven't really been changes in maiko code that affect the operation of Lisp code except for this new SUBR implementation -- it's all been code cleanup and stripping out unused paths.

sethm commented 3 years ago

@sethm -- I've rebased this branch onto current master (and force pushed back to github), so you're not missing anything if you use it. There haven't really been changes in maiko code that affect the operation of Lisp code except for this new SUBR implementation -- it's all been code cleanup and stripping out unused paths.

Great work, thank you very much. I've given it a spin and it's working perfectly for me. I'll be patient and wait for the PR to go through when everyone is satisfied, no rush!

masinter commented 3 years ago

I'm running with this on by default. I tried doubling the sleep time, which reduced the CPU usage even more. I tried SPY and got an odd result, but I didn't pursue it because we can always turn off background-yield when running SPY. What else is needed for this PR to go to master?

nbriggs commented 3 years ago

Need to sort out the generation of the subr numbers -- the generated include file is missing guards.

masinter commented 3 years ago

one problem with experiment-yield-subr is that there is no way I know of from Lisp to find out whether the running maiko has the subr. If it doesn't, then the call will wind up in URAID, rather than just being ignored. @stumbo @fghalasz for the docker experiments

masinter commented 3 years ago

PR #488 adds BACKGROUND-YIELD to end of BACKGROUNDFNS (most of which are useless in Medley btw)