Closed masinter closed 3 years ago
John Cowan wrote
From the Smalltalk: Bits of History, Words of Advice book, in the chapter "The Design and Implementation of VAX/Smalltalk-80" by Stoney Ballard (Three Rivers) and Stephen Shirron (DEC), pp. 146-47:
The Smalltalk-80 system as distributed is not designed to either run background processes or co-exist on a timesharing system. This is due to the large number of places where the code loops waiting for a mouse button. The system can be converted to one which is entirely event driven by inserting wait messages to an "any event" semaphore into the loops. We found these loops by noticing whenever the idle process was not running, yet nothing else seemed to be happening. We would then type control-C to interrupt the Smalltalk-80 system and find out who was responsible. The debugger was then used to edit and recompile the offending methods. Converting all the interaction to an event-driven style allowed background Smalltalk-80 processes to run whenever the user was not actively interacting with the Smalltalk-80 system.
It is generally considered uncivil to run programs that are not doing anything worthwhile on a timesharing system. To fix this, we replaced the Smalltalk-80 idle process with one that called two special primitives. The Smalltalk-80 code for this is as follows.
IdleLoop
    [true] whileTrue: [[Smalltalk collectGarbage] whileTrue. Smalltalk hibernate]
The collectGarbage primitive performed an incremental activation of the garbage collector, returning false if there was nothing left to do. The hibernate primitive suspended the Smalltalk-80 VMS process, letting other users run. The hibernate primitive returned whenever an external event happened. Since this loop runs at the lowest priority, it is preempted by any Smalltalk-80 process with something to do.
This made us more popular with the other users of the VAX, and also reduced the overhead of the garbage collector when interacting with the Smalltalk-80 system in a bursty manner (which is usually the case). The Smalltalk-80 process itself also benefited from this because the VMS scheduler assigns a lower priority to compute-bound processes. By hibernating often enough, the Smalltalk-80 process would preempt other users running compilers and the like, leading to a snappier response when browsing or editing.
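For readers who don't read Smalltalk, the idle loop quoted above can be sketched in C. This is only an illustration of the pattern (drain incremental GC work, then block until an external event); `collect_garbage`, `hibernate`, and the counters are hypothetical stand-ins, not code from VAX/Smalltalk-80 or from Maiko.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two primitives described in the quote. */
static int pending_gc_steps = 3;  /* pretend the collector has 3 increments of work */
static int hibernate_calls = 0;   /* how many times we gave up the CPU */

/* One incremental GC step; returns false once nothing is left to do. */
static bool collect_garbage(void) {
    if (pending_gc_steps == 0) return false;
    pending_gc_steps--;
    return true;
}

/* Stand-in for "suspend until an external event"; here it only counts calls. */
static void hibernate(void) { hibernate_calls++; }

/* One pass of the idle loop: drain GC work, then give up the CPU. */
static void idle_pass(void) {
    while (collect_garbage()) { /* keep collecting while work remains */ }
    assert(pending_gc_steps == 0);  /* postcondition: no GC work is left */
    hibernate();
}
```

In a real system `idle_pass` would run forever at the lowest priority, and `hibernate` would be an actual blocking call rather than a counter.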
There must be a more general solution to this. For example, how does Virtual Box solve this problem? Many systems run continuously but don't peg the CPU when there is no real activity.
I think the best way is to modify it to sleep when idle, rather than busy waiting. Like the Smalltalk example above.
Your host operating system will put the CPU to sleep when no processes are running (all are waiting for I/O), and an I/O event (which can include a timer) will wake it up automatically. The guest operating system (running in virtualbox) will do essentially the same thing to its virtual CPU. That's not something that virtualbox does; nominally, it's something the guest OS does.
Reading the Medley Release 1.0 notes suggests that the variables "BACKGROUNDFNS" and "TTYBACKGROUNDFNS" may pose a problem since they contain tasks to run when idle.
Hi, I have not touched or read any of the code, so this might not work at all, but simply adding a new subr that calls sleep from IL:BLOCK, and tuning the length of the sleep, might work.
I think stopping Medley from pegging the CPU is critically important. However, I do not think adding regular sleeps is the way. I wouldn't want to make Medley less responsive or slower.
Medley should somehow know the difference between running user code (which should be unimpeded) and its normal house-keeping loop. The house-keeping loop should either have timed blocks that immediately break on user input or sleeps that are interrupted by user input.
Lastly, many systems have the same issue that had to be solved. How did they do it?
Blake McBride
Interlisp already has a way of deciding the machine is idle and firing up a screen saver. My memory is a little fuzzy, but I think it checks whether all of the threads in the scheduler are waiting for user input or for another thread that is. The idle parameters include options for periodically doing a SaveVM, which idle screensaver to run, etc. I think just blocking for 100 ms, or until X server input arrives, whenever the idler thinks the system isn't busy might be sufficient.
Some of my favorite hacks were screensavers. There's one I want to find that took a copy of the original screen bitmap and slowly displaced little segments down. I thought it kind of looked like the screen was melting. One April 1st I installed it in the PARC site-init file.
The DOS emulation community uses a tool called DOSIDLE that seems to do exactly what we're looking for. Here's the description on VMWare's site: https://www.vmware.com/support/ws3/doc/ws32_guestos11.html
I looked at DOSIDLE and it seems very DOS-specific, and might not even work for Medley on DOS. I think it comes down to, as @yasuhiko-kiuchi-fx suggested, adding a subr if there isn't one already for lowering and raising the lde priority.
I haven't read @yasuhiko-kiuchi-fx's suggestion, but having written process schedulers, I can tell you that lowering the process's priority will not solve the problem. If there are no higher-priority processes, it'll still peg the CPU. Entering a wait state is the only way to do it.
--blake
I'd like to do an experiment. Add a subr called SLEEPSOME which calls Linux nanosleep. The amount of time to sleep is 1/20th of the timer interrupt period (1/60th of a second). Add a call to SLEEPSOME to \BACKGROUNDFNS. If every process is waiting, it will get called often. If one process is busy, it will get called less.
I'm afraid any calls to sleep will just slow the system down. I think the only real way to do it is:
I just checked Squeak and Pharo. Neither peg the CPU. I'm happy to talk to them for suggestions if you're interested.
I'm willing to help you develop a solution. But I'm not sure what's wrong with my proposal.
The system uses a round-robin scheduler. A 'process' (thread) runs until it is waiting for a timer to expire, or it is waiting for input (network, keyboard, mouse), or it gets a periodic interrupt. The system is not "really doing something" if all of the threads are waiting. The system is "really doing something" if at least one thread in the list of threads got interrupted.
If we're moving away from signals to polling, we still need the periodic timers.
OK, I've done that experiment. With a nanosleep of 833333 ns (a twentieth of a sixtieth of a second) pushed onto BACKGROUNDFNS, CPU utilization drops to 5%, which is what you'd expect if nothing is going on. If you put it into an infinite compute loop, it gets up to 98% (according to "top"). It's also quite non-responsive to keyboard input, regardless of whether it is otherwise busy.
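As a rough illustration of what the C side of such a yield could look like, here is a minimal sketch. It is not the code from the experiment-yield-subr branch; `yield_interval_ns` and `sleep_some` are hypothetical names, and only the POSIX `nanosleep` call and the 833333 ns figure come from the discussion.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* 1/20 of a 1/60 s timer tick, in nanoseconds (integer division truncates). */
static long yield_interval_ns(void) {
    return 1000000000L / 60 / 20;
}

/* Hypothetical helper: sleep for ns nanoseconds (valid range is 0 <= ns < 1 s).
   Returns 0 after a full sleep, 1 if a signal cut the sleep short,
   and -1 on any other error (for example an out-of-range ns). */
static int sleep_some(long ns) {
    assert(ns >= 0);
    struct timespec req = { .tv_sec = 0, .tv_nsec = ns };
    if (nanosleep(&req, NULL) == 0) return 0;
    return errno == EINTR ? 1 : -1;
}
```

The distinct return value for `EINTR` matters for the point made further down the thread: if I/O is signal driven, the sleep ends early when a key is pressed, and the caller can go check for input right away.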
The branch is "experiment-yield-subr" -- it's just for experiments, it should NOT be merged into the master branch.
It does sound promising though. What problems do you see with it?
If we ensure that we have interrupt driven I/O then it will probably work reasonably well -- the nanosleep() will be interrupted when a key is pressed or the mouse is moved or file I/O completes. Putting it on the BACKGROUNDFNS is a reasonable experimental hack, it might be even better to look at the implementation of the process world in PROC and see if there's a tighter place to integrate.
If we can't rely on interrupt-driven I/O, then perhaps instead of sleeping it should be waiting a measured amount of time for I/O to happen.
Sounds very exciting to me!
After the sleep returns, set Irq_Stk_Check to 0 (the signal to check interrupts), then follow with an opcode that starts with CHECK_INTERRUPT; (maybe RETURN?). That will let the keyboard stuff run.
Just toss in a (SUBRCALL CAUSE-INTERRUPT) [which already exists] in the function pushed onto BACKGROUNDFNS -- that feels better.
@masinter -- let me know if it works well for you.
I don't remember how to make a subr call. Let me see if this will fix my cygwin problem too.
If we can't rely on interrupt-driven I/O, then perhaps instead of sleeping it should be waiting a measured amount of time for I/O to happen.
This is "just" a matter of doing a sleep via select or poll or equivalent mechanisms (kqueue, Solaris event ports, etc). We'd need a list of all file handles. We might want to, at that point, also look at using things like timerfd on Linux. This is in line with some ideas that I've been looking into to fix some other bugs, so that'd be cool.
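A minimal sketch of "sleeping via select": block until a descriptor becomes readable or a timeout expires, whichever comes first. The helper name `wait_for_input` and the single-descriptor interface are assumptions for illustration; a real integration would pass the full set of descriptors the emulator is watching.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: block until fd is readable or timeout_us elapses.
   Returns 1 if fd is readable, 0 on timeout, -1 on error
   (including interruption by a signal). */
static int wait_for_input(int fd, long timeout_us) {
    assert(fd >= 0 && fd < FD_SETSIZE);  /* FD_SET is only defined in this range */
    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    struct timeval tv = {
        .tv_sec = timeout_us / 1000000,
        .tv_usec = timeout_us % 1000000
    };
    int n = select(fd + 1, &readable, NULL, NULL, &tv);
    if (n < 0) return -1;
    return n > 0 ? 1 : 0;
}
```

Unlike a plain sleep, this returns as soon as input arrives, so the timeout only bounds how long the process stays idle when nothing happens.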
@waywardmonkeys -- we have the FD_SET, and it already does a select() but it's no-wait and it's not coordinated with Lisp's conception of when it's idle -- thus the thing I put in to the BACKGROUNDFNS at Larry's suggestion.
@masinter -- exactly the form (SUBRCALL CAUSE-INTERRUPT) and compile the code where that is.
Here's the function I pushed onto my BACKGROUNDFNS --
Works pretty well! Doesn't pin CPU. Even running on CYGWIN (#103)!!
betcha it will work in WSL1 even
The experiment was a success. It works because, if every Lisp process is waiting for something to happen (in a BLOCK or waiting for keyboard input), then the sleep time is >> the time to poll each process. If one process is actually doing something, then the sleep time is << the combined process work time. I think there's a good argument for leaving it in BACKGROUNDFNS. It isn't enough to make cygwin responsive to keyboard interrupts (control-T control-B control-D) though ... need to check for keyboard input in the 60 Hz interrupt.
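The ">>" and "<<" argument can be put as a simple duty-cycle model: if each pass through the scheduler does some amount of work and then sleeps once, CPU utilization is roughly work / (work + sleep). The sketch below is that model only; the function name is hypothetical, and the 44 microsecond figure in the usage note is chosen to illustrate the reported ~5%, not measured.

```c
#include <assert.h>

/* Approximate fraction of wall-clock time spent on the CPU when each
   scheduler pass does work_us of polling or computing and then sleeps
   for sleep_us. Ignores scheduler latency and sleep overshoot. */
static double idle_utilization(double work_us, double sleep_us) {
    assert(work_us >= 0.0 && sleep_us > 0.0);
    return work_us / (work_us + sleep_us);
}
```

With a sleep of about 833.3 microseconds, a pass that costs around 44 microseconds gives a utilization near 5%, while a pass that runs for 100 ms before the next yield gives over 99%, so the sleep barely slows a busy system.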
For those impatient for the "real" fix, here's a workaround. Please report your experience!
https://github.com/Interlisp/medley/wiki/How-to-Run-Medley-without-pinning-the-CPU
This seems to work very well for me. CPU use has dropped from 100% (one full core) down to ~8-9% when idle. Currently running on: Intel Core i7, Linux Mint 20.1, Linux kernel 5.4.0, compiled with clang 10.0.0
I'd like to bump this issue, if I may, because I've been spending a lot of time experimenting in Medley lately, and the constant CPU fan noise is starting to bug me :) The experiment-yield-subr branch of Maiko does seem to be working well for me, but I like to keep as up to date with the head of master as I can.
Of course, I realize there are probably more important things to be worked on, but I just wanted to keep this in people's minds. Thanks for all the hard work!
I agree with sethm. This is a valuable feature, and the fix seems to work well. There is, however, a possible difference of preference with respect to the amount of time in the sleep. I, therefore, make two recommendations as follows:
I think it's just waiting for a PR that resolves https://github.com/Interlisp/maiko/pull/343#issuecomment-777053191
The current experimental code already allows for the caller of the yield subr, in Lisp, to provide the number of nanoseconds (< 1 s) to sleep. That feature will certainly remain to provide flexibility.
I haven't received any input on a default value other than Larry's proposed 1/1200s but I'd like to hear about any experiments that have been done.
I'd also like to hear how it interacts with other timer based features such as the SPY profiling, or async I/O.
PR Interlisp/maiko#343 is open for this change but we had a request to hold off on merging.
@sethm -- I've rebased this branch onto current master (and force pushed back to github), so you're not missing anything if you use it. There haven't really been changes in maiko code that affect the operation of Lisp code except for this new SUBR implementation -- it's all been code cleanup and stripping out unused paths.
Great work, thank you very much. I've given it a spin and it's working perfectly for me. I'll be patient and wait for the PR to go through when everyone is satisfied, no rush!
I'm running with this on by default. I tried doubling the sleep time, which reduced the usage even more. I tried SPY and got an odd result, but I didn't pursue it because we can always turn off background-yield when running SPY. What else is needed for this PR to be merged into master?
Need to sort out the generation of the subr numbers -- the generated include file is missing guards.
One problem with experiment-yield-subr is that there is no way I know of from Lisp to find out whether the running maiko has the subr. If it doesn't, then the call will wind up in URAID, rather than just being ignored. @stumbo @fghalasz for the docker experiments
PR #488 adds BACKGROUND-YIELD to end of BACKGROUNDFNS (most of which are useless in Medley btw)
As many have noted (including most recently @pmcjones), there isn't a straightforward solution because Lisp always is running, if only to track the mouse and blink the cursor, handle incoming network packets, and do other periodic polling events many times a second. But it might be possible to reduce the load if all the processes are hung in I/O or timer wait.