SvarDOS / bugz

SvarDOS bug tracker
http://svardos.org/

100% CPU usage under QEMU #116

Open mateuszviste opened 2 weeks ago

mateuszviste commented 2 weeks ago

SvarDOS under QEMU makes the qemu process run at 100% CPU. SvarDOS loads "FDAPM adv:reg", which helps keep CPU usage low on VirtualBox, but apparently not on QEMU.

"FDAPM APMDOS" works, but it then makes keyboard input sluggish.

EIDL and DOSIDLE both work well with no immediately visible side effects.

https://github.com/SvarDOS/edrdos/issues/79 will provide a nice solution. Until then, I will replace FDAPM with EIDL. Version 1.10 is quite big, so version 1.0 is preferred.

mateuszviste commented 2 weeks ago

Well, I spoke too soon - with EIDL the system is significantly slower (FDAPM /? takes 2s to complete, while without EIDL it is immediate). Not really a surprise, though - issuing a hlt at every int 28h is expected to slow down the system. I think we discussed this some time ago with ECM, but I can't find where it was.

DOSIDLE, on the other hand, does not slow down the system (not in a visible way at least) and works very well at decreasing QEMU's CPU usage. But its licensing status is unclear.

ecm-pushbx commented 2 weeks ago

> Well, I spoke too soon - with EIDL the system is significantly slower (FDAPM /? takes 2s to complete, while without EIDL it is immediate). Not really a surprise, though - issuing a hlt at every int 28h is expected to slow down the system. I think we discussed this some time ago with ECM, but I can't find where it was.

https://pushbx.org/ecm/dokuwiki/blog/pushbx/2024/0812_early_august_work?s[]=eidl#eidl

--> https://github.com/SvarDOS/edrdos/issues/78#issuecomment-2262072777

ecm-pushbx commented 2 weeks ago

I could add the HLTIDLE hooks to EIDL at the cost of a bit of memory, then add status bits to switch on or off parts. I read the page that describes it at https://pcdosretro.gitlab.io/hltidle.htm but did not look at the sources so it should be fine.

ecm-pushbx commented 2 weeks ago

Oh, you listed DOSIDLE, not HLTIDLE. My bad. But you could try HLTIDLE and see if it is any better than my last EIDL.

mateuszviste commented 2 weeks ago

Thanks for the links, I knew it was something we talked about recently. HLTIDLE works fine with QEMU, and does not slow down character output like EIDL does (both versions, 1.0 and 1.10, slow down character output a lot under my QEMU installation; running "FDAPM/?" is a good test to measure it).

BTW: the MAK.SH file included in the source archive does not work, as it includes some directories far away (../../../lmacros/), notably lmacros3.mac is missing. Also, would you consider distributing EIDL UPX-ed? upx -9 --8086 EIDL.COM shaves off 20%. It's always nice to save a few floppy sectors. :)

ecm-pushbx commented 2 weeks ago

> BTW: the MAK.SH file included in the source archive does not work, as it includes some directories far away (../../../lmacros/), notably lmacros3.mac is missing.

You need to put the lmacros collection at a place where it will be found. This can be ../lmacros/ or ./ too in the mak script. The needed files are lmacros1.mac, lmacros2.mac, and lmacros3.mac from https://hg.pushbx.org/ecm/lmacros/file/e611130fc291

> Also, would you consider distributing EIDL UPX-ed? upx -9 --8086 EIDL.COM shaves off 20%. It's always nice to save a few floppy sectors. :)

I think users can do that themselves. I don't like UPX much.

ecm-pushbx commented 2 weeks ago

> HLTIDLE works fine with QEMU, and does not slow down character output like EIDL (both versions, 1.0 and 1.10, slow down character output a lot under my QEMU installation, running "FDAPM/?" is a good test to measure it).

Is this using EDR-DOS? Any difference when using the FreeDOS kernel?

mateuszviste commented 2 weeks ago

> You need to put the lmacros collection at a place where it will be found.

Of course. I am just saying that it would be nicer if the original archive contained all the pieces needed to build the tool. Distributing partial source code is a bit counterproductive. :) I do not know much about your repo, but I do a similar thing myself (using my own library of routines), and to make sure these land in the distributable archive I create a symlink in the program's directory that leads to the library. This way, when I am packing the program, the library gets zipped with it, as zip follows the symlink. Plus, in sources I can use "include lib/mateusz.h" instead of some "include ../../../../mateuszlib/ver100/src/inc/mateusz.h". The lib does not even need to be in the same repo; it only matters that the symlink points at the right location on my hdd.

> I think users can do that themselves.

Fair enough. Although I am usually hesitant to UPX someone's binary myself, as I have this doubt that maybe it will break the program and I won't notice it immediately. When it's done by the author it's always more comforting.

> I don't like UPX much.

May I ask why? I'm not trying to get in your hair, I am genuinely curious. Do you know any nice alternatives out there? I mean besides Mr Bellard's awesome LZEXE and the boring PKLITE from MS.

> Is this using EDR-DOS? Any difference when using the FreeDOS kernel?

I was using EDR-DOS, yes. I tried with the FreeDOS kernel now - there, no slowdown is noticeable. So I take it EDR follows the MS-DOS way of calling int 28h every nth byte of output, while FreeDOS probably does not.

---- side note ---- today I am not able to reproduce the high CPU usage with FDAPM on QEMU. I was playing with floppies yesterday, testing stuff, and the CPU usage was only a side effect that I noticed at some point. Not sure what I could have done - will try to find out.

mateuszviste commented 2 weeks ago

> today I am not able to reproduce the high CPU usage with FDAPM on QEMU. I was playing with floppies yesterday, testing stuff, and the CPU usage was only a side effect that I noticed at some point. Not sure what I could have done - will try to find out.

Okay, did it again. FDAPM ADV:REG works fine in QEMU when the system is idle at the command prompt, but here I am running this code:

for (;;) {
  /* wait for a request... */

  /* if nothing received then loop for keyboard actions and signal IDLE time */
  if ((frame->len == 0) || (frame->len & 0x8000)) {
    unsigned char keywait = 0;
    _asm {
      int 0x28
      mov ah, 0x0b /* get stdin status */
      int 0x21
      mov keywait, al
    }
    if (keywait != 0) {
      puts("aborted by user");
      break;
    }
    continue;
  }
  ...
}

Attached my test floppy image server.zip. Running the above in qemu:

FDAPM ADV:REG -> 100% CPU
FDAPM APMDOS -> 6% CPU but command-line input is laggy
HLTIDLE -> 100% CPU
DOSIDLE -> 6% CPU, command-line input is responsive
EIDL -> 6% CPU, command-line input is responsive, but FDAPM/? takes 2s to be displayed when running under the EDR kernel (with the FreeDOS kernel it's fast)

mateuszviste commented 2 weeks ago

My understanding is that hooking int 28h is fine and necessary for software that emits it to signal idle time, but HLT should be issued only if these int 28h calls follow each other closely enough. DOSIDLE apparently performs some calculations before engaging its power-saving routines (i.e. it needs to "warm up" before it starts HLTing intensively, and it stops as soon as some activity is detected).
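
A minimal sketch of such a warm-up gate, written in plain C so the logic can be tried outside DOS. This is not taken from DOSIDLE's sources; the names idle_gate, WARMUP_CALLS and MAX_GAP_TICKS and both threshold values are made up for illustration. The idea is simply to allow HLT only after several int 28h calls have arrived close together, and to disarm as soon as a long gap or other activity is seen.

```c
/* Hypothetical warm-up gate for an int 28h hook (illustration only). */
#define WARMUP_CALLS  8   /* assumed: closely spaced idle calls needed before HLT */
#define MAX_GAP_TICKS 1   /* assumed: max timer ticks allowed between two idle calls */

struct idle_gate {
    unsigned long last_tick;  /* timer tick of the previous int 28h call */
    unsigned int  streak;     /* consecutive closely spaced idle calls seen so far */
};

/* Call when any non-idle activity is detected: disarm immediately. */
void idle_gate_activity(struct idle_gate *g)
{
    g->streak = 0;
}

/* Call on every int 28h with the current timer tick count.
 * Returns 1 if the hook may execute "sti / hlt" now, 0 otherwise. */
int idle_gate_on_int28(struct idle_gate *g, unsigned long now_tick)
{
    if (g->streak > 0 && now_tick - g->last_tick > MAX_GAP_TICKS) {
        g->streak = 0;               /* calls too far apart: warm up again */
    }
    g->last_tick = now_tick;
    if (g->streak < WARMUP_CALLS) {
        g->streak++;
    }
    return g->streak >= WARMUP_CALLS;
}
```

With these assumed thresholds, a burst of int 28h calls during character output would only start halting after the eighth closely spaced call, and a single gap of more than one tick resets the count.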

ecm-pushbx commented 2 weeks ago

Can you try your example with the int 28h replaced by mov ax, 1680h \ int 2Fh?

ecm-pushbx commented 2 weeks ago

Moreover you may want to call 2F.1680 and if its al return is nonzero then run sti \ hlt by yourself.
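
A rough sketch of that suggestion, for illustration. The two machine-level steps are passed in as function pointers so the control flow can be exercised on any host; in a real DOS build they would wrap mov ax, 1680h \ int 2Fh and sti \ hlt. The names idle_once, release_slice_fn, halt_fn and the fake_* stand-ins are invented for this example. It relies on the documented convention that int 2Fh/AX=1680h returns AL=00h when the call is supported and leaves AL unchanged (80h) otherwise.

```c
/* Hypothetical helper: try int 2Fh/AX=1680h first, fall back to HLT. */
typedef unsigned char (*release_slice_fn)(void); /* returns AL after int 2Fh, AX=1680h */
typedef void (*halt_fn)(void);                   /* executes "sti / hlt" */

/* Returns 1 if the fallback HLT was executed, 0 if some multitasker or
 * power manager answered the 2F.1680 call (AL == 0). */
int idle_once(release_slice_fn release_slice, halt_fn halt)
{
    if (release_slice() == 0) {
        return 0;   /* 2F.1680 was handled: do not HLT ourselves */
    }
    halt();         /* nobody answered: idle the CPU until the next interrupt */
    return 1;
}

/* Host-side stand-ins, only for trying the logic outside DOS. */
int halt_count = 0;
unsigned char fake_2f1680_supported(void)   { return 0x00; }
unsigned char fake_2f1680_unsupported(void) { return 0x80; }
void fake_halt(void) { halt_count++; }
```

Calling idle_once(fake_2f1680_unsupported, fake_halt) takes the HLT path, while the "supported" stand-in never reaches it.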

mateuszviste commented 2 weeks ago

> Can you try your example with the int 28h replaced by mov ax, 1680h \ int 2Fh?

This is a DPMI call. Sure it works sometimes (or even most of the time), but int 28h is much more universal (as it's used by DOS itself since ancient times). int 2F might also crash in at least some versions of DOS.

> Moreover you may want to call 2F.1680 and if its al return is nonzero then run sti \ hlt by yourself.

IIRC this would make software incompatible with Windows. I am really not a fan of putting such things in "normal" programs. IMO these things should be left to specialized drivers and kernels.

ecm-pushbx commented 2 weeks ago

> > Can you try your example with the int 28h replaced by mov ax, 1680h \ int 2Fh?

> This is a DPMI call. Sure it works sometimes (or even most of the time), but int 28h is much more universal (as it's used by DOS itself since ancient times). int 2F might also crash in at least some versions of DOS.

You can check that int 2Fh appears valid. Then use https://hg.pushbx.org/ecm/inst2d2f/file/6ac95bc66ac4/inst2d2f.asm

And the int 28h method may work at times but as you noted it may be called even when the program / DOS is not idle, whereas 2F.1680 definitely indicates the program is idle.

> > Moreover you may want to call 2F.1680 and if its al return is nonzero then run sti \ hlt by yourself.

> IIRC this would make software incompatible with Windows.

I think MSWindows implements 2F.1680 so you would never run the hlt there.

> I am really not a fan of putting such things in "normal" programs. IMO these things should be left to specialized drivers and kernels.

I think an application should idle the machine even without any special support from other software.

ecm-pushbx commented 2 weeks ago

Here's my idle function in lDebug: https://hg.pushbx.org/ecm/ldebug/file/9316c0cfe06a/source/lineio.asm#l2096

ecm-pushbx commented 2 weeks ago

Your keywait example may perform better if you put the idling after the if (keywait != 0) clause, if you expect to get input there repeatedly.

mateuszviste commented 2 weeks ago

> Your keywait example may perform better if you put the idling after the if (keywait != 0) clause, if you expect to get input there repeatedly.

At this time it is just for aborting the program, so it does not matter much. Later I will add more controls and stuff, and then it will be indeed checked first (and int 28h performed only if no user actions were performed).

> Can you try your example with the int 28h replaced by mov ax, 1680h \ int 2Fh?

I did it, for the sake of science:

      _asm {
        mov ax, 0x1680
        int 0x2f

        mov ah, 0x0b /* get stdin status */
        int 0x21
        mov keywait, al
      }

HLTIDLE -> 100% CPU
DOSIDLE -> 11% CPU
FDAPM ADV:REG -> 11% CPU
EIDL -> 11% CPU

In practical terms, on this qemu setup using int 0x2f "fixes" FDAPM ADV:REG and does not change anything for the rest. The CPU percentages are what top reports for my qemu process. They vary and should not be compared with my previous 6% results, since I now also get 11% CPU usage with int 28h. There are surely lots of factors in this calculation, so the only important thing is whether power saving is in effect at all or not.

mateuszviste commented 2 weeks ago

I have also tested with POWER.EXE from MS-DOS 6.0. With this, int 0x28 leads to 100% CPU, no matter the POWER settings (ADV:MAX, ADV:REG, ADV:MIN, STD). Using int 2F/1680 leads to ~11% CPU, no matter the POWER settings.

so to sum up:

ecm-pushbx commented 2 weeks ago

I'm surprised that EIDL seems to work with 2F.1680

mateuszviste commented 2 weeks ago

> I'm surprised that EIDL seems to work with 2F.1680

I was surprised at first, and re-did the check twice. It works with EDR, not with FreeDOS. I imagine that EDR's int 0x2F handler ends up performing an int 28h. I did not investigate this, though. Surely the answer is in EDR's source code; you'd know much better than me where to look. :-)

ecm-pushbx commented 2 weeks ago

No, I traced int 2Fh with ax = 1680h and it isn't supported at all. However, a loop with just int 21h function 0Bh ends up in char_check https://hg.pushbx.org/ecm/edrdos/file/af944adc33b6/drdos/cio.nas#l890 which is called by func0B https://hg.pushbx.org/ecm/edrdos/file/af944adc33b6/drdos/cio.nas#l374 via cooked_status https://hg.pushbx.org/ecm/edrdos/file/af944adc33b6/drdos/cio.nas#l513

ecm-pushbx commented 2 weeks ago

Function 09h uses cooked_out https://hg.pushbx.org/ecm/edrdos/file/af944adc33b6/drdos/cio.nas#l754 which calls cooked_status every 80 bytes (CHECK_EVERY is 80).
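
To get a feel for what this means for a TSR that halts on every int 28h, here is a back-of-the-envelope estimate in C. It is only a sketch under stated assumptions: each cooked_status call ends in exactly one int 28h, the TSR executes one hlt per int 28h, and the only interrupt that wakes the CPU is the standard 18.2 Hz timer tick (about 54.9 ms), so each hlt can cost up to one tick. The function names and the 2800-byte text size used below are made up; only CHECK_EVERY = 80 comes from the source quoted above.

```c
#define CHECK_EVERY 80UL      /* from cio.nas, as quoted above */
#define TICK_US     54925UL   /* one 18.2 Hz timer tick, in microseconds */

/* Number of status checks triggered while writing `bytes` characters,
 * modelled as a simple counter that fires once per CHECK_EVERY bytes. */
unsigned long status_checks(unsigned long bytes)
{
    return bytes / CHECK_EVERY;
}

/* Upper bound on the added delay, in microseconds, if every check
 * costs one full timer tick spent in hlt. */
unsigned long worst_case_stall_us(unsigned long bytes)
{
    return status_checks(bytes) * TICK_US;
}
```

For a hypothetical 2800-byte help text this gives 35 checks and an upper bound of roughly 1.9 s, which is the same order of magnitude as the 2 s reported for FDAPM /? (whose actual output size was not measured here).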

mateuszviste commented 2 weeks ago

Interesting. So just polling the keyboard status is enough for DR-DOS to emit some int 28h. This means that even most power-unaware programs would end up giving up some cycles through int 28h, if only they monitor the keyboard through DOS. Cool. A good reason, I guess, for power-saving apps to track int 28h (but in some smarter way than issuing a hlt at every int 28h).

ecm-pushbx commented 2 weeks ago

There are some hooks in the kernel for the $IDLE$ device. An example implementation of that turned up in https://www.os2museum.com/wp/idle-dr-dos/ -- I asked whether the source text as transcribed from the PDF could be provided.

mateuszviste commented 2 weeks ago

Yes I know about $IDLE$, and I did test the DRIDLE.SYS tool already. It is not very efficient in my scenario - the qemu process still takes about 50-70% of CPU in the "int 0x28 + keyb input" loop.

ecm-pushbx commented 1 week ago

> > You need to put the lmacros collection at a place where it will be found.

> Of course. I am just saying that it would be nicer if the original archive contained all the pieces needed to build the tool. Distributing partial source code is a bit counterproductive. :) I do not know much about your repo, but I do a similar thing myself (using my own library of routines), and to make sure these land in the distributable archive I create a symlink in the program's directory that leads to the library. This way, when I am packing the program, the library gets zipped with it, as zip follows the symlink. Plus, in sources I can use "include lib/mateusz.h" instead of some "include ../../../../mateuszlib/ver100/src/inc/mateusz.h". The lib does not even need to be in the same repo; it only matters that the symlink points at the right location on my hdd.

Your mistake is in assuming the zipballs are the "original archive" or canonical releases. They are not. They're only builds that I provide as a convenience. The canonical sources are kept in the hg repos.

> > I think users can do that themselves.

> Fair enough. Although I am usually hesitant to UPX someone's binary myself, as I have this doubt that maybe it will break the program and I won't notice it immediately. When it's done by the author it's always more comforting.

Understandable. lDOS dual/triple mode executables that can load as a boot loader as well must not be compressed using UPX or any other general-purpose .exe packer. Luckily, the invalid exeExtraBytes field in these binaries will have UPX reject them as input. Other applications like the TSRs RxANSI, seekext, lclock, EIDL, keephook, and applications like shufhook, callver, instsect, can be compressed without ill effect.

> > I don't like UPX much.

> May I ask why? I'm not trying to get in your hair, I am genuinely curious. Do you know any nice alternatives out there? I mean besides Mr Bellard's awesome LZEXE and the boring PKLITE from MS.

I don't think PKLITE was from Microsoft? Theirs was called EXEPACK.

LZEXE is not free and open source software: "Source code - Currently unavailable, but may change if people ask it." I believe I did ask at some point but nothing came of it.

This is also the main problem problem that I have with UPX: The default builds include NRV, which is a closed source library. This is true even of the builds of the most recent version that FreeDOS distributes on their servers: https://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/devel/upx/4.0.2/ Some distros have "upx-ucl" as free software packages instead.

Other than the "problem" problem I also think wanting to optimise disk space like this should be on a user of an application.