bochs-emu / Bochs

Bochs - Cross Platform x86 Emulator Project
https://bochs.sourceforge.io/
GNU Lesser General Public License v2.1

Crash of Bochs / OS with default Voodoo 3 drivers #131

Closed Vort closed 10 months ago

Vort commented 11 months ago

When I try to load Windows XP SP3 with default Voodoo 3 drivers, one of two things happens:

  1. Either Bochs crashes with >>PANIC<< APIC read at address 0x0000fee00081 spans 32-bit boundary !;
  2. Or the OS crashes with TSS selector points to bad TSS.

This problem started to appear after updating from 620d0912646261c7e5b20fe69ef85823637abc85 to f5b54a4d336d84caf7eb744f6f97ae768c317f01. Most likely, f5b54a4d336d84caf7eb744f6f97ae768c317f01 is not the direct cause of the problem, but it somehow uncovers it.

stlintel commented 11 months ago

Hi,

Can you create clear instructions to reproduce ? I have WinXP image, what are the Voodoo drivers and how do you enable Voodoo 3 in configuration ?

Vort commented 11 months ago

I have additional information: it looks like a problem with trace linking / handlers chaining. Again. -O3 with --enable-repeat-speedups --enable-fast-function-calls works well. -O3 with --enable-all-optimizations causes problems.

> Can you create clear instructions to reproduce ?

Not sure, but let's try.

> what are the Voodoo drivers and how do you enable Voodoo 3 in configuration ?

They come with Windows XP SP3. So it is enough to make these changes:

vgaromimage: file="$BXSHARE\VGABIOS-lgpl-latest-banshee"
pci: enabled=1, chipset=i440fx, slot1=voodoo
vga: extension=voodoo, update_freq=30, realtime=1, ddc=builtin
voodoo: enabled=1, model=voodoo3

Start the OS, wait until it installs the driver automatically, then reboot; the problem appears on the next boot.

stlintel commented 11 months ago

Unfortunately unable to reproduce :( Installed drivers, rebooted, and it works smoothly.

stlintel commented 11 months ago

Attaching my .bochsrc, can you try with my configuration or similar ?

bochsrc.txt

Vort commented 11 months ago

> Unfortunately unable to reproduce :( Installed drivers, rebooted, and it works smoothly.

Can you try with my binaries? bochs_413507ee.zip bochs_413507ee_O3.exe is bad for me. bochs_413507ee_O3_opt2.exe is good for me.

stlintel commented 11 months ago

If you claim the MOVDIRI commit 'initiates the problem', can you try to disable it partially ? This is one part of the change:

[image: first part of the change]

Try to undo it separately.

This is another:

[image: second part of the change]

Try to replace the handlers with some other handler. Write BX_CPU_C::BxError instead of BX_CPU_C::MOV32_EdGdM, for example. Another thing to try: just move it to another place in the file, for example to the middle, around BX_IA_SERIALIZE.

stlintel commented 11 months ago

> bochs_413507ee_O3.exe

Reproduced with your binaries. But I see they are much smaller than mine. I configure with:

./configure --enable-sb16 \
            --enable-ne2000 \
            --enable-all-optimizations \
            --enable-cpu-level=6 \
            --enable-x86-64 \
            --enable-vmx=2 \
            --enable-avx \
            --enable-evex \
            --enable-cet \
            --enable-pci \
            --enable-clgd54xx \
            --enable-voodoo \
            --enable-usb \
            --enable-usb-ohci \
            --enable-usb-ehci \
            --enable-usb-xhci \
            --enable-busmouse \
            --enable-es1370 \
            --enable-e1000 \
            --enable-show-ips \
            --with-win32 --with-rfb --with-nogui \
            ${CONFIGURE_ARGS}

And my binary is much bigger (5.6MB vs 4.8MB for yours). So maybe the entire binary layout, stack, etc. is different from yours. Want to try my binaries ? bochs.exe.gz

Vort commented 11 months ago

> Attaching my .bochsrc, can you try with my configuration or similar ?

The same happens. The problem is in the binary.

> But I see they are much smaller than mine.

Probably because of different compilers. Also, I added a change to make static builds. So I expected my binaries to be larger, not smaller.

> So maybe the entire binary layout, stack, etc. is different from yours.

Yes, I think some addresses in the binary change and that reveals the problem.

> Want to try my binaries ?

They require a bunch of DLLs to work, which are specific to a particular version of the runtime / compiler. I can try if you rebuild them with the static option (configure.ac from https://github.com/Vort/Bochs/commit/197a402a2c359330cbbb6fc517273a95a5030bca + the --enable-static option in .conf.win32-cygwin, and with autoconf run, of course). But if it works for you then, most likely, it will work for me too.

Vort commented 11 months ago

Here is a -g version of the binary, just in case: bochs_413507ee_O3_g.zip. It is glitchy as well.

stlintel commented 11 months ago

Recompiled with --enable-static: bochs.exe.gz. Want to try ?

stlintel commented 11 months ago

Did you try to play with disabling parts of MOVDIRI commit as I suggested ?

Vort commented 11 months ago

> recompiled with --enable-static

Something went wrong: the binary is almost the same as the previous one. However, it looks like it is not possible to link statically with Cygwin anyway. Without static linking, cygwin1.dll, cygstdc++-6.dll and cyggcc_s-seh-1.dll are linked. With static linking, cygwin1.dll is still linked. I have found here a description of an option which should allow static linking, but it does not work for me.

> Did you try to play with disabling parts of MOVDIRI commit as I suggested ?

I need to think more about how to do it properly, since the problem is most likely not on the C++ level but on the binary level.

By the way, I was able to get a Cygwin build which sometimes boots fine and sometimes produces a BSoD. The only difference is the time delay before I press "Start Windows Normally".

Vort commented 11 months ago

> Want to try ?

I launched it with my DLLs. Not the best choice, but the launch was successful this time.

No reproduction of the problem with it, and I tried many times.

So the problem depends on compiler options and on chance. This is one of the hardest kinds of bugs to chase.

Vort commented 11 months ago

I was able to produce two more binaries: bochs_413507ee_cross.zip. Both are statically linked and cross compiled from Ubuntu. bochs_413507ee_cross_gcc.exe: no bug reproduction. bochs_413507ee_cross_clang.exe: bug reproduces in the same way as with binary made in MSYS2 + clang.

vort@ubuntu:~$ /home/vort/wclang/_prefix_/bin/x86_64-w64-mingw32-clang --version
clang version 10.0.0-4ubuntu1 
Target: x86_64-w64-windows-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm-10/bin
vort@ubuntu:~$ x86_64-w64-mingw32-gcc --version
x86_64-w64-mingw32-gcc (GCC) 9.3-win32 20200320
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Vort commented 11 months ago

@stlintel can you check if problem reproduces with bochs_413507ee_cross_clang14.zip?

This binary is made with a fresh Clang on a fresh Ubuntu. I can send a disk image with the build tools, but I'm not sure what the best way to do this is. The disk image is 3.88 GiB in size (1.29 GiB zipped), so I can't just attach it to this message. If you know how to use torrents, this file can be downloaded with this magnet link: magnet:?xt=urn:btih:EC299F919A70376A10DA0001159FAC7A7468882C&dn=Ubuntu22.zip&tr=udp%3a%2f%2ftracker.openbittorrent.com%3a80%2fannounce&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce&tr=http%3a%2f%2fretracker.local%2fannounce. Otherwise I can try other methods. But I would prefer a method which does not require registration or sending personal information.

This file was made with VirtualBox, but other VMs should be able to use the disk file too, since it is created in .vmdk format. To log into the VM, use the name vort and the password 123. The Bochs code is located in the Bochs directory relative to the home directory. A rebuild can be started by executing make all-clean and make -j 3.

I hope that I made and described everything correctly. But in case of mistakes I can make corrections.

stlintel commented 11 months ago

It reproduces the crash, differently than the previous binary you sent, but it still crashes. Everything looks usual until:

03397163599i[FLOPPY] controller reset in software
03397169743e[FLOPPY] io_write: 0x3f5: invalid floppy command 0x18
03397170063i[FLOPPY] perpendicular mode: config=0x80
03397170258e[FLOPPY] io_write: 0x3f5: invalid floppy command 0x18
03419263767i[MEM0  ] Memory access handlers unregistered: 0x0000c4000000 - 0x0000c4007fff
03419263767i[MEM0  ] Register memory access handlers: 0x0000c0a00000 - 0x0000c0a07fff
03419263767e[MEM0  ] Register failed: overlapping memory handlers!
03419263767i[VOODOO] new ROM address = 0xc0a00000
03419390258e[CPU0  ] interrupt(): TSS selector points to bad TSS - #GP(tss_selector)
03419390258i[CPU0  ] CPU is in protected mode (active)
03419390258i[CPU0  ] CS.mode = 32 bit
03419390258i[CPU0  ] SS.mode = 32 bit
03419390258i[CPU0  ] EFER   = 0x00000800
03419390258i[CPU0  ] | EAX=0000bb40  EBX=805430fc  ECX=00000000  EDX=9c000000
03419390258i[CPU0  ] | ESP=8040000c  EBP=80543024  ESI=80543400  EDI=80543454
03419390258i[CPU0  ] | IOPL=0 id vip vif ac vm RF nt of df if tf SF zf AF pf cf
03419390258i[CPU0  ] | SEG sltr(index|ti|rpl)     base    limit G D
03419390258i[CPU0  ] |  CS:0008( 0001| 0|  0) 00000000 ffffffff 1 1
03419390258i[CPU0  ] |  DS:0023( 0004| 0|  3) 00000000 ffffffff 1 1
03419390258i[CPU0  ] |  SS:0010( 0002| 0|  0) 00000000 ffffffff 1 1
03419390258i[CPU0  ] |  ES:0023( 0004| 0|  3) 00000000 ffffffff 1 1
03419390258i[CPU0  ] |  FS:0030( 0006| 0|  0) ffdff000 00001fff 1 1
03419390258i[CPU0  ] |  GS:0000( 0000| 0|  0) 00000000 00000000 0 0
03419390258i[CPU0  ] | EIP=80543498 (80543498)
03419390258i[CPU0  ] | CR0=0xe001003b CR2=0x803ffffc
03419390258i[CPU0  ] | CR3=0x002be000 CR4=0x000006f8
03419390258i[CPU0  ] 0x0000000080543498>> and eax, dword ptr ds:[eax] : 2300
03419390258e[CPU0  ] exception(): 3rd (13) exception with no resolution, shutdown status is 00h, resetting

stlintel commented 11 months ago

Seems like the problem is clang specific. Are you using Linux host ? Does any gcc on Linux produce failing binary ?

Vort commented 11 months ago

> Seems like the problem is clang specific.

Most likely, no. I think Clang got 100% reproducibility because of (un-)lucky coincidence. And it is better to use this opportunity to track down the bug.

Old GCC shipped with Cygwin got about 20% reproducibility, but with slightly different symptoms.

> Does any gcc on Linux produce failing binary ?

I tested only a binary made with Linux GCC 9.3 and it is fine. However, who knows if it is really fine or if it just has slightly different conditions for reproduction.


I think it is better to transfer disk image with tools to you somehow. My skills are most likely not enough to chase this bug further.

Please let me know if bittorrent method of transfer is bad for some reason and if you know better ones.

stlintel commented 11 months ago

Can you check it again on latest master repo sources ? I wonder if sync time on chains linking would fix the issue, it certainly will affect it

Vort commented 11 months ago

> Can you check it again on latest master repo sources ?

I still see TSS selector points to bad TSS with a9d07b5a51c07426a2853fcef609da181d5cd5b9.

stlintel commented 11 months ago

> I think it is better to transfer disk image with tools to you somehow. My skills are most likely not enough to chase this bug further.

Your disk image is not enough; I have the same problem with my disk image, I just cannot get it with my binary. I need a way to build a failing binary somehow to be able to look at it. BTW, do you think it might be related to the Voodoo3 code ? Have you seen anything similar without Voodoo3 ? With Voodoo2 ? Or just with Cirrus ? If yes, it might be a clear indication that we have some nasty buffer overflow in Voodoo3-specific code.

Vort commented 11 months ago

> Your disk image is not enough; I have the same problem with my disk image, I just cannot get it with my binary. I need a way to build a failing binary somehow to be able to look at it.

This is a disk image with the build tools! Everything is set up to produce bochs_413507ee_cross_clang14.exe.

Vort commented 11 months ago

Torrent is fine.

I posted magnet link earlier, theoretically it is enough. But here is torrent file just in case: Ubuntu22.zip.torrent.zip.

stlintel commented 11 months ago

Download completed, everything runs. The environment compiles Bochs for Windows ? Can you help with an easy way to copy files from VBox to Windows host ?

Vort commented 11 months ago

> The environment compiles Bochs for Windows ?

Yes.

> Can you help with an easy way to copy files from VBox to Windows host ?

I know some. But not easy.

  1. I started Http File Server on the host, allowed file uploads with it, then executed curl -F fileupload1=@bochs.exe -F press="Upload files" http://10.0.2.2/
  2. It is possible to make Shared Folders and configure access to them. But I left too little free space on the disk, so I'm not sure if all the files needed to make it work can fit there.
  3. The Bochs binary is so small that the .xz compressed file can fit onto a virtual floppy. If it turns out slightly larger, a multi-file archive can be made. I did not test this option.
  4. It is possible to set up SSH access. I did not test this either.
  5. Similar to 1: the Links browser (sudo apt install links) can be installed and used to upload the file to almost any file sharing service (local or remote).

Vort commented 11 months ago

I have some ideas about what is going on here. First of all, here are some log snippets:

Bad 1

```text
01415573496d[VOODOO] write to r/o PCI register 0x00 ignored
01415573542d[VOODOO] write PCI register 0x04 value 0x00100003 (len=4)
01415573588d[VOODOO] write to r/o PCI register 0x08 ignored
01415573634d[VOODOO] write PCI register 0x0C value 0x00000000 (len=4)
01415573680d[VOODOO] write PCI register 0x10 value 0xC0000000 (len=4)
01415573726d[VOODOO] write PCI register 0x14 value 0xC2000008 (len=4)
01415573772d[VOODOO] write PCI register 0x18 value 0x0000C101 (len=4)
01415574002d[VOODOO] write PCI register 0x2C value 0x0036121A (len=4)
01415574048d[VOODOO] write PCI register 0x30 value 0xC0A00000 (len=4)
01415574048i[MEM0  ] Memory access handlers unregistered: 0x0000c4000000 - 0x0000c4007fff
01415574048i[MEM0  ] Register memory access handlers: 0x0000c0a00000 - 0x0000c0a07fff
01415574048e[MEM0  ] Register failed: overlapping memory handlers!
01415574048i[VOODOO] new ROM address = 0xc0a00000
01415574094d[VOODOO] write PCI register 0x34 value 0x00000060 (len=4)
01415574140d[VOODOO] write PCI register 0x38 value 0x00000000 (len=4)
01415574665d[VOODOO] write to r/o PCI register 0x00 ignored
01415574711d[VOODOO] write PCI register 0x04 value 0x00100003 (len=4)
01415574757d[VOODOO] write to r/o PCI register 0x08 ignored
01415574803d[VOODOO] write PCI register 0x0C value 0x00000000 (len=4)
01415574849d[VOODOO] write PCI register 0x10 value 0xC0000000 (len=4)
01415574895d[VOODOO] write PCI register 0x14 value 0xC2000008 (len=4)
01415574941d[VOODOO] write PCI register 0x18 value 0x0000C101 (len=4)
01415575171d[VOODOO] write PCI register 0x2C value 0x0036121A (len=4)
01415575217d[VOODOO] write PCI register 0x30 value 0xC0A00001 (len=4)
01415575263d[VOODOO] write PCI register 0x34 value 0x00000060 (len=4)
01415575309d[VOODOO] write PCI register 0x38 value 0x00000000 (len=4)
01415613429i[APIC0 ] warning: misaligned APIC access. addr=0x0000fee00081
01415613429p[APIC0 ] >>PANIC<< APIC read at address 0x0000fee00081 spans 32-bit boundary !
```

Bad 2

```text
01316309144d[VOODOO] write to r/o PCI register 0x00 ignored
01316309190d[VOODOO] write PCI register 0x04 value 0x00100003 (len=4)
01316309236d[VOODOO] write to r/o PCI register 0x08 ignored
01316309282d[VOODOO] write PCI register 0x0C value 0x00000000 (len=4)
01316309328d[VOODOO] write PCI register 0x10 value 0xC0000000 (len=4)
01316309374d[VOODOO] write PCI register 0x14 value 0xC2000008 (len=4)
01316309420d[VOODOO] write PCI register 0x18 value 0x0000C101 (len=4)
01316309650d[VOODOO] write PCI register 0x2C value 0x0036121A (len=4)
01316309696d[VOODOO] write PCI register 0x30 value 0xC0A00000 (len=4)
01316309696i[MEM0  ] Memory access handlers unregistered: 0x0000c4000000 - 0x0000c4007fff
01316309696i[MEM0  ] Register memory access handlers: 0x0000c0a00000 - 0x0000c0a07fff
01316309696e[MEM0  ] Register failed: overlapping memory handlers!
01316309696i[VOODOO] new ROM address = 0xc0a00000
01316309742d[VOODOO] write PCI register 0x34 value 0x00000060 (len=4)
01316309788d[VOODOO] write PCI register 0x38 value 0x00000000 (len=4)
01316310313d[VOODOO] write to r/o PCI register 0x00 ignored
01316310359d[VOODOO] write PCI register 0x04 value 0x00100003 (len=4)
01316310405d[VOODOO] write to r/o PCI register 0x08 ignored
01316310451d[VOODOO] write PCI register 0x0C value 0x00000000 (len=4)
01316310497d[VOODOO] write PCI register 0x10 value 0xC0000000 (len=4)
01316310543d[VOODOO] write PCI register 0x14 value 0xC2000008 (len=4)
01316310589d[VOODOO] write PCI register 0x18 value 0x0000C101 (len=4)
01316310819d[VOODOO] write PCI register 0x2C value 0x0036121A (len=4)
01316310865d[VOODOO] write PCI register 0x30 value 0xC0A00001 (len=4)
01316310911d[VOODOO] write PCI register 0x34 value 0x00000060 (len=4)
01316310957d[VOODOO] write PCI register 0x38 value 0x00000000 (len=4)
01316454656e[CPU0  ] interrupt(): TSS selector points to bad TSS - #GP(tss_selector)
```

Good

```text
01327612300d[VOODOO] write to r/o PCI register 0x00 ignored
01327612346d[VOODOO] write PCI register 0x04 value 0x00100003 (len=4)
01327612392d[VOODOO] write to r/o PCI register 0x08 ignored
01327612438d[VOODOO] write PCI register 0x0C value 0x00000000 (len=4)
01327612484d[VOODOO] write PCI register 0x10 value 0xC0000000 (len=4)
01327612530d[VOODOO] write PCI register 0x14 value 0xC2000008 (len=4)
01327612576d[VOODOO] write PCI register 0x18 value 0x0000C101 (len=4)
01327612806d[VOODOO] write PCI register 0x2C value 0x0036121A (len=4)
01327612852d[VOODOO] write PCI register 0x30 value 0xC0A00000 (len=4)
01327612852i[MEM0  ] Memory access handlers unregistered: 0x0000c4000000 - 0x0000c4007fff
01327612852i[MEM0  ] Register memory access handlers: 0x0000c0a00000 - 0x0000c0a07fff
01327612852e[MEM0  ] Register failed: overlapping memory handlers!
01327612852i[VOODOO] new ROM address = 0xc0a00000
01327612898d[VOODOO] write PCI register 0x34 value 0x00000060 (len=4)
01327612944d[VOODOO] write PCI register 0x38 value 0x00000000 (len=4)
01327613469d[VOODOO] write to r/o PCI register 0x00 ignored
01327613515d[VOODOO] write PCI register 0x04 value 0x00100003 (len=4)
01327613561d[VOODOO] write to r/o PCI register 0x08 ignored
01327613607d[VOODOO] write PCI register 0x0C value 0x00000000 (len=4)
01327613653d[VOODOO] write PCI register 0x10 value 0xC0000000 (len=4)
01327613699d[VOODOO] write PCI register 0x14 value 0xC2000008 (len=4)
01327613745d[VOODOO] write PCI register 0x18 value 0x0000C101 (len=4)
01327613975d[VOODOO] write PCI register 0x2C value 0x0036121A (len=4)
01327614021d[VOODOO] write PCI register 0x30 value 0xC0A00001 (len=4)
01327614067d[VOODOO] write PCI register 0x34 value 0x00000060 (len=4)
01327614113d[VOODOO] write PCI register 0x38 value 0x00000000 (len=4)
01327920891d[VOODOO] write to r/o PCI register 0x00 ignored
01327920937d[VOODOO] write PCI register 0x04 value 0x00100003 (len=4)
01327920983d[VOODOO] write to r/o PCI register 0x08 ignored
01327921029d[VOODOO] write PCI register 0x0C value 0x00000000 (len=4)
01327921075d[VOODOO] write PCI register 0x10 value 0xC0000000 (len=4)
01327921121d[VOODOO] write PCI register 0x14 value 0xC2000008 (len=4)
01327921167d[VOODOO] write PCI register 0x18 value 0x0000C101 (len=4)
01327921397d[VOODOO] write PCI register 0x2C value 0x0036121A (len=4)
01327921443d[VOODOO] write PCI register 0x30 value 0xC4000000 (len=4)
01327921443i[MEM0  ] Memory access handlers unregistered: 0x0000c0a00000 - 0x0000c0a07fff
01327921443i[MEM0  ] Register memory access handlers: 0x0000c4000000 - 0x0000c4007fff
01327921443i[VOODOO] new ROM address = 0xc4000000
01327921489d[VOODOO] write PCI register 0x34 value 0x00000060 (len=4)
01327921535d[VOODOO] write PCI register 0x38 value 0x00000000 (len=4)
01327922830d[VOODOO] banshee read from offset 0x78 (vidSerialParallelPort) result = 0x0f780000
01327922830d[VOODOO] banshee write to offset 0x78: value = 0x0f7c0000 len=4 (vidSerialParallelPort)
01327923869d[VOODOO] banshee read from offset 0x78 (vidSerialParallelPort) result = 0x0f7c0000
01327923869d[VOODOO] banshee write to offset 0x78: value = 0x0f7c0000 len=4 (vidSerialParallelPort)
```

Because the problems start happening after [VOODOO] write PCI register 0x30 value 0xC0A00001 (len=4), I decided to check whether suppressing the effects of that write helps. And yes, it helps:

diff --git a/bochs/iodev/devices.cc b/bochs/iodev/devices.cc
index 6abf3fd99..17a35f83c 100644
--- a/bochs/iodev/devices.cc
+++ b/bochs/iodev/devices.cc
@@ -1688,6 +1688,8 @@ void bx_pci_device_c::pci_write_handler_common(Bit8u address, Bit32u value, unsi
   } else if ((address & 0xfc) == 0x30) {
     BX_DEBUG_PCI_WRITE(address, value, io_len);
     value &= (0xfffffc01 >> ((address & 0x03) * 8));
+   if (value != 0xc0a00001)
+   {
     for (unsigned i=0; i<io_len; i++) {
       value8 = (value >> (i*8)) & 0xff;
       oldval = pci_conf[address+i];
@@ -1701,6 +1703,7 @@ void bx_pci_device_c::pci_write_handler_common(Bit8u address, Bit32u value, unsi
         BX_INFO(("new ROM address = 0x%08x", pci_rom_address));
       }
     }
+   }
   } else if (address == 0x3c) {
     value8 = (Bit8u)value;
     if (value8 != pci_conf[0x3c]) { 

Of course, it is just a hack, but it may help to find the correct solution.
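
For context on how 0xC0A00001 differs from the earlier 0xC0A00000: register 0x30 is the PCI expansion ROM base address register, and its bit 0 is the ROM enable bit. Below is a minimal sketch of splitting such a value, using the 0xfffffc01 mask from the diff above. The `RomBar` struct and `decode_rom_bar` function are made up for illustration and are not Bochs identifiers.

```cpp
#include <cstdint>

// Hypothetical helper type, not part of Bochs.
struct RomBar {
  uint32_t base;     // ROM base address with the enable bit stripped
  bool     enabled;  // bit 0: expansion ROM decoding enabled
};

// Split a 32-bit write to PCI config register 0x30.
// 0xfffffc01 is the writable-bit mask that appears in the diff above.
inline RomBar decode_rom_bar(uint32_t value) {
  value &= 0xfffffc01u;
  return RomBar{ value & 0xfffffc00u, (value & 1u) != 0 };
}
```

Under this reading, both writes carry the same base address 0xC0A00000, and only the second one sets the enable bit, which would fit the observation that the trouble starts after it.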

Another thing I noticed:

01327612852i[MEM0  ] Register memory access handlers: 0x0000c0a00000 - 0x0000c0a07fff
01327612852e[MEM0  ] Register failed: overlapping memory handlers!
...
01327921443i[MEM0  ] Memory access handlers unregistered: 0x0000c0a00000 - 0x0000c0a07fff

Registration failed, but the memory region was unregistered nevertheless. This looks like a source of future problems.
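
One way to avoid unregistering a range that was never registered is to remember whether the last registration succeeded. The following is a toy sketch of that idea only: `HandlerRegistry`, `RomDevice` and their methods are hypothetical names, not the Bochs memory API, and the overlap rule is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a memory-handler registry (hypothetical, not Bochs code).
struct Range { uint64_t begin, end; };  // inclusive bounds

class HandlerRegistry {
public:
  // Returns false and registers nothing if the range overlaps an existing one.
  bool add(Range r) {
    for (const Range &x : ranges_)
      if (r.begin <= x.end && x.begin <= r.end) return false;
    ranges_.push_back(r);
    return true;
  }
  // Returns false if this exact range was never registered.
  bool remove(Range r) {
    for (std::size_t i = 0; i < ranges_.size(); i++) {
      if (ranges_[i].begin == r.begin && ranges_[i].end == r.end) {
        ranges_.erase(ranges_.begin() + i);
        return true;
      }
    }
    return false;
  }
private:
  std::vector<Range> ranges_;
};

// A device that tracks whether its ROM range is actually registered.
class RomDevice {
public:
  explicit RomDevice(HandlerRegistry &reg) : reg_(reg) {}
  void move_rom(Range new_range) {
    if (registered_) reg_.remove(current_);  // only undo what succeeded
    registered_ = reg_.add(new_range);
    current_ = new_range;
  }
  bool registered() const { return registered_; }
private:
  HandlerRegistry &reg_;
  Range current_{0, 0};
  bool registered_ = false;
};
```

With this shape, moving the ROM to an overlapping range leaves the device in a "not registered" state, and the next move does not try to unregister the failed range.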

Vort commented 10 months ago

I have new information about this problem: it looks like the misaligned APIC access is caused by corruption of BX_NIL_REGISTER (bx_cpu.gen_reg[19]). It should always have a value of 0, right? But sometimes 1 or 2 ends up there.

When I tried to find where the corruption happens, the IDE pointed me to BX_NEXT_INSTR(i); inside of BX_CPU_C::CMP_EdIdM. The stack at that location looks suspicious:

BX_CPU_C::CMP_EdIdM arith32.cc:474
BX_CPU_C::linkTrace cpu.cc:310
BX_CPU_C::linkTrace cpu.cc:310
... (total amount of linkTraces is 999) ...
BX_CPU_C::linkTrace cpu.cc:310
BX_CPU_C::linkTrace cpu.cc:310
BX_CPU_C::cpu_loop cpu.cc:112
bx_begin_simulation main.cc:1062
win32_ci_callback win32config.cc:772
bx_real_sim_c::configuration_interface siminterface.cc:905
bxmain main.cc:337
__tmainCRTStartup 0x000000013ffa1316
mainCRTStartup 0x000000013ffa1366

It suggests that problem really may be with optimizations, as I suspected earlier.
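
The nested linkTrace frames suggest that each trace starts the next one directly instead of returning to cpu_loop, so frames pile up until some limit is reached. Here is a toy model of that pattern, not the Bochs implementation: `ToyChain` and its members are hypothetical, and the cap of 1000 is an assumption chosen only to match the roughly 999 frames seen above.

```cpp
#include <cstdint>

// Toy model of handler chaining with a depth cap (hypothetical names).
struct ToyChain {
  unsigned max_depth = 1000;  // assumed cap, not taken from Bochs sources
  unsigned deepest = 0;       // deepest nesting level observed
  uint64_t executed = 0;      // number of handlers run so far

  // One "handler": does its work, then chains to the next one directly
  // unless the depth cap is reached or nothing is left to run.
  void handler(unsigned depth, uint64_t remaining) {
    executed++;
    if (depth > deepest) deepest = depth;
    if (remaining > 1 && depth + 1 < max_depth)
      handler(depth + 1, remaining - 1);
  }

  // Outer loop: restarts a chain each time the previous one unwinds.
  void run(uint64_t total) {
    while (executed < total)
      handler(0, total - executed);
  }
};
```

In such a scheme a value written near the bottom of a chain stays live across many nested calls, which is one reason stack or register corruption can surface far from where it was caused.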

Vort commented 10 months ago

In the misaligned APIC access case this panic triggers. Sadly, the TSS selector problem has some different mechanism. But they may be related anyway. The root cause should be found.

diff --git a/bochs/cpu/apic.cc b/bochs/cpu/apic.cc
index 67468cbd6..5086feca2 100644
--- a/bochs/cpu/apic.cc
+++ b/bochs/cpu/apic.cc
@@ -910,6 +910,10 @@ Bit8u bx_local_apic_c::acknowledge_int(void)

   BX_ASSERT(get_vector(irr, vector));
   BX_DEBUG(("acknowledge_int() returning vector 0x%02x", vector));
+  
+  if (BX_READ_32BIT_REG(BX_NIL_REGISTER))
+    BX_PANIC(("Non-zero BX_NIL_REGISTER"));
+  
   clear_vector(irr, vector);
   set_vector(isr, vector);
   if(bx_dbg.apic) {

Vort commented 10 months ago

In BX_CPU_C::CMP_EdIdM, BX_NEXT_INSTR(i); gets executed, which includes BX_CPU_THIS_PTR icount++;. However, the register holding the pointer to bx_cpu becomes corrupted. For example, in my case rbx should contain 0x14031A640, but 0x14031A600 appeared instead. The 0x40 difference corresponds to a write to bx_cpu.gen_reg[19] instead of bx_cpu.icount. Probably one of the functions called earlier (BX_CPU_C::read_virtual_checks or BX_CPU_C::read_linear_dword) corrupts the pointer.
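
The address arithmetic can be checked with a small sketch. `ToyCpu` is a made-up layout, not the real BX_CPU_C: it is only arranged so that icount sits 0x40 bytes after gen_reg[19], matching the distance reported above, and the helper names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Toy layout, NOT the real BX_CPU_C.
struct ToyCpu {
  uint64_t gen_reg[20];  // gen_reg[19] plays the role of BX_NIL_REGISTER
  uint64_t filler[7];    // hypothetical padding to get the 0x40 distance
  uint64_t icount;
};

static_assert(offsetof(ToyCpu, icount) -
              (offsetof(ToyCpu, gen_reg) + 19 * sizeof(uint64_t)) == 0x40,
              "toy layout must reproduce the 0x40 distance");

// Address touched by an icount increment when the base pointer is `base`.
inline uint64_t icount_address(uint64_t base) {
  return base + offsetof(ToyCpu, icount);
}

// Address of gen_reg[19] for an object located at `base`.
inline uint64_t nil_register_address(uint64_t base) {
  return base + offsetof(ToyCpu, gen_reg) + 19 * sizeof(uint64_t);
}
```

With a base pointer that is 0x40 too low, `icount_address` of the corrupted base equals `nil_register_address` of the correct one, so every icount++ increments the register that is supposed to stay 0.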

Vort commented 10 months ago

I figured out what happens. Memory corruption occurs because of ROM reads with len = 3. Such accesses were not supported and were treated as accesses with len = 4, overwriting 1 extra byte on the stack.

#169 should fix this problem.

bx_banshee_c::mem_read banshee.cc:838
bx_banshee_c::mem_read_handler banshee.cc:804
BX_MEM_C::readPhysicalPage memory.cc:210
BX_CPU_C::access_read_linear paging.cc:2759
BX_CPU_C::read_linear_dword access2.cc:423
BX_CPU_C::read_virtual_dword access.h:281
BX_CPU_C::CMP_EdIdM arith32.cc:468
BX_CPU_C::linkTrace cpu.cc:310
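
The difference between handling len = 3 exactly and rounding it up to 4 can be sketched as follows. This is a toy illustration, not the banshee.cc code: both function names are hypothetical, and the "any length above 2 becomes 4" rule is an assumption that models the behaviour described above.

```cpp
#include <cstdint>
#include <cstring>

// Copies exactly `len` bytes (1..4) from `rom` at `offset` into `dst`.
inline void rom_read_exact(const uint8_t *rom, uint32_t offset,
                           unsigned len, void *dst) {
  std::memcpy(dst, rom + offset, len);
}

// Buggy variant for comparison: any length above 2 is treated as a 4-byte
// read, so a 3-byte request writes one byte more than the caller asked for.
inline void rom_read_rounded(const uint8_t *rom, uint32_t offset,
                             unsigned len, void *dst) {
  unsigned n = (len > 2) ? 4 : len;
  std::memcpy(dst, rom + offset, n);
}
```

If the caller's destination is a 3-byte slot, the fourth byte written by the rounded variant lands on whatever sits next to it, for example the low byte of a saved pointer.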

stlintel commented 10 months ago

Did your fix resolve the issue ?

Vort commented 10 months ago

> Did your fix resolve the issue ?

Yes, no more crashes with len = 3 access implemented.