Open mikrosk opened 1 year ago
Ironically, after fixing that I tried also the correct setup and everything seemed to work: https://atari-forum.com/viewtopic.php?p=435414#p435414
OK, at least this mystery has been solved: it's really important that it is NVDI 4.x that's installed. @wongck68 had been using NVDI 4.11 when the XaAES crashes started to occur and XaAES was booting just fine with NVDI 5: https://groups.google.com/g/aranym/c/iyeFnaoSMeU/m/goacblKtAgAJ. However he didn't want to use NVDI 5 ("NVDI 5 has issues with non-NVDI printer drivers") so that's why I kept focusing on the 4.11 and fixed that. If I tested NVDI 5 with the same setup, I would have seen the same QED crash for sure.
I was able to replicate both scenarios:
So it's a NVDI5 issue specifically and it's not a regression of any kind.
Not happening on TOS 4.04: same setup (an IDE image in fact) works with TOS 4.04 (QED starts), when ROM changed to EmuTOS (in aranym's config file), QED doesn't work again.
Thanks to @czietz there has been some progress: this is not issue specific to EmuTOS. The reason why it doesn't crash in TOS 4.04 is that TOS, when booted in Aranym, it patched, i.e. all of its 030 MMU code is disabled, essentially booting Aranym without memory protection.
As soon as one executes set_mmu.prg
from AUTO folder before fvdi.prg
, qed crashes, too.
According to Christian's findings, fvdi crashes somewhere in https://github.com/freemint/fvdi/blob/master/fvdi/engine/utility.c.
More precisely, the MEMORY VIOLATION crash occurs here:
[010082b0] 0839 0000 0106 33ea btst #0,$010633EA
[010082b8] 677c beq.s $01008336
[010082ba] 4a79 0106 1ed2 tst.w $01061ED2
[010082c0] 6726 beq.s $010082E8
[010082c2] 2079 0106 334c movea.l $0106334C,a0
[010082c8] b0fc 0000 cmpa.w #$0000,a0
[010082cc] 6700 00be beq $0100838C
[010082d0] 2890 move.l (a0),(a4)
[010082d2] 2948 0004 move.l a0,4(a4)
[010082d6] 70fe moveq.l #-2,d0
[010082d8] c090 and.l (a0),d0
[010082da] 2240 movea.l d0,a1
[010082dc] 234c 0004 move.l a4,4(a1) <===== CRASH
[010082e0] 200c move.l a4,d0
[010082e2] 7201 moveq.l #1,d1
[010082e4] 8081 or.l d1,d0
[010082e6] 2080 move.l d0,(a0)
This code sequence is within FVDI.PRG, but I did not investigate far enough to point to the exact line of code.
With Christian's nudge I was able to uncover it a bit more as well. The crash happens exactly here: https://github.com/freemint/fvdi/blob/c19b1b0c6429c2d4cd0aa94ab002b56cb9883c94/fvdi/engine/utility.c#L782
As a proof see my custom-compiled fvdi.prg with debug symbols:
000085f8 <.L743>:
85f8: 2079 0001 6da8 moveal 16da8 <_mblocks>,%a0
85fe: 200a movel %a2,%d0
8600: 7201 moveq #1,%d1
8602: 8081 orl %d1,%d0
8604: 4a88 tstl %a0
8606: 6700 00b6 beqw 86be <.L728>
860a: 2490 movel %a0@,%a2@
860c: 2548 0004 movel %a0,%a2@(4)
8610: 72fe moveq #-2,%d1
8612: c290 andl %a0@,%d1
8614: 2241 moveal %d1,%a1
8616: 234a 0004 movel %a2,%a1@(4) <---- crash
861a: 2080 movel %d0,%a0@
0000861c <.L727>:
861c: 254d 0008 movel %a5,%a2@(8)
8620: 254c 000c movel %a4,%a2@(12)
8624: d4fc 000c addaw #12,%a2
00008628 <.L718>:
8628: 200a movel %a2,%d0
862a: 4cdf 3c1c moveml %sp@+,%d2-%d4/%a2-%a5
862e: 4e75 rts
Some context:
pid 9 (qed): [1][ PROCESS "qed" KILLED: | MEMORY VIOLATION. (PID 009) | Type: private PC: 01008726 | Addr: 01370004 BP: 015A4000 |
pid 9 (qed): signal #10 raised [syscall_pc 0x1008726, exception_pc 0x1008726]
Registers immediately before crash:
A0: 01018134 A4: 00003fc0 D0: 01406001 D4: 01128b70 USP=0161a3b4
A1: 01370000 A5: 00003fcc D1: 01370000 D5: 015f2472 ISP=04e365aa
A2: 01406000 A6: 04e3662a D2: 00000008 D6: 015f5de8 MSP=00000000
A3: 00006814 A7: 04e365aa D3: 00000003 D7: 015f1a9c VBR=00000000
T=00 S=1 M=0 X=0 N=0 Z=0 V=0 C=0 TC=8000
CACR=80008000 DTT0=00000000 ITT0=00000000 SRP=013e8000 SFC=000
CAAR=00000000 DTT1=00000000 ITT1=00000000 URP=013e8000 DFC=000
FP0: -nan FP1: -nan FP2: -nan FP3: -nan N=0 Z=0
FP4: -nan FP5: -nan FP6: -nan FP7: -nan I=0 NAN=0
[01008722] 234a 0004 move.l a2,4(a1)
next PC: 01008726
I'm not sure what's the deal with those | 1
(there's a few other occasions of using it across the engine), it would seem that it assumes that Mxalloc
never returns an odd number and therefore the bit 0 is used as some kind of flag ((a0)
does point to 0x01370001
).
I put some debug outputs into around Mxalloc
and Mfree
, it doesn't seem that the affected pointer has been released. When crashing at accessing 0x01488004
(different address due to debug stuff), I can see that
Allocation at $01488000, 16320 bytes
Malloc at $0148800c
Splitting
Allocating at $0148800c (next at $01489fec)
6: 0/0 0/0 2/0 7/0 1/1 0/1 0/0 3/0 9/1 0/0
and there is no Mfree
with any such address so I don't know. Maybe some of the diagnostic functions in utility.c
can be of use. Maybe @mfro0 , @th-otto or @vinriviere know more?
fVDI documentation states:
Then there is
nvdifix
infvdi.sys
which somehow tries to move NVDI in the trap chain so it's not first:but its description is not very encouraging:
FreeMiNT's
INSTALL
file says:So this would imply that the best supported combination of those three is:
booted
set andnvdifix
not set)However this means that both NVDI and FVDI are outside of FreeMiNT's control. If I understand it correctly, they even use TOS supervisor stack (right?) and are not listed as separate processes. This seems to create a strange issue.
Repro steps:
Now start QED (/opt/GEM/qed/qed.app). Depending on the actual setup, sometimes it runs at the first time. Quit it. Run again. It wont run again. Sometimes it wouldn't run even on the first try, sometimes TosWin2 wouldn't run.
If started from TosWin2, one can see a simple "Bus error" message. When mint.prg compiled with debug infos, one can see in Aranym's console:
When looking where exactly it crashes, it goes all the way to gemlib's
v_opnvwk
. It wouldn't return from the trap. FreeMiNT's trap 2 handler is not even called in this case (i.e. it's not a bug related to freemint), fVDI's https://github.com/freemint/fvdi/blob/c19b1b0c6429c2d4cd0aa94ab002b56cb9883c94/fvdi/engine/workstn.c#L167seems to successfully enter and exit so it must be crashing somewhere in NVDI's trap handler. It seems to depend also on the resolution, if I set
fvdi.sys
with 640x480x8, it doesn't crash, if 640x480x16, it does.It's very hard to debug as I can't find a way to reproduce this with anything else than aranym.sys.
Strangely, when following https://atari-forum.com/viewtopic.php?p=247471#p247471, i.e. all the possible recommendations thrown away:
nvdifix
enabledPrinting and vector fonts not only work but also the crashing
v_opnvwk
is gone.Could be the issue caused by a stack overflow in NVDI.PRG ? Symptoms seems to be very similar to https://github.com/freemint/freemint/pull/269 however when I started investigating this one, I had the 'wrong' setup (NVDI.PRG after MINT.PRG, FVDI.PRG in
mint.cnf
withnvdifix
enabled) so the stack was wrong on a different place. Ironically, after fixing that I tried also the correct setup and everything seemed to work: https://atari-forum.com/viewtopic.php?p=435414#p435414 ... so the problem can be easily hidden as it seems.