The blitter draws the complete picture (Workbench initial screen) with multiple single draw operations, as far as I read somewhere. I think you will have to implement at least the blitter logic for it then. That would mean the first goal is already pretty ambitious... nice goal, Dirk! I totally like this logbook story format ;-)
You are right. Although V0.1 won't do much from a user's perspective, a lot of under-the-hood stuff needs to work to make the image appear.
Right now, I'm trying to find out how the DMA time slot allocation is implemented in SAE (the Javascript UAE clone):
It must be a core piece of the emulator, but I didn't find it yet. What I did find is two event tables: eventtab and eventtab2.
The first one covers four events:
SAEC_Events_EV_CIA
SAEC_Events_EV_HSYNC
SAEC_Events_EV_AUDIO
EV_MISC
The second one covers two:
SAEC_Events_EV2_BLITTER
SAEC_Events_EV2_DISK
I guess the Disk, Blitter and Audio DMA cycles are implemented within the event handlers. Unfortunately, I didn't find the code fragments that implement Sprite and Bitplane DMA. It is quite difficult to crawl through the UAE or SAE code because there are hardly any comments and the code is far from self-explanatory.
I attached a crisper image because in your pic the slots for 68k, blitter and copper are hard to distinguish from the 320 mode bitplane DMA.
taken from bloodline blog where he explains how he would implement the DMA sequencer (watch out for Theory time) http://eab.abime.net/showthread.php?t=90316&page=9
detailing information about DMA sequence http://amigadev.elowar.com/read/ADCD_2.1/Hardware_Manual_guide/node012B.html
Hmm, in SAE there is a lot of stuff going on in functions hsync_handler_pre() and hsync_handler_post(). It seems like SAE is not cycle-accurate, but line-accurate 🤔. I'm a little bit confused right now.
I found something in custom.js the function DMACON(v, hpos) looks maybe like a dispatcher for DMA slots... it looks like it starts copper actions, bitplane_DMA and blitteroperation depending on hpos of a horizontal line like documented in the hardware ref manual...
what is this readmap and writemap thing? it seems like a function map that sort of depends on hpos?
in winuae it is different, there in custom.cpp is a function "static int dma_cycle (void)" which maybe seems to do the dma sequencing ? Or am I wrong...
DMACON and DMACONR seem to be the read and write handlers for the DMACON register. They only occur in the read and write maps for the OCS registers:
readMap[0x002 >> 1] = DMACONR;
writeMap[0x096 >> 1] = DMACON;
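To make the read/write map idea concrete, here is a minimal C++ sketch of such a handler table. The names readMap, writeMap, DMACON and DMACONR come from the SAE lines above; the struct, the peek/poke helpers and the handler bodies are assumptions made for illustration and are not SAE's actual code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical miniature of the readMap / writeMap idea: one handler per
// 16-bit custom register, indexed by (register offset >> 1).
struct CustomChips {
    using ReadFn  = uint16_t (*)(CustomChips&);
    using WriteFn = void (*)(CustomChips&, uint16_t);
    static constexpr std::size_t kRegs = 0x200 >> 1;  // offsets 0x000..0x1FE

    std::array<ReadFn, kRegs>  readMap{};   // nullptr = no read handler
    std::array<WriteFn, kRegs> writeMap{};  // nullptr = no write handler
    uint16_t dmacon = 0;

    CustomChips() {
        // DMACONR (read side of the DMA control register)
        readMap[0x002 >> 1] = [](CustomChips& c) -> uint16_t { return c.dmacon; };
        // DMACON (write side): bit 15 decides whether the other bits are set or cleared
        writeMap[0x096 >> 1] = [](CustomChips& c, uint16_t v) {
            const uint16_t mask = v & 0x7FFF;
            const int next = (v & 0x8000) ? (c.dmacon | mask) : (c.dmacon & ~mask);
            c.dmacon = static_cast<uint16_t>(next);
        };
    }

    // Dispatches a write; returns false if no handler is installed.
    bool poke(uint16_t offset, uint16_t value) {
        assert(offset < 0x200 && (offset & 1) == 0);
        WriteFn fn = writeMap[offset >> 1];
        if (!fn) return false;
        fn(*this, value);
        return true;
    }

    // Dispatches a read; returns 0 if no handler is installed.
    uint16_t peek(uint16_t offset) {
        assert(offset < 0x200 && (offset & 1) == 0);
        ReadFn fn = readMap[offset >> 1];
        return fn ? fn(*this) : 0;
    }
};
```

The point of the table is that the memory system never needs a big switch statement: a register access is just an index computation followed by an indirect call.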
Thanks a lot for the links to the DMA sequencer forum thread. Might be very useful for us!
I've found this in SAE:
this.events_dmal_hsync = function() {
...
SAER.events.event2_newevent_xx(-1, 7 * SAEC_Events_CYCLE_UNIT, 13, function(v) {
while (dmal) {
if (dmal & 3)
dmal_emu(dmal_hpos + ((dmal & 2) ? 1 : 0));
dmal_hpos += 2;
dmal >>>= 2;
}
});
}
This is called after every HSYNC. It schedules an event that runs dmal_emu in a loop. dmal_emu performs disk or audio DMA, depending on the horizontal position. This means that the CPU cannot interfere because the DMA is emulated in a single chunk. This indicates line-accuracy, but I always thought that UAE is cycle-accurate.
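To see what the loop does with the dmal bit mask, here is a plain C++ transcription that collects the values that would be passed to dmal_emu instead of calling it. The function name dmalCalls is made up for this sketch, and what dmal_emu does with each value is not visible in the excerpt, so that part is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Transcription of the SAE loop: dmal is consumed two bits at a time.
// For every non-zero bit pair, one call is made; the argument is the
// running position, plus one if the upper bit of the pair is set.
std::vector<int> dmalCalls(uint32_t dmal, int dmalHpos) {
    assert(dmalHpos >= 0);
    std::vector<int> calls;
    while (dmal) {
        if (dmal & 3)
            calls.push_back(dmalHpos + ((dmal & 2) ? 1 : 0));
        dmalHpos += 2;
        dmal >>= 2;  // same effect as the unsigned shift >>>= in JavaScript
    }
    return calls;
}
```

Because the whole mask is drained inside one event handler, all pending requests of the line are served back to back, which is the "single chunk" behaviour described above.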
I'm wondering how bitplane data is read in...
More findings:
An HSYNC event calls hsync_handler_pre(), which calls finish_decisions(), which calls decide_sprites(), which calls record_sprite(), which processes sprite data for all pixels in one row (or fewer pixels if it is called in the middle of a rasterline).
Interestingly, decide_sprites() is also called in the write handler SPRxPTH. This indicates that UAE / SAE uses the following design principle:
At least it seems to work this way for sprites.
I'm pretty much in favour of emulating DMA access as it is explained in the Raspberry Pie Forum thread (VICII in VirtualC64 works the same way). It would be slower, but it seems simpler and less error-prone.
Raspberry Pie Forum ?
you refer to
English Amiga Board > Coders > Coders. Asm / Hardware > Baremetal Amiga Emulator, don't you? This is no raspi-forum, it is the "English Amiga Board", a resource with loads of information around the Amiga including WinUAE, FSUAE, Coding, Software, Hardware, etc. ;-)
I totally agree with you, I also think the way bloodline is going is my favourite. It is a simpler, cleaner approach and maybe a bit slower. I bet that decisions stuff came from a time when computers were much slower and could barely handle cycle-exact Amiga emulation, so they had to optimise and paid for that with complicated code. (Of course that optimisation is still reasonable for JavaScript, which is slow, but for clang-compiled code it is not necessary anymore, I bet.)
The CIAs are connected to memory now, so we are able to process a few more instructions. We are here at the moment:
; Set up port A on the first CIA (8520-A).
FC00FE move.b #3,BFE201 Set low two bits for output.
FC0106 move.b #2,BFE001 Set boot ROM off, power light dim.
The first move instruction configures pins PA0 and PA1 of CIAA as output and the second move instruction sets PA0 to 0 and PA1 to 1. PA0 controls the Kickstart overlay, and writing a 0 means that the Kickstart should no longer be overlaid. When the two instructions are executed, the memory panel shows that this is indeed the case. We now see the Chip Ram blended in:
PA1 controls the LED. The LED is switched off, but because the LED was already switched off before, we don’t see anything of interest.
The next important instructions are:
FC0118 move.w D0,$9A(A4) Disable all interrupts.
FC011C move.w D0,$9C(A4) Clear all pending interrupts.
FC0120 move.w D0,$96(A4) Disable all DMA.
The first two statements reset the interrupt registers, which can be verified in the Paula panel:
I agree that it’s not that spectacular, because 0 is also the initial value on startup. However, a temporary debug message in the console tells me that the registers have indeed been written to:
Paula: pokeINTENA(7FFF)
Paula: pokeINTREQ(7FFF)
The next instruction produces
Memory: WARNING: pokeCustom16(DFF096, 7FFF): MISSING IMPLEMENTATION
So it’s clear what to implement next: The DMACON register inside Agnus…
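INTENA, INTREQ and DMACON share the same write convention: bit 15 of the written value (SET/CLR) decides whether the remaining 15 bits are set or cleared in the register. This is why writing 7FFF clears everything. A minimal sketch of that rule follows; the function name applySetClr is made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Applies a write to a SET/CLR style register (INTENA, INTREQ, DMACON).
// 'current' is the stored register value, 'written' the value poked by the CPU.
inline uint16_t applySetClr(uint16_t current, uint16_t written) {
    assert((current & 0x8000) == 0);        // bit 15 is never stored
    const uint16_t mask = written & 0x7FFF; // the bits to be affected
    const int next = (written & 0x8000) ? (current | mask)    // SET/CLR = 1: set bits
                                        : (current & ~mask);  // SET/CLR = 0: clear bits
    return static_cast<uint16_t>(next);
}
```

With this rule, the three Kickstart writes above (D0 = 7FFF) leave all interrupt enable, interrupt request and DMA enable bits cleared, regardless of their previous state.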
Now I can step through until I reach the Expansion RAM Checker at
FC061A move.l A0,A4
Good to have a commented Kickstart around:
; $C00000 Expansion RAM Checker
; -----------------------------
; The following routine checks for the presence of memory
; in the $C00000 - $DC0000 area. This is a nontrivial exercise,
; since if there is no memory there, we see images of the custom
; chip registers there instead, due to incomplete address decoding.
Incomplete address decoding 😳. Never heard about it 🤭. If I'm right, UAE handles unmapped memory via the "dummy" bank handlers ... Let's see what they do there ...
I've found a Verilog reimplementation of Gary here (Amiga FPGA project):
https://github.com/rkrajnc/minimig-de1/blob/master/rtl/minimig/Gary.v
According to the line
assign sel_reg = cpu_address_in[23:21]==3'b110 ? ~(sel_xram | sel_rtc | sel_ide | sel_gayle) : 1'b0;
"incomplete address decoding" means that Gary selects the custom registers whenever the upper three address bits are 110 and none of the other select signals (sel_xram, sel_rtc, sel_ide, sel_gayle) is active. I've adapted this and the new memory mapping now looks like this (A500 with 512 KB slow mem, some fast mem and an RTC attached):
Now the emulator detects correctly if a Chip Ram extension is present (Slow Ram starting at memory bank C0). If memory is found, it is initialised with zeroes 🥳.
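The Verilog line from Gary.v can be restated in C++ roughly as follows. This is only a sketch of the decoding rule; the struct OtherSelects and the function name are invented here, and how vAmiga actually builds its memory map is not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Select signals of the other devices living in the same address range
// (field names follow the Verilog signal names).
struct OtherSelects {
    bool xram  = false;
    bool rtc   = false;
    bool ide   = false;
    bool gayle = false;
};

// True if a 24-bit address ends up at the custom chip registers.
// Only address bits 23..21 are decoded (they must be 110), so every address
// in $C00000 - $DFFFFF that nobody else claims shows a register mirror.
inline bool selectsCustomRegs(uint32_t addr, const OtherSelects& other) {
    assert(addr <= 0xFFFFFF);
    const bool upperBitsAre110 = ((addr >> 21) & 0x7) == 0x6;
    return upperBitsAre110 && !(other.xram || other.rtc || other.ide || other.gayle);
}
```

This explains the Kickstart comment: without expansion RAM, a probe at $C00000 hits a mirror of the custom registers instead of memory.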
Next step will be:
; Having figured out the end address of expansion memory (in A4),
; and the value to use for ExecBase (in A6), we now check how much
; chip memory we have. Any memory in the first 2 megabytes of
; address space is considered to be chip memory. Less than 256K
; of chip memory is considered a fatal error.
FC0208 lea 0,A0 Start looking at location 0.
FC020C lea 200000,A1 Don't look past 2 megabytes.
FC0212 lea FC021A(PC),A5 Set the return address.
FC0216 bra FC0592 Go check the memory.
FC021A cmp.l #$040000,A3 Do we have at least 256K of chip memory?
FC0220 bcs.s FC0238 Bomb if not.
Unfortunately, the emulator tends to beach ball if the inspector is open while the main window is in the background. This is a Mac-related problem and due to the fact that I'm not really familiar with handling auxiliary windows in OS X. I might look into this first before I continue here ...
After removing a stupid bug in Memory::poke32(), Kickstart has decided that the machine has 256 KB of Chip Ram (which is good, because it has, well, 256 KB of Chip Ram). Then, it recognised that the CPU is a 68000 (by ruling out that it is a 68010 etc.). After that, a lot of memory init stuff is done (setting up exec jump tables etc.). This all looks good (as far as I can judge this at the moment), so I'm finally here:
; A historic moment: We turn the supervisor mode flag off.
FC04BE and.w #0,SR Turn the supervisor bit off.
Wow, a historic moment 🤭. I am a bit scared of what will happen. OK, let's be brave and press the Step button again 😬:
Woohoo, supervisor flag is cleared 😀. After experiencing this "historic moment", I need a break. Stepping through Kickstart is exhausting ...
Just noticed that the "Data" column is wrong in the CPU panel. Need to fix this first ...
That's a completely epic moment for all of us 🖖 ! 🤗
That is the documented Kickstart exec of Markus Wandel, isn't it? He wrote those comments on February 3, 1989, so that's clearly a historic moment in February 2019. 🙃
From this time on, we are leaving supervisor mode and running in 68K user mode... certain 68k commands like "stop" or "reset" do not work from this moment on... that makes sense because AmigaOS is a multitasking system ...
I continued my journey through Kickstart. Unfortunately, I wasn't aware of the fact that Markus Wandel "only" documented exec, so at some point, the unavoidable happened: I left exec and entered the undocumented area. This means I'm in outer space now and completely left on my own 👽😬. I kept on stepping and at some point in time, the emulator started to poke values into the Copper registers. So it was about time to work on that.
To make a long story short: I don't have a working Copper yet, but I do have a Copper disassembler 😎. Along the way, I've also invented a new software development approach which I'm gonna call "inverse prioritising". Its core idea is to postpone the most important things as much as possible. I'm so proud of this method that I'm considering publishing a book about it. The only thing that puzzles me is that nobody else had this brilliant idea before 🤔.
Anyways, at the moment, the Copper disassembler looks like this:
While working on the disassembler, I was looking for some standard Copper assembly notation, but I didn't find any. I therefore invented my own. If it turns out that there is some kind of established notation (which I am not aware of, because I spent so much time on the C64 that the Amiga is brand new technology for me), I can easily change that.
Although I’m still in the design phase, I have made considerable progress: important design decisions are starting to emerge. The first major decision was to move from a mixed event/polling-based design to a truly event-based design.
A major part of the emulator is the DMA controller. The heart of the DMA controller is the event scheduler which consists of several event slots. From a theoretical point of view, each event slot is a single state machine with timed transitions. Right now, there are 6 slots (meaning we have 6 state machines running in parallel):
Slot 1: CIA A
Slot 2: CIA B
Slot 3: Disk, Audio, Sprite, and Bitplane DMA
Slot 4: Copper
Slot 5: Blitter
Slot 6: Rasterline (HSYNC events)
To give an example, let’s look at slot 3. In each HSYNC event, a slot-3-event is scheduled that triggers at the first horizontal beam position where DMA happens. If Disk DMA is enabled, this will be position 7. Once the event is served, the next DMA event is scheduled. Although this sounds simple to implement, it is not. The challenge here is to find out when the next DMA event happens for a given hpos. This depends on a lot of factors (DMA enable bits, lores / hires mode, vblank area etc.). To implement this efficiently, I decided to use a precomputed DMA event table. Whenever one of the influencing factors changes (e.g., the vblank area is entered), a DMA time slot allocation table is computed which resembles Fig. 6.9. in the Hardware Reference Manual.
Let’s test this out with the current prototype. If we enable Disk DMA, Sprite DMA, Audio DMA for channel 1 and 2 and bitplane DMA in the DMA inspector panel, the event table looks like this (Denise has three bitplanes enabled and runs in lowres mode):
00000000000000001111111111111111222222222222222233333333333333334444444444444444
0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF
.......D.D.D...A.A...S.S.S.S.S.S.S.S.S.S.S.S.S.S.S.S.......L.L.L...L.L.L...L.L.L
.......I.I.I...1.2...0.0.1.1.2.2.3.3.4.4.5.5.6.6.7.7.......2.3.1...2.3.1...2.3.1
To speed things up, the emulator computes a jump table in addition to the event table. For each hpos value, the jump table indicates the hpos where the next event happens. E.g., at position 0x33, the jump table contains the value 0x3B which is an L2 (lowres bitplane 2 fetch event).
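A jump table of this kind can be derived from the event table with a single backward pass. The following is a minimal sketch under the assumption that the event table is given as a string like the row printed above ('.' meaning "no event"); the names kExampleRow, kNoEvent and buildJumpTable are invented for this example and are not taken from the vAmiga sources.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr int kNoEvent = -1;  // sentinel: no further event in this table

// First 80 positions of the event row shown above, split into segments.
const std::string kExampleRow =
    "......."                                  // 0x00 - 0x06: idle
    "D.D.D"                                    // 0x07 - 0x0B: disk
    "..."
    "A.A"                                      // 0x0F, 0x11: audio
    "..."
    "S.S.S.S." "S.S.S.S." "S.S.S.S." "S.S.S.S" // 0x15 - 0x33: sprites
    "......."
    "L.L.L...L.L.L...L.L.L";                   // 0x3B - 0x4F: bitplanes

// jump[h] is the smallest position greater than h that holds an event.
std::vector<int> buildJumpTable(const std::string& events) {
    assert(!events.empty());
    std::vector<int> jump(events.size(), kNoEvent);
    int next = kNoEvent;
    for (std::size_t i = events.size(); i-- > 0;) {
        jump[i] = next;                          // next event strictly after i
        if (events[i] != '.') next = static_cast<int>(i);
    }
    return jump;
}
```

With this table, serving an event at position h and rescheduling the slot is a single lookup of jump[h]. The table has to be rebuilt whenever the event table changes, which should be rare compared to the number of lookups.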
I hope I haven't overlooked any theoretical flaws in this design.
Now I’m at a point where the Copper list has been initialized. To examine the list, open the Copper inspector panel:
The first list makes sense to me. The Copper is programmed to restore the initial values of the bitplane pointer and the sprite pointer registers. After that, it waits until a certain beam position is reached and jumps to the second Copper list. The second list seems to be uninitialised yet, because the commands make no sense. Commands in red indicate an illegal command. Illegal commands are MOVE commands accessing custom registers Copper has no access to.
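For reference, classifying a Copper instruction only needs bit 0 of each instruction word. The sketch below shows how a disassembler might do it. The type and function names are invented, and the legality rule (no writes below $80, or below $40 with the danger bit set) is my reading of the OCS rules in the Hardware Reference Manual, so treat it as an assumption.

```cpp
#include <cassert>
#include <cstdint>

enum class CopperOp { Move, Wait, Skip };

struct CopperInstr {
    CopperOp op;
    uint16_t reg;     // register offset for MOVE, 0 otherwise
    bool     illegal; // MOVE to a register the Copper must not write
};

// ir1 / ir2 are the two instruction words; cdang mirrors the COPCON danger bit.
inline CopperInstr decodeCopper(uint16_t ir1, uint16_t ir2, bool cdang = false) {
    if ((ir1 & 1) == 0) {                       // bit 0 of first word clear: MOVE
        const uint16_t reg = ir1 & 0x1FE;       // register offset, always even
        assert((reg & 1) == 0);
        const uint16_t lowestWritable = cdang ? 0x40 : 0x80;  // assumed OCS rule
        return { CopperOp::Move, reg, reg < lowestWritable };
    }
    // bit 0 of first word set: bit 0 of second word picks WAIT (0) or SKIP (1)
    return { (ir2 & 1) ? CopperOp::Skip : CopperOp::Wait, 0, false };
}
```

For example, the word pair (008A, 0000) decodes as a MOVE to offset $8A (COPJMP2), and (FFFF, FFFE) decodes as a WAIT.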
It’ll be interesting to see how Copper executes these commands. To see it, I would have to step to the point where Copper DMA is switched on. Unfortunately, I cannot step there yet, because the emulator stops at FCADBA with an error message:
Memory: WARNING: pokeCustom16(DFF040 [BLTCON0], 0): MISSING IMPLEMENTATION
Amiga: Pause
The Kickstart code writes into one of the Blitter registers, so it seems the right time to start working on this component ...
I’ve read through the Hardware Reference Manual. Bottom line is that the Blitter is easy from a functional point of view, but difficult if exact timing is taken into account. As stated in the HRM, we have to deal with varying time slice patterns (depending on the enabled DMA channels):
I have reviewed a couple of existing Blitter implementations (with UAE the most cryptic one again) and I came to the conclusion that I want to try something new here. I’m going to control my virtual Blitter via emulated micro instructions.
More precisely: When the BLTSIZE register is written to (which starts a blit), the emulator will analyse the current DMA configuration and set up a micro instruction list. After that, the event scheduler will be programmed to trigger Blitter events and each event will then execute a single micro instruction.
Here is an example instruction list for the first Blitter configuration in Table 6-2.
case 0b1111: { // A0 B0 C0 -- A1 B1 C1 D0 A2 B2 C2 D1 D2
uint16_t prog[] = {
FETCH_A,
FETCH_B | HOLD_A,
FETCH_C | HOLD_B,
HOLD_D,
FETCH_A,
FETCH_B | HOLD_A,
FETCH_C | HOLD_B,
WRITE_D | HOLD_D | LOOPBACK3,
WRITE_D
};
memcpy(microInstr, prog, sizeof(prog));
break;
}
The micro instructions allow me to emulate the data flow in the real Blitter quite accurately (The Blitter is designed in form of a traditional pipeline with “hold” register forming an intermediate pipeline stage).
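To check that such an instruction list really produces the bus sequence from the comment (A0 B0 C0 -- A1 B1 C1 D0 A2 B2 C2 D1 D2), a tiny interpreter is enough. The sketch below is not the vAmiga implementation: the flag values are made up, and LOOPBACK3 is interpreted as "jump back three instructions as long as there are words left to fetch", which is an assumption that happens to reproduce the sequence above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Flag names follow the listing above; the bit values are invented here.
enum : uint16_t {
    FETCH_A = 1 << 0, FETCH_B = 1 << 1, FETCH_C = 1 << 2, WRITE_D = 1 << 3,
    HOLD_A  = 1 << 4, HOLD_B  = 1 << 5, HOLD_D  = 1 << 6, LOOPBACK3 = 1 << 7
};

// The instruction list for configuration 0b1111 from the listing above.
std::vector<uint16_t> progAllChannels() {
    return { FETCH_A, FETCH_B | HOLD_A, FETCH_C | HOLD_B, HOLD_D,
             FETCH_A, FETCH_B | HOLD_A, FETCH_C | HOLD_B,
             WRITE_D | HOLD_D | LOOPBACK3, WRITE_D };
}

// Runs the program for a blit of 'words' words and returns the bus usage,
// one token per executed micro instruction ("--" = no bus access).
std::string traceBlit(const std::vector<uint16_t>& prog, int words) {
    assert(words >= 1);
    int a = 0, b = 0, c = 0, d = 0;
    std::string trace;
    std::size_t pc = 0;
    while (pc < prog.size()) {
        const uint16_t ins = prog[pc];
        std::string slot = "--";
        if (ins & FETCH_A) slot = "A" + std::to_string(a++);
        if (ins & FETCH_B) slot = "B" + std::to_string(b++);
        if (ins & FETCH_C) slot = "C" + std::to_string(c++);
        if (ins & WRITE_D) slot = "D" + std::to_string(d++);
        if (!trace.empty()) trace += ' ';
        trace += slot;
        if ((ins & LOOPBACK3) && a < words) {
            assert(pc >= 3);
            pc -= 3;   // assumed meaning: repeat the inner loop
        } else {
            ++pc;      // fall through, which drains the pipeline
        }
    }
    return trace;
}
```

In the emulator, each loop iteration would correspond to one Blitter event served by the scheduler rather than to a pass through a while loop.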
Although this approach sounds promising to me (because of its flexibility), I’m totally unsure if this is the right way to go. Time will tell…
Now, I'm at
FCADC2: move.w #$41, ($58,A4)
This writes a value into BLTSIZE which means that a Blitter operation is about to come 😬.
Let's step over it ...
FATAL ERROR: Unimplemented Blitter configuration
Kickstart is starting the Blitter with the BLTCON = 0 config in Table 6-2. OK, there is no micro code for that config yet... I thought it's stupid to use this configuration and now it's the first one being used 🙈.
OK, now the Blitter has some microcode for its most meaningless mode. I also tweaked the debug output a little, so it's easier to see what's going on internally:
              Master cycles  CPU cycles  DMA cycles  CIA cycles
Master clock:      59359184    14839796     7419898     1483979
DMA clock:         59359184    14839796     7419898     1483979
Frame clock:       59303296    14825824     7412912     1482582
CIA A clock:            360          90          45           9
CIA B clock:            360          90          45           9
Color clock: (61,176) hex: ($3D,$B0) Frame: 208
CIA A: Event: CIA_WAKEUP Trigger: disabled
CIA B: Event: CIA_WAKEUP Trigger: disabled
DMA: Event: DMA_DISK Trigger: 59399616 (5054 DMA cycles away)
Copper: Event: none Trigger: disabled
Blitter: Event: BLT_EXECUTE Trigger: 59359128 (-7 DMA cycles away)
Raster: Event: RAS_HSYNC Trigger: 59359584 (50 DMA cycles away)
Woohoo, for the first time there is a pending event in the Blitter slot 🥳. But wait ... it has been overdue for 7 cycles ... this should never happen 😖.
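The numbers in this dump are consistent with one master cycle unit in which a CPU cycle lasts 4, a DMA cycle 8 and a CIA cycle 40 master cycles. These ratios are read off the table above, not taken from the source code, and the helper names below are invented. A small sketch of the conversion and of the "cycles away" column:

```cpp
#include <cassert>
#include <cstdint>

// Ratios inferred from the debug output (an assumption of this sketch).
constexpr int64_t kMasterPerCpu = 4;
constexpr int64_t kMasterPerDma = 8;
constexpr int64_t kMasterPerCia = 40;

inline int64_t asCpuCycles(int64_t master) { return master / kMasterPerCpu; }
inline int64_t asDmaCycles(int64_t master) { return master / kMasterPerDma; }
inline int64_t asCiaCycles(int64_t master) { return master / kMasterPerCia; }

// Distance of an event trigger from the current master clock, in DMA cycles.
// A negative result means the event is overdue.
inline int64_t dmaCyclesAway(int64_t trigger, int64_t now) {
    assert(now >= 0);
    return (trigger - now) / kMasterPerDma;
}

inline bool isOverdue(int64_t trigger, int64_t now) {
    return trigger < now;
}
```

A scheduler could run a check like isOverdue as a debug assertion every time an event is scheduled, which would catch a trigger time in the past at the moment it is created rather than when it shows up in the inspector.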
A brief update about what’s going on in the OCS family.
Finally Paula got her own interrupt scheduler. It's a rather sophisticated device that allows her to trigger interrupts in certain cycles, e.g. in five cycles from now on, with little computational overhead. She hasn't really used it so far because there were simply no IRQ requests. I told her to be patient a little while longer as this is going to change soon.
Denise became jealous because she is the component with the fewest lines of code yet. Because her sister got this super cool interrupt scheduler, she now insists on getting the pixel engine I promised her some time ago. I told her we had to debug Copper first, because a pixel engine without a Copper is pretty useless. Somehow I felt she wasn't really listening.
Agnus is quite happy with his event scheduler. He says that planning events is much more fun than polling regularly. Unfortunately, it's still not an easy task for him. He continues to plan events with invalid time stamps and the like, but I am pretty confident that he will improve that over time. He also likes to be the one in charge of the bus. At first he had the idea to exclude his little sisters from the bus and keep all cycles for himself. I tried to convince him that this was not possible. We have to follow the rules, namely the DMA time slot allocation as stated in the hardware reference manual. I'm not sure he really understood what I meant, because he still does strange things from time to time.
Besides my struggle with the OCS chips, I continued stepping through kickstart to a point where the real trouble begins 😬:
As you can see, Kickstart has enabled all kinds of DMA now. The rest of the story can be told rather quickly. When Copper saw his DMA flag set, he ran off like crazy 🤪, scheduled some weird events and crashed the whole thing 🙈. Well, as I said above: I need to debug Copper first 😖.
After fixing some bugs, it’s time to give Copper another chance. The fun starts when Copper's DMA flag is set:
Copper: (0,5): COP_REQUEST_DMA
Copper: (0,7): COP_MOVE: coppc = 422 copins2 = 0
DMAController: pokeBPL0PTH(0)
Looks good so far ... the first MOVE command has been executed 😎.
Copper: (0,79): COP_WAIT_OR_SKIP: coppc = 46A copins2 = FFFE
That’s the
WAIT* ($0C,$00)
command. Let’s check what kind of effect that had ...
Primary event table:
Slot: Copper Event: COP_FETCH Trigger: 100061600 (2645 DMA cycles away)
Good news here, Copper went idle. Now the question is if he's going to wake up exactly at the specified beam position 🤔…
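The 2645 cycles can be verified by hand: the WAIT was processed at beam position (0,79) and the target is (12,0). Assuming a PAL rasterline of 227 DMA cycles (hpos 0 to $E2) and ignoring frame wrap-around, the distance is 12 * 227 - 79 = 2645. As a small sketch (struct and function names are invented):

```cpp
#include <cassert>

constexpr int kDmaCyclesPerLine = 227;  // assumption: PAL line, hpos 0..0xE2

struct Beam {
    int v;  // vertical position (rasterline)
    int h;  // horizontal position (DMA cycle within the line)
};

// Number of DMA cycles from 'from' to 'to' within the same frame.
inline int dmaCyclesBetween(Beam from, Beam to) {
    assert(from.h >= 0 && from.h < kDmaCyclesPerLine);
    assert(to.h >= 0 && to.h < kDmaCyclesPerLine);
    return (to.v - from.v) * kDmaCyclesPerLine + (to.h - from.h);
}
```

The result matches the "2645 DMA cycles away" shown in the event table, so the wake-up event seems to be scheduled for the right cycle.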
Copper: (12,0): COP_FETCH: coppc = 46C copins1 = 8A
😃 Yeah, it continues at (12,0).
The next command is the MOVE command writing into the Strobe register. This is going to redirect us to the second Copper list.
Copper: (12,2): COP_MOVE: coppc = 46E copins2 = 0.
Copper: pokeCOPJMP2
The second list consists of a single command. It’s a WAIT statement that never triggers and therefore disables the Copper.
WAIT* ($FF,$FE)
Let's keep our fingers crossed 🤞...
Copper: (12,4): COP_FETCH: coppc = 474 copins1 = FFFF
Copper: (12,6): COP_WAIT_OR_SKIP: coppc = 476 copins2 = FFFE
So, let’s check our event list. The Copper slot should be disabled by now:
Slot: Copper Event: WAIT_OR_SKIP Trigger: never
Pretty cool 🥳. Copper successfully processed his first list.
Time for a brief update. After providing each custom chip with some basic functionality (Agnus schedules events, Denise does DMA, Paula triggers interrupts), the OCS kids seem to be happy with what they have (except Denise, who is still angry because she didn’t get a pixel engine yet). The problem is that the custom chips run in an endless loop now. I expected them to draw the hand & disk picture eventually, but they don't seem to care about what I want 🙁.
Because endless loops are hard to debug, I decided to work on some missing stuff in the hope that one of these things is the cause of the infinite loop.
One of these things is drive identification. Hence, my current goal is to let the internal drives identify themselves correctly as 3.5” DD drives. Fortunately, the identification happens in documented Kickstart land:
; DRT_AMIGA EQU $00000000 ; standard 3.5" DD Amiga disk
; DRT_37422D2S EQU $55555555 ; 5.25"
; DRT_150RPM EQU $AAAAAAAA ; 3.5" HD drive with HD disk in
; DRT_EMPTY EQU $FFFFFFFF ; empty drive
; note that values returned by drive are negation of what is saved
; by disk.resource
FC48F4 move.l #0,$30(A2) ; we always have df0 of Amiga type
FC48FC moveq #2,D2
FC48FE move.b #$10,D3 ; SEL1
FC4902 lea $34(A2),A3
FC4906 bsr.l $FC491C ; check the driveid for the other 3 units
FC490A lsl.b #1,D3 ; next unit
FC490C dbra D2,$FC4906
FC4910 move.l A2,A1
FC4912 jsr -$01E6(A6) ; AddResource
FC4916 movem.l (SP)+,D2/D3/A2/A3/A6
FC491A rts
FC491C not.b D3 ; inverts select bit
FC491E lea $BFD100,A0 ; CIA-B prb
FC4924 move.b #$7F,D0 ; prepares motor on
FC4928 move.b D0,(A0)
FC492A and.b D3,D0 ; motor on for selected drive
FC492C move.b D0,(A0)
FC492E move.b #$FF,(A0) ; deselect drive
FC4932 move.b D3,(A0) ; motor off
FC4934 move.b #$FF,(A0) ; deselect drive - this resets drive shift id port
FC4938 moveq #$1F,D1 ; loop 32x
FC493A moveq #0,D0
FC493C lsl.l #1,D0
FC493E move.b D3,(A0) ; select drive
FC4940 btst #5,$BFE001 ; check the /RDY bit
FC4948 beq.s $FC494E
FC494A bset #0,D0 ; if not set particular bit
FC494E move.b #$FF,(A0) ; deselect drive
FC4952 dbra D1,$FC493C ; process all 32 bits
FC4956 move.l D0,(A3)+ ; store result in DR_UNITID
FC4958 not.b D3 ; turn back select bit
FC495A rts
This is also the place where we could tell Kickstart we had an HD drive 😎. $AAAAAAAA is the secret passphrase.
OK, I can now transfer any 32-bit drive identification key over the RDY line serially. This is nice, but pretty useless at the moment. Why? Because Kickstart knows that df0 is always a standard drive and therefore skips the serial transmission step for it:
FC48F4 move.l #0,$30(A2) ; we always have df0 of Amiga type
The mechanism only becomes important, when external drives come into play.
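The identification protocol can be modelled as a 32-bit shift register on the drive side and the Kickstart loop on the host side. The sketch below abstracts away the CIA registers: nextRdyLevel() stands for "select the drive and look at bit 5 of $BFE001", and the stored value is the pattern as Kickstart saves it (per the comment in the listing, the drive's logical value is the negation). All names are invented for this example.

```cpp
#include <cassert>
#include <cstdint>

// Drive side: presents one ID bit per select pulse, most significant bit first.
class DriveIdShifter {
public:
    explicit DriveIdShifter(uint32_t id) : id_(id) {}

    // Corresponds to the motor on/off sequence that resets the shift position.
    void reset() { pos_ = 0; }

    // Level seen on the /RDY bit during the next select pulse.
    bool nextRdyLevel() {
        assert(pos_ >= 0 && pos_ < 32);
        const bool level = ((id_ >> (31 - pos_)) & 1) != 0;
        pos_ = (pos_ + 1) % 32;
        return level;
    }

private:
    uint32_t id_;
    int pos_ = 0;
};

// Host side: mirrors the loop at FC493C - FC4952 (32 iterations,
// shift left, set bit 0 if the /RDY bit reads as 1).
inline uint32_t readDriveId(DriveIdShifter& drive) {
    drive.reset();
    uint32_t d0 = 0;
    for (int i = 0; i < 32; ++i) {
        d0 <<= 1;
        if (drive.nextRdyLevel()) d0 |= 1;
    }
    return d0;
}
```

A drive configured with 0xAAAAAAAA would then be stored as DRT_150RPM, and one configured with 0 as DRT_AMIGA.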
Seems like I have to come up with another idea to tackle my infinite loop problem 🤔.
Some news about the hand & disk screen hunt.
My goal is to reach memory location FC570E. This is where the BLTSIZE register is written to with non-trivial values and the emulator is supposed to blow up there (it’s supposed to blow up on purpose, because there is no Blitter micro code for non-trivial blits yet, but that’s another story).
By stepping back manually through the Omega CPU trace log, I was able to identify the following memory location sequence. This is the result:
fe89b0 : Bitplane DMA switched off
fe89c6
fe89da
fe8a06
fe8b88
fe8bbe
fe8c3e
fe8c46
fe8c70
fe8cca
fe8cd0
fe8cd6
fe8ce4
fe8ce8
fc55c8
fc570e : Bltsize is written to
Between those addresses, a lot of subroutine stuff is going on.
The good news is that vAmiga already reaches FE89B0. This is where Bitplane DMA is switched off. Hence, it remains to check where in this sequence vAmiga gets lost.
I am just doing the same back stepping in omega
spotted first blitsize at
fc5654: lea $dff000.l, A0
...
fc570e: move.w D0, ($58,A0) <--- first write to blit size at fc570e
which corresponds perfectly to the address Dirk spotted
a short window out of the full instruction log of omega shortly before blitter action follows here
...
fe8d68: bra fe8cf8
fe8cf8: moveq #$0, D3
...
fe8d62: movea.l A3, A1
fe8d64: jsr (-$f6,A6) <-- CPU jsr to $2108 that means a6=$2108+$f6
2108: jmp $fc55c8.l <------ PC is at address $2108 and CPU jumps to $fc55c8 (no jsr!)
fc55c8: tst.l (A1)
fc55ca: bne fc6834
fc55ce: movem.l D2-D7/A2-A3, -(A7)
fc55d2: movem.w ($24,A1), D2-D3
fc55d8: movem.w D0-D1, ($24,A1)
...
fc5654: lea $dff000.l, A0 <--- CPU loads custom chip register base into A0, expect something impressive to happen here
fc565a: movea.w D2, A3
fc565c: move.w ($22,A1), D6
fc5660: addq.w #1, ($aa,A6)
fc5664: beq fc566a
fc566a: btst #$6, ($2,A0)
fc5670: btst #$6, ($2,A0)
fc5676: beq fc567c
fc567c: move.w D1, ($62,A0)
fc5680: move.w D2, D1
fc5682: sub.w D0, D1
fc5684: move.w D1, ($64,A0)
fc5688: moveq #-$1, D1
fc568a: move.l D1, ($44,A0)
fc568e: move.w #$8000, ($74,A0)
fc5694: move.w (A2), ($60,A0)
fc5698: move.b ($1f,A1), D1
fc569c: swap D1
fc569e: clr.w D1
fc56a0: asr.l #4, D1
fc56a2: or.w D1, D7
fc56a4: sub.b D0, ($1f,A1)
fc56a8: move.w D7, D5
fc56aa: addq.w #1, D0
fc56ac: asl.w #6, D0
fc56ae: addq.w #2, D0
fc56b0: move.w D4, D2
fc56b2: swap D4
fc56b4: asr.l #4, D4
fc56b6: ori.w #$b00, D4
fc56ba: clr.w D1
fc56bc: bclr #$0, ($21,A1)
fc56c2: bne fc56cc
fc56c4: cmpi.b #$2, ($1c,A1)
fc56ca: beq fc572a
fc56cc: move.b ($5,A2), D2
fc56d0: lea ($8,A2), A2
fc56d4: move.l (A2)+, D7
fc56d6: btst D1, ($18,A1)
fc56da: beq fc5712
fc56dc: swap D5
fc56de: move.w D4, D5
fc56e0: move.b ($28,A1,D1.w), D5
fc56e4: swap D5
fc56e6: add.l D3, D7
fc56e8: btst #$6, ($2,A0)
fc56ee: btst #$6, ($2,A0)
fc56f4: beq fc56fa
fc56fa: move.l D5, ($40,A0)
fc56fe: move.w A3, ($52,A0)
fc5702: move.l D7, ($48,A0)
fc5706: move.l D7, ($54,A0)
fc570a: move.w D6, ($72,A0)
fc570e: move.w D0, ($58,A0) <--- first write to blit size at fc570e
fc5712: addq.b #1, D1
-----> the omega Blitter does draw a line <---
dirk said vAmiga is currently at this address
fe89b0: move.w #$100, $dff096.l --> this is the 2462237th omega instruction since start
vAmiga's CPU still has to process 26129 CPU instructions, which is only 1% of all instructions processed so far...
fc570e: move.w D0, ($58,A0) <--- first write to blit size, which is omegas 2488366th instruction since start
vAmiga has already taken 99% of the route to the hand and disk drawing 😀 ...
Kickstart v1.2 full instruction trace log until hand drawing code (executed by omega)...
i.e. from the very first instruction until instruction 3129644 where the hand and disk image is drawn...
Here’s the thing. After vAmiga reaches fe89b0, it eventually executes fc0716 (and so does Omega).
The first comparison is false, so it does not branch. This means that the jsr (-$13e,A6) is taken (same in Omega). After returning, it jumps to the comparison statement again. In Omega, the comparison is now true, but in vAmiga it’s still false. The second jsr (-$13e,A6) never returns.
There is more than one function with offset -$13e:
ReadPixel (graphics.library)
UnGetC (dos.library)
Wait (exec.library)
This bug is a nightmare!
Dirk, it is exec.library. Look at the content of a6, and for reference start sysinfo in omega to see the libbase addresses. From there you will see that $676, the value of a6, is execbase.
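The reason the offset alone is ambiguous is that every Amiga library has its own jump table below its base address, with one 6-byte JMP entry per function. The call target therefore depends on A6. A small sketch of the arithmetic (function names are invented):

```cpp
#include <cassert>
#include <cstdint>

constexpr int kVectorSize = 6;  // one entry = JMP abs.l (2 byte opcode + 4 byte address)

// Address that "jsr (offset,A6)" jumps to for a given library base in A6.
inline uint32_t lvoAddress(uint32_t libBase, int offset) {
    assert(offset < 0 && offset % kVectorSize == 0);
    return static_cast<uint32_t>(static_cast<int64_t>(libBase) + offset);
}

// Position of the vector in the library's jump table (1 = first entry below the base).
inline int vectorIndex(int offset) {
    assert(offset < 0 && offset % kVectorSize == 0);
    return -offset / kVectorSize;
}
```

With A6 = $676, the call jsr (-$13e,A6) lands at $538, which is the 53rd vector below that base. Since $676 is execbase here, that vector is exec's Wait. The same arithmetic gives A6 = $21FE for the jsr (-$f6,A6) in the trace that ends up at $2108.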
I dumped the registers after each instruction. Omega calls exec.wait() with:
E fc071e: 4eae fec2 : jsr (-$13e,A6)
D0 = 80000000 D1 = 1F D2 = 0 D3 = 0 D4 = 0 D5 = 0 D6 = FFFFFFFF D7 = 0
A0 = 1916 A1 = 18E6 A2 = 18E6 A3 = FE8B3A A4 = 5B88 A5 = 18BA A6 = 676 A7 = 18B2
Unfortunately, vAmiga's registers are completely different at the first call to exec.wait():
Hence, the real cause of the issue must have happened in one of the trillion lines executed before 😟.
but at your first picture the parameter d0 = 80000000 in vAmiga had the same value as in omega
look here
was it the second call then ?
be cool. ;-) Maybe the state in omega is also not as correct as a real Amiga would be.
Yes, the first picture shows the second call.
Seems like we manoeuvred ourselves into a dead end here. Only 26129 CPU instructions prior to the finish line 🙁.
yes so close ... 🥺
Why are the values of the data and address registers so different ? Strange...
Time for another update...
As you already know, I am hunting this little bastard (aka the "hand & disk is not drawn bug") for weeks now...
I was already close to surrender when things got personal ... something between me and him ...
To make a long story short: Perseverance pays off
Now you want to know who this little bastard is: It's the Blitter busy flag in the DMACON register.
The Blitter clears this flag at the moment it starts to flush the pipeline. Flushing is initiated by the LOOPBACK micro command in my implementation. Unfortunately, the first (strange) blit operation (which has all DMA disabled) has no LOOPBACK command, so the blit busy flag never got cleared.
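The busy flag is bit 14 of DMACONR, which is what Kickstart polls with btst #$6, ($2,A0) in the trace above (bit 6 of the high byte). The following sketch reproduces the failure mode in isolation and shows one possible remedy, namely clearing the flag when the micro program terminates. It is an illustration only; the names are invented and the fix that actually went into vAmiga may look different.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum : uint16_t { NOP = 0, LOOPBACK = 1 << 0 };  // made-up micro instruction bits
constexpr uint16_t BBUSY = 1 << 14;              // Blitter busy bit in DMACONR

// Returns DMACONR after the given micro program has run to completion.
// clearAtEnd = false models the buggy behaviour described above.
uint16_t dmaconrAfterBlit(const std::vector<uint16_t>& prog, bool clearAtEnd) {
    assert(!prog.empty());
    uint16_t dmaconr = BBUSY;                    // set when BLTSIZE is written
    for (uint16_t ins : prog) {
        if (ins & LOOPBACK)                      // pipeline flush starts here
            dmaconr = static_cast<uint16_t>(dmaconr & ~BBUSY);
    }
    if (clearAtEnd)                              // safety net for programs without LOOPBACK
        dmaconr = static_cast<uint16_t>(dmaconr & ~BBUSY);
    return dmaconr;
}
```

With the flag stuck at 1, any code that waits for the Blitter to finish never gets past its busy check, which would explain why the boot sequence stalled instead of crashing.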
Yeah, it's definitely a disk. The emulator is ready, let's ship it 😂.
Wow it is beautiful. The blitter inside Agnus has done this, right? Look at the clean drawn edges of the floppy disk. Look at the colors. That is inspiring... Green, red and blue... Ooh no, the colors are wrong, the OCS Kids used the wrong colors ... and why did they suddenly stop drawing ? Looks like they do quarrel again...?
The strange colours are my fault. Because Denise just started her drawing lessons, I decided to withhold the original palette from her. For practicing, I gave her four basic pencils only. A black one, a red, a green, and a blue one.
However, I need to have a serious word with Copper. In the middle of each frame, he constantly takes away all her pencils. So mean ☹️.
This is vAmiga WE (Warhol Edition):
Interestingly, all text items are broken (most likely some Blitter issues).
Denise just figured out the fake pen thing. Before she goes mad, I better give her the real ones 😬:
😎
I have to admit that I cheated a little bit. I've shamelessly copied over the line Blitter stuff from the Omega emulator 🙄. The copy Blitter stuff is original vAmiga though.
The next step will be to get the texture dimensions right. The emulator is still using the original texture drawing stuff from VirtualC64.
A brief update:
Firstly, the screen buffer size has been changed to 768 x 288. Secondly, the bitplane DMA has been decoupled from the drawing code. There are separate events for bitplane DMA and pixel synthesis now. This makes the design very flexible, although the exact timing is still wrong for sure.
I've done a brief comparison of screen geometries:
Left is Omega, middle is vAmiga, right is a UAE clone (presumably PAL).
Don't get confused with the vAmiga picture. For debugging, the emulator is currently displaying the whole 1024 x 512 GPU texture. The blue area is unused texture area. The orange area contains a debug pattern (yellow and red stripes). This area is writable by the emulator, but hasn't been written to.
As you can see, Omega has a smaller lower border which is most likely due to NTSC emulation. The vAmiga geometry (PAL) looks roughly the same as the picture to the right, so I think I am on the right track...
The first draft of the GPU pipeline architecture has been completed and implemented. Details are here:
https://github.com/dirkwhoffmann/vAmiga/wiki/GPU
Using the new pipeline, the current output looks like this:
I've also managed to port the 2x upscaler from VirtualC64 to vAmiga. Using 2x upscaling, the picture is indeed a lot smoother:
I don't plan to support 4x upscaling at the moment (as in VirtualC64), because it would require a very large internal texture size of 4096 x 4096. For the C64, 2048 x 2048 was sufficient.
There is still a long way to go to V0.1, because the whole thing is still pretty unstable.
I just reworked the graphics pipeline (because the 2x upscaler had a bug) and noticed that 4096 x 4096 textures don't seem to be an issue for modern GPUs. Hence, 4x upscaling will be supported. Here is the result:
Original Amiga texture:
2x upscaling (EPX algorithm):
4x upscaling (xBr algorithm):
Before I continue, I need to enrich the emulator with more debugging capabilities. Pending events can now be watched in the new "Events" inspector panel:
Hmmm, when enabling all graphics effects (i.e., Gaussian blur), GPU performance on my (not so old) MacBook Pro goes down to 40 fps. Seems like a final texture size of 4096 x 4096 stresses the GPU too much. Maybe it's better to go with a final texture size of 2048 x 2048 (which requires the 4x upscaler to be removed 😢).
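For a rough sense of scale, the two texture sizes can be compared by simple arithmetic. The sketch below assumes 4 bytes per texel (8-bit RGBA); the actual pixel format used by the emulator is not stated in this thread, so the absolute numbers are an assumption, while the 4:1 ratio holds for any format.

```python
BYTES_PER_TEXEL = 4  # assumption: 8-bit RGBA


def texture_bytes(width, height, bytes_per_texel=BYTES_PER_TEXEL):
    """Memory footprint of a single uncompressed texture."""
    return width * height * bytes_per_texel


large = texture_bytes(4096, 4096)
small = texture_bytes(2048, 2048)

print(large // 2**20, "MiB vs", small // 2**20, "MiB")
print("ratio:", large // small)
```

Every full-texture pass (blur, dot mask, and so on) has to touch four times as many texels at 4096 x 4096, which fits the observed drop in frame rate.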
Now that I've thought about it a little longer, we can still achieve 4x upscaling with a 2048 x 2048 texture, at least in lores mode. In lores mode, each pixel has size 4 x 4, so we can apply an upscaling algorithm "inside" the original texture. In hires mode, we can upscale at least vertically, because the even and odd lines are the same. The only mode that can only be upscaled 2x is hires+interlaced, but this mode is rarely used anyway.
Back in the game at 60 fps. 2x upscaling, Gaussian blur, Trinitron dot mask + electron beam misalignment 😎:
Now that EPX and xBr both do 2x scaling, they can be compared directly (first = original, second = EPX upscaled, third = xBr upscaled):
Hmm, when looking up close at the xBr image, it looks like there is a bug in the xBr implementation. The line contains strange jagged edges. If this is a bug, it's also contained in VirtualC64 🤔.
It's a bug. I've just converted a JavaScript xBr implementation to Metal to compare the result. The upper picture shows how it is supposed to look and the lower picture is the current GPU implementation.
I am going to investigate this first, because it also affects VirtualC64. (I cannot simply replace the old implementation, because the JavaScript port is not GPU optimised and thus comparatively slow.)
I've experimented a little with a two-phase upscaling pipeline. The first upscaler works "inside" the emulator texture to enhance lores images. The second upscaler is the already implemented one. The result (EPX in-texture upscaling + xBr) looks pretty promising:
The problem here is that picture quality decreases in hires mode, so we have to take care of usability. Maybe I can enhance the upscaler to detect lores fragments automatically and only apply the first upscaling step to the lores parts.
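For readers unfamiliar with EPX, here is a minimal CPU-side sketch of the commonly cited EPX/Scale2x rule on a small 2D array. It is not the emulator's GPU shader: the function name `epx_upscale` is made up for this example, and treating out-of-range neighbours as equal to the centre pixel is an assumption about edge handling.

```python
def epx_upscale(img):
    """Scale a 2D list of pixel values by 2x using the EPX/Scale2x rule."""
    h, w = len(img), len(img[0])
    out = [[None] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            p = img[y][x]
            # Neighbours; at the border we fall back to the centre pixel.
            a = img[y - 1][x] if y > 0 else p      # above
            b = img[y][x + 1] if x < w - 1 else p  # right
            c = img[y][x - 1] if x > 0 else p      # left
            d = img[y + 1][x] if y < h - 1 else p  # below
            p1 = p2 = p3 = p4 = p
            # Each output corner takes a neighbour's colour only if the two
            # adjacent neighbours agree and the opposite ones differ.
            if c == a and c != d and a != b:
                p1 = a  # top-left
            if a == b and a != c and b != d:
                p2 = b  # top-right
            if d == c and d != b and c != a:
                p3 = c  # bottom-left
            if b == d and b != a and d != c:
                p4 = d  # bottom-right
            out[2 * y][2 * x], out[2 * y][2 * x + 1] = p1, p2
            out[2 * y + 1][2 * x], out[2 * y + 1][2 * x + 1] = p3, p4
    return out


corner = [[0, 1],
          [1, 1]]
scaled = epx_upscale(corner)
for row in scaled:
    print(row)
```

In the example, the single `0` pixel in the corner loses its bottom-right quarter to the surrounding `1` pixels, which is how EPX rounds off diagonal edges.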
I am going to use this thread as a logbook in the near future to document the progress towards version 0.1. V0.1 should have about the same functionality as UAE 0.1. We'll then have a (completely useless) emulator that can do nothing but show the Workbench initial screen. To speak with a picture:
This is the current situation:
We have
So let’s see how far we can get with this. These are the first lines of Kickstart 1.2 which I want to step through:
Before we can get started, we need to install the Kickstart Rom. This is done in the hardware preferences. By default, the Aros replacement Rom is installed.
It can be replaced by an original Rom via drag & drop, so why stick to the clone if we can have the real stuff 😎:
The Kickstart Rom is usually located in the upper memory area. On startup, the Amiga mirrors it in the lower memory banks to enable the CPU to find the correct start vector. The memory inspector shows the details:
When powering on the Amiga, the CPU loads the start vector from the mirrored Kickstart Rom and jumps to address FC00D2. For testing purposes I let the emulator stop at FC00DE at a predefined breakpoint which can be watched in the CPU panel:
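The start-up sequence described above can be sketched as follows. On reset, a 68000 reads its initial stack pointer from address 0 and its initial program counter from address 4 (both big-endian longwords), so the ROM has to be visible at address 0 at that moment. The `Memory` class, the `overlay` flag, the RAM size, and the fake ROM contents are hypothetical; only the address FC00D2 and the 256 KB ROM at FC0000 correspond to the Kickstart 1.2 setup discussed here.

```python
ROM_BASE = 0xFC0000      # 256 KB Kickstart ROM sits at the top of the address space
ROM_SIZE = 256 * 1024


def make_fake_rom(initial_ssp, initial_pc):
    """Build a dummy ROM image containing only the two reset vectors."""
    rom = bytearray(ROM_SIZE)
    rom[0:4] = initial_ssp.to_bytes(4, "big")
    rom[4:8] = initial_pc.to_bytes(4, "big")
    return bytes(rom)


class Memory:
    """Toy memory map; 'overlay' mirrors the ROM at address 0 (hypothetical name)."""

    def __init__(self, rom, overlay=True):
        self.rom = rom
        self.overlay = overlay
        self.ram = bytearray(512 * 1024)  # RAM size chosen arbitrarily

    def read8(self, addr):
        addr &= 0xFFFFFF  # the 68000 has a 24-bit address bus
        if ROM_BASE <= addr < ROM_BASE + ROM_SIZE:
            return self.rom[addr - ROM_BASE]
        if self.overlay and addr < ROM_SIZE:
            return self.rom[addr]  # mirrored ROM in the lower banks
        if addr < len(self.ram):
            return self.ram[addr]
        return 0

    def read32(self, addr):
        return int.from_bytes(bytes(self.read8(addr + i) for i in range(4)), "big")


# The stack pointer value is a placeholder; the PC matches the address above.
mem = Memory(make_fake_rom(initial_ssp=0x00000400, initial_pc=0x00FC00D2))
ssp, pc = mem.read32(0), mem.read32(4)
print(f"SSP={ssp:06X} PC={pc:06X}")

mem.overlay = False      # without the mirror, address 4 is plain RAM
print(f"without overlay: {mem.read32(4):06X}")
```

With the mirror switched off, the same read returns zeroed RAM, which is why the ROM must be mapped into the lower banks for the CPU to find its start vector.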
Let’s set another breakpoint at FC00FE by double clicking the corresponding line in the program window:
By pressing the Run button the CPU starts and stops at FC00FE.
Pretty nice so far 🥳, but at this point the Kickstart Rom writes into the CIA registers 🙁. Two CIAs are already present in the current implementation, but they are not yet connected to memory. Therefore I have to stop here. I'll continue this thread once the CIAs are connected. Stay tuned ...