Closed ogamespec closed 5 months ago
@ogamespec I don't think there is an error in the ALU logic, at least not as far as I have checked. The condition is checked in az[11], then passed to azo[11] and then to ALU_Out1. When ALU_Out1 is low, the branch is taken. CLK6 gates the connection between az[11] and azo[11].
The problem appears to be the slight delays between the phases of the CLK* signals in the dmg-sim simulator. CLK6 goes low 28 ns before the CLK2 positive edge, which I believe triggers the state change in the Sequencer. So the Sequencer always sees ALU_Out1 as low, and always takes the branch.
Running dmg-sim with the delays zeroed out makes it work.
I am currently tracing the clock paths to find the origin of the delay. CLK2 appears to have 12 more entity connections than CLK6, counted from their common ancestors, which explains the delay (each entity has 2 ns or more).
I will try applying a HACK delay to CLK6 to see if everything works, and then I'll make a PR with the temporary fix.
In the image, you can see ALU_Out1 (due to CLK6) going low before the state change (due to CLK9, I think, rather than CLK2 as I said before, so the delay is actually 22 ns).
I checked the schematics and topology - everything looks correct. I also updated the corresponding section on the wiki (alu.md) a bit. As for CLK delay spacing - I think it's some kind of devil :) imho it's easier to stretch the levels wider so that the circuits have time to settle. Either way, we are dealing with hybrid logic (half built from latches, half from DFFs), so I have little idea whether the current design can be synthesized in hardware (FPGA).
In any case, arranging delays is beyond my expertise (I'm at the developmental level of "connect A and B as a netlist and be happy") 😃
Test ROM & waves:
// Check the part of the circuit that deals with checking the condition code ("cc check").
// For this purpose we will use RET cc instructions as the most convenient for verification;
// Knowingly false cc checks will be performed so as not to interrupt code flow
00 // nop
3e 00 // ld a, 0 <-- a = 0
3c // inc a <-- a = 1 (ZF=0)
c8 // ret z <-- return if ZF == 1
3d // dec a <-- a = 0 (ZF=1)
c0 // ret nz <-- return if ZF == 0
37 // scf <-- CF = 1
d0 // ret nc <-- return if CF == 0
3f // ccf <-- CF = 0
d8 // ret c <-- return if CF == 1
76 // halt
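As a quick sanity check of the expectations in the comments above, here is a minimal Python sketch that steps through the same bytes and records the outcome of each cc check. It is only a hypothetical helper, not part of dmg-sim or dmgcpu: it models just the opcodes used in this ROM, tracks only A, ZF and CF, and assumes the usual RET cc encoding (11 0cc 000).

```python
# Test ROM bytes, copied from the listing above.
ROM = bytes.fromhex("00 3e 00 3c c8 3d c0 37 d0 3f d8 76")

# cc field (bits 4..3 of RET cc) -> (mnemonic, predicate on ZF/CF).
CONDITIONS = {
    0: ("nz", lambda zf, cf: not zf),
    1: ("z", lambda zf, cf: zf),
    2: ("nc", lambda zf, cf: not cf),
    3: ("c", lambda zf, cf: cf),
}


def run(rom):
    """Return a list of (cc name, would the RET be taken?) in program order."""
    a, zf, cf, pc = 0, False, False, 0
    checks = []
    while pc < len(rom):
        op = rom[pc]
        pc += 1
        if op == 0x00:                      # nop
            pass
        elif op == 0x3E:                    # ld a, n (flags unchanged)
            a = rom[pc]
            pc += 1
        elif op == 0x3C:                    # inc a (updates ZF, leaves CF)
            a = (a + 1) & 0xFF
            zf = a == 0
        elif op == 0x3D:                    # dec a (updates ZF, leaves CF)
            a = (a - 1) & 0xFF
            zf = a == 0
        elif op == 0x37:                    # scf
            cf = True
        elif op == 0x3F:                    # ccf
            cf = not cf
        elif (op & 0xE7) == 0xC0:           # ret cc: record, never follow
            name, test = CONDITIONS[(op >> 3) & 0x3]
            checks.append((name, test(zf, cf)))
        elif op == 0x76:                    # halt
            break
        else:
            raise ValueError(f"opcode {op:#04x} not modelled in this sketch")
    return checks


for name, taken in run(ROM):
    print(f"ret {name}: {'taken' if taken else 'not taken'}")
```

All four checks should come out as "not taken", so any return seen in the waves for this ROM points at the cc check path rather than at the program.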
Also in ALU.v I found such a note:
// Dynamic part
// TBD: Check if it is necessary to add transparent DLatch for dynamic logic outputs (on inverter gates) or if this will do.
assign azo[0] = CLK2 ? az[0] : 1'b1;
assign azo[1] = CLK7 ? (CLK6 ? az[1] : 1'b1) : 1'b1; // -> bc5
assign azo[2] = CLK7 ? (CLK6 ? az[2] : 1'b1) : 1'b1; // -> bc1
assign azo[3] = CLK2 ? az[3] : 1'b1;
Technically, if you have dead-time between CLKs in your simulation, DLatch should save the day. I'll look at the topology some more and add it most likely.
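To illustrate why a latch helps with dead time, here is a small behavioral sketch in Python (not HDL, and not the code from any PR). It mirrors the form of the assign statements above, where the output reads 1 whenever the gating clock is low, and compares that with the same output passed through a transparent latch. The latch polarity (transparent while the clock is high) and the four-step timeline are assumptions made for the example.

```python
def dynamic_output(clk, az):
    """Same shape as `assign azo = CLK ? az : 1'b1;`."""
    return az if clk else 1


class TransparentLatch:
    """Level-sensitive latch: follows d while enabled, holds otherwise."""

    def __init__(self, initial=1):
        self.q = initial

    def step(self, enable, d):
        if enable:
            self.q = d
        return self.q


def sample_after_gate_closes(az, use_latch):
    """Return what a consumer sees if it samples after the gate clock fell."""
    latch = TransparentLatch()
    seen = None
    # (gating clock level, does the consumer sample on this step?)
    timeline = [(1, False), (1, False), (0, False), (0, True)]
    for clk, sampling in timeline:
        raw = dynamic_output(clk, az)
        value = latch.step(clk, raw) if use_latch else raw
        if sampling:
            seen = value
    return seen


for az in (0, 1):
    print(az, sample_after_gate_closes(az, False), sample_after_gate_closes(az, True))
```

Without the latch the late sample is 1 regardless of az; with the latch the last value seen while the clock was high is held, so the consumer still gets the evaluated result.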
Actually, the fix is better applied upstream, by delaying the signals CLK3, CLK4, CLK5, and CLK6 by 22 ns. But I will still make a PR here with the debug signals and some comments I added, at least.
The paths I traced for the CLK* signals are the following (only the critical path; some signals are NANDs with other intermediate clock signals):
- CLK9 <- BOGA <- BALY
- CLK1 <- AWOB <- BOGA <- BALY
- CLK2 <- BEDO <- BYXO <- BUVU <- BALY <- BYJU <- BELE <- BUTO <- BAZE <- BELO <- BANE <- BEJA <- BOLO <- BUFA <- BERY <- BAPY
- CLK3 <- BEKO <- BUDE <- BIRY <- BELU <- ~ATYP & CLK_ENA
- CLK4 <- UVYT <- BUDE
- CLK5 <- BOLO <- BUFA
- CLK6 <- BUFA <- BERU <- BAPY

BAPY <- ~(ATYP | AROV | ~CLK_ENA) = ~ATYP & ~AROV & CLK_ENA
ATYP <- AFUR
AROV <- APUK
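For reference, the cell counts behind the skew estimate can be reproduced with a few lines of Python. This is a rough sketch: the cell names are transcribed from the CLK2 and CLK6 paths above, and the 2 ns figure is the "2 ns or more" per-entity delay mentioned earlier in the thread, so the result is only a lower bound under that assumption.

```python
# Cells between each clock output and BAPY, as listed in the traced paths.
PATHS = {
    "CLK2": ["BEDO", "BYXO", "BUVU", "BALY", "BYJU", "BELE", "BUTO", "BAZE",
             "BELO", "BANE", "BEJA", "BOLO", "BUFA", "BERY", "BAPY"],
    "CLK6": ["BUFA", "BERU", "BAPY"],
}

MIN_CELL_DELAY_NS = 2  # assumption: every cell contributes at least 2 ns


def extra_cells(slow, fast):
    """How many more cells the slow path passes through than the fast one."""
    return len(PATHS[slow]) - len(PATHS[fast])


def min_skew_ns(slow, fast):
    """Lower bound on the skew if each extra cell adds MIN_CELL_DELAY_NS."""
    return extra_cells(slow, fast) * MIN_CELL_DELAY_NS


print(len(PATHS["CLK2"]), len(PATHS["CLK6"]))   # 15 3
print(extra_cells("CLK2", "CLK6"))              # 12
print(min_skew_ns("CLK2", "CLK6"))              # 24
```

A 24 ns lower bound is in line with the 28 ns gap between CLK6 and CLK2 seen in the simulation, given that some cells are assigned more than 2 ns.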
@msinger Do you have any idea how accurate the delays of these signals are? Do you think just delaying CLK3, CLK4, CLK5, and CLK6 by 22 ns is a valid fix?
I'll add DLatches tomorrow where they are in the actual chip. It won't make things worse :) At the same time we can check whether delays are needed or whether it works without them.
@Rodrigodd Please, try #271 + #272 without CLK delays :)
It so happens that asymmetric CLK6/CLK7 are used for flags and cc_check, so if you put DLatch on the output of these random logic trees, it "extends" the result as needed (see picture).
@Rodrigodd, I just pulled those delays out of my butt. I guessed them based on the size of the cells. Once I have created the layouts for all cells, I will change them based on the number of transistors that are in series. The simulation currently doesn't recreate the glitches I see on the ~RD signal of the real device. And there is some volume envelope glitch in the APU that can happen on the real device, which doesn't happen in the simulation. I forgot how exactly this works though. This also indicates that the delays are wrong. I don't know what happens if you just delay all clocks by some fixed value.
> I guessed them based on the size of the cells.
Yeah, but I think it is unlikely that 3 cells have a bigger delay than 15 cells, which is what my fix was relying on.
> Please, try https://github.com/emu-russia/dmgcpu/pull/271 + https://github.com/emu-russia/dmgcpu/pull/272 without CLK delays
This one is a much more sound solution. I tested it on dmg-sim without the delays and it is now working.
Thanks again @ogamespec! I believe this issue is now fixed.
Let's move on :)
There are reports that the cc check circuit (RET cc, JP cc, etc.) is not working correctly. We need to make a test ROM for testing and visual inspection of the signals, then process the results and supplement the description on the wiki (it is assumed that the cc_check circuit is located in the ALU random logic and is associated with the ALU_Out1 signal, which goes to the sequencer).