zachary-cauchi closed this issue 4 months ago.
Took a closer look at the registers, and if I'm understanding them right, they don't have a specific address that I can use in the SVD file. Is that the case, or do they just start from base 0x00?
I was just about to comment as such; you need to use special MIPS instructions (in this case, mfc0 and mtc0) to read and write coprocessor registers. Take a look at prussia_rt; it has... something useful here, at least.
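Because Cop0 registers are reached through instructions rather than memory addresses, each access is encoded into the instruction word itself. The sketch below is a hypothetical Rust helper (not part of prussia_rt) that assembles the R4000-style machine words for mfc0 and mtc0, assuming the classic layout without the later MIPS32 `sel` field. It shows where the coprocessor register number ends up: in the `rd` field, not in any address.

```rust
// Hypothetical helpers: assemble `mfc0 rt, rd` / `mtc0 rt, rd` machine words
// using the R4000-style encoding (no MIPS32 `sel` field).
const COP0_OPCODE: u32 = 0b010000; // major opcode for coprocessor 0
const RS_MF: u32 = 0b00000; // "move from Cop0" sub-opcode (rs field)
const RS_MT: u32 = 0b00100; // "move to Cop0" sub-opcode (rs field)

fn encode_cop0_move(rs: u32, rt: u32, rd: u32) -> u32 {
    assert!(rt < 32 && rd < 32, "register numbers are 5-bit fields");
    (COP0_OPCODE << 26) | (rs << 21) | (rt << 16) | (rd << 11)
}

/// mfc0 rt, rd: copy Cop0 register `rd` into general-purpose register `rt`.
fn encode_mfc0(rt: u32, rd: u32) -> u32 {
    encode_cop0_move(RS_MF, rt, rd)
}

/// mtc0 rt, rd: copy general-purpose register `rt` into Cop0 register `rd`.
fn encode_mtc0(rt: u32, rd: u32) -> u32 {
    encode_cop0_move(RS_MT, rt, rd)
}

fn main() {
    // $v0 is GPR 2; Status is Cop0 register 12.
    println!("mfc0 $v0, $12 = {:#010x}", encode_mfc0(2, 12));
    println!("mtc0 $v0, $12 = {:#010x}", encode_mtc0(2, 12));
}
```

The two words differ only in the `rs` field, which is why an SVD file (a memory-map description) has nowhere to put these registers.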
I see, so we can't declare these in our SVD. I also checked online, and if I understand right, CoP0 is something quite standard across most or all MIPS devices. All the functionality governing access to these registers is in the routines.S file, which uses the aforementioned instructions to get the values. Are those instructions placed after the jump instructions because jumps always take 2 cycles to take effect?
Following this, what would be the best course in your opinion? Should I modify the routines.S file to include read/write operations for all the exception-related registers?
Eeeeh, every MIPS processor must have a Cop0, but the exact format is unspecified (until the MIPS{32,64}rN standards, anyway). To be pedantic, this is a MIPS R4000-style Cop0; the IOP (aka the PS1 CPU) is based on the MIPS R3000, which has its own style of Cop0 registers.
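For reference when extending routines.S, here is a sketch of the R4000-style Cop0 register numbers most relevant to exceptions and timer interrupts. The enum and its name are hypothetical. The numbers follow the R4000 convention and should be double-checked against the EE Core User's Manual before being relied on, since (as noted above) Cop0 layouts differ between MIPS generations.

```rust
/// Hypothetical enum: R4000-style Cop0 register numbers relevant to
/// exception handling. Verify against the EE Core User's Manual.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Cop0Reg {
    BadVAddr = 8,  // faulting virtual address for address/TLB exceptions
    Count = 9,     // free-running cycle counter
    Compare = 11,  // timer interrupt is raised when Count == Compare
    Status = 12,   // interrupt enables, exception level bits, etc.
    Cause = 13,    // exception code and pending interrupt bits
    Epc = 14,      // return address for ordinary (level 1) exceptions
    ErrorEpc = 30, // return address for error-level exceptions
}

impl Cop0Reg {
    const EXCEPTION_RELATED: [Cop0Reg; 7] = [
        Cop0Reg::BadVAddr,
        Cop0Reg::Count,
        Cop0Reg::Compare,
        Cop0Reg::Status,
        Cop0Reg::Cause,
        Cop0Reg::Epc,
        Cop0Reg::ErrorEpc,
    ];

    /// The `rd` field value to use in mfc0/mtc0 for this register.
    fn number(self) -> u32 {
        self as u32
    }
}

fn main() {
    for &reg in Cop0Reg::EXCEPTION_RELATED.iter() {
        println!("{:?} is Cop0 register ${}", reg, reg.number());
    }
}
```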
Yes, this is the famous MIPS branch delay slot. That being said, I'm wondering if it's actually correct w.r.t. pipeline hazards; something very dumb like sync; m[tf]c0; sync; jr $ra would always be correct, but it would also be somewhat slow. I don't know if that matters; maybe we should consult what ps2sdk does here.
I think that would be the best idea, yes. I think the "level 2 exceptions" (nonstandard MIPS >.>) are going to be a little painful: Reset and NMI in practice mean "please reboot the console", and the performance counters are low priority, but having instruction breakpoint capability is high priority for a nice debugging experience >.>
(oh, and while you're there, s/PruSSia/Prussia; I thought highlighting the pun in the name would be funny, but instead it just reads like I'm referencing the SS. Not my smartest idea.)
Non-standard goodness, what more could we want? Okay, I'll begin work on that either tonight or tomorrow afternoon and keep you posted on any hiccups or questions. This will be my first time writing proper assembly (besides TIS-100, but I'd say that doesn't count), so I will ask you for the occasional review if that's alright. I'll begin with the must-haves and see if I can work my way to the nice-to-haves.
Haha, think of the pun as personality for the project, I suppose.
Could you help me out with a problem I'm encountering when reading the Cop0 registers using the method in routines.S? I'm basing my implementation on the read/write status functions. I managed to get it building and running in PCSX2; however, the values showing up in EEOut are 0. I've updated the hello-rs script to showcase a working function (using inline assembly) and the non-working function (using routines.S). Would you happen to have an idea why it's reading 0s? If there's any missing information I can provide, please let me know.
Edit: Surrounding the load instruction with sync instructions fixed the problem, so I'm guessing it has something to do with the jump happening before the mfc0 load is finished?
Edit 2: Removed the sync instructions and reordered the instructions so mfc0 runs before jr, which produced the same working result.
Oh, I think I know what the problem is. Try adding .set noreorder at the top of the source file (with the instructions in jr/mfc0 order). MIPS assemblers try to help the programmer by hiding delay slots, which I think results in the assembler turning this code into jr/nop/mfc0, except the mfc0 never executes because of the return.
Just tested it on the Count register; looks like it worked and is producing values in the expected pattern! I'm going to keep the directive at line 1 as suggested and reorder the instructions in all the methods. Thanks a lot for the help! Interesting functionality, though it sounds like it would cause more problems than it's worth. Are there any practical use cases for that feature?
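Since Count is a free-running 32-bit counter, two reads are best compared with wrapping arithmetic so a rollover between them does not produce a huge bogus difference. A minimal sketch follows; the helper names are hypothetical, and the tick rate (one tick per EE clock at roughly 294.912 MHz) is an assumption to verify against the EE Core User's Manual, not something established in this thread.

```rust
/// Cycles elapsed between two reads of Cop0 Count, tolerating one wraparound.
fn elapsed_cycles(earlier: u32, later: u32) -> u32 {
    later.wrapping_sub(earlier)
}

/// Converts a cycle count to whole microseconds for a given tick rate.
fn cycles_to_micros(cycles: u32, ticks_per_second: u64) -> u64 {
    cycles as u64 * 1_000_000 / ticks_per_second
}

fn main() {
    // ASSUMPTION: Count ticks once per EE clock (~294.912 MHz). Check the manual.
    const ASSUMED_TICKS_PER_SECOND: u64 = 294_912_000;

    let before = 0xFFFF_FF00; // read just before the counter rolls over
    let after = 0x0000_0100; // read just after it
    let cycles = elapsed_cycles(before, after);
    println!(
        "{} cycles, about {} us",
        cycles,
        cycles_to_micros(cycles, ASSUMED_TICKS_PER_SECOND)
    );
}
```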
So that compilers can blithely ignore the branch delay slot as somebody else's problem. >.>
Finalised the last of the exception-related registers and opened #19. When you can, would you please give it a review?
This reminds me of the fun that is the MIPS TLB; we're going to have to decide what to do there at some point.
I'm afraid I'm not familiar with it. From what I could learn from the docs, it looks to be a cache the MMU uses when translating virtual addresses to physical addresses (or vice versa). Would you please elaborate a little on it?
Pretty much all processors have TLBs to convert addresses, and of course they can't be of infinite size, so when you access a memory region which doesn't have an entry, you need to fetch it.
On x86 and ARM, the CPU will fetch the TLB entry for you, requiring you to structure your page tables in a specific way or the CPU fetches garbage.
On early MIPS, they didn't want to implement the hardware for that, so instead the CPU raises a TLB Refill exception and expects software to insert the relevant entry, either overwriting a specific entry (tlbwi) or a random one (tlbwr). You'll note that TLB Refill has its own dedicated exception vector; the idea is that you can just about fit a refill routine into the 0x80 bytes available, and the entries are structured to make that about as easy as possible.
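The software-managed refill described above can be sketched as a small Rust model. Everything here is a simplification with hypothetical names: each entry maps a single 4 KiB page (real R4000-style entries map an even/odd page pair and carry ASIDs), and the stand-in for the Random register advances once per write, whereas the hardware register decrements on its own. It does show the key split: lookup is hardware, and filling a miss from the page table is software's job, using either an indexed write (like tlbwi) or a random one (like tlbwr) that never touches the wired entries.

```rust
use std::collections::HashMap;

const PAGE_SHIFT: u32 = 12; // 4 KiB pages
const OFFSET_MASK: u32 = (1 << PAGE_SHIFT) - 1;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct TlbEntry {
    vpn: u32, // virtual page number
    pfn: u32, // physical frame number
}

struct Tlb {
    entries: Vec<Option<TlbEntry>>,
    wired: usize,  // entries below this index are never chosen by write_random
    random: usize, // simplified stand-in for Cop0 Random, cycles over [wired, len)
}

impl Tlb {
    fn new(size: usize, wired: usize) -> Self {
        assert!(wired < size, "need at least one replaceable entry");
        Tlb { entries: vec![None; size], wired, random: size - 1 }
    }

    /// The "hardware" part: translate if an entry matches, else report a miss.
    fn lookup(&self, vaddr: u32) -> Option<u32> {
        let vpn = vaddr >> PAGE_SHIFT;
        self.entries
            .iter()
            .flatten()
            .find(|e| e.vpn == vpn)
            .map(|e| (e.pfn << PAGE_SHIFT) | (vaddr & OFFSET_MASK))
    }

    /// Like tlbwi: overwrite a specific entry.
    fn write_indexed(&mut self, index: usize, entry: TlbEntry) {
        self.entries[index] = Some(entry);
    }

    /// Like tlbwr: overwrite the entry chosen by Random, then advance Random.
    fn write_random(&mut self, entry: TlbEntry) {
        self.entries[self.random] = Some(entry);
        self.random = if self.random == self.wired {
            self.entries.len() - 1
        } else {
            self.random - 1
        };
    }
}

/// On a miss, the "refill handler" consults a software page table.
fn translate(tlb: &mut Tlb, page_table: &HashMap<u32, u32>, vaddr: u32) -> Option<u32> {
    if let Some(paddr) = tlb.lookup(vaddr) {
        return Some(paddr);
    }
    // On real hardware, this is where the TLB Refill exception would fire.
    let vpn = vaddr >> PAGE_SHIFT;
    let pfn = *page_table.get(&vpn)?;
    tlb.write_random(TlbEntry { vpn, pfn });
    tlb.lookup(vaddr)
}

fn main() {
    let mut tlb = Tlb::new(4, 1);
    tlb.write_indexed(0, TlbEntry { vpn: 0x1, pfn: 0x1 }); // wired entry
    let mut page_table = HashMap::new();
    page_table.insert(0x10, 0x2A);
    println!("{:?}", translate(&mut tlb, &page_table, 0x0001_0123));
}
```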
Remember how MIPS has memory segments? Those segments tell the CPU whether to use the TLB or to map virtual to physical addresses directly: useg (bit 31 clear) uses the TLB, kseg0/kseg1 (bit 31 set, bit 30 clear) are directly mapped to the first 512MB of address space, but kseg2/kseg3 (bit 31 set, bit 30 set) use the TLB. (These segments also control caching; kseg1 is never cached, but the others can be.) This is why I access the hardware registers offset by 0xA000_0000, which places the address in kseg1 and guarantees access regardless of TLB state.
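The segment rules above reduce to a check of the top three address bits. Here is a small Rust sketch with hypothetical helper names; the physical address in the usage example is only illustrative.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Segment {
    Useg,     // 0x0000_0000..=0x7FFF_FFFF: bit 31 clear, mapped through the TLB
    Kseg0,    // 0x8000_0000..=0x9FFF_FFFF: directly mapped, may be cached
    Kseg1,    // 0xA000_0000..=0xBFFF_FFFF: directly mapped, never cached
    Kseg2Or3, // 0xC000_0000..=0xFFFF_FFFF: bits 31 and 30 set, mapped through the TLB
}

fn segment(vaddr: u32) -> Segment {
    match vaddr >> 29 {
        0..=3 => Segment::Useg,
        4 => Segment::Kseg0,
        5 => Segment::Kseg1,
        _ => Segment::Kseg2Or3,
    }
}

/// Physical address for the directly mapped segments; None means the TLB decides.
fn direct_physical(vaddr: u32) -> Option<u32> {
    match segment(vaddr) {
        Segment::Kseg0 | Segment::Kseg1 => Some(vaddr & 0x1FFF_FFFF),
        _ => None,
    }
}

/// Uncached kseg1 alias of a physical address in the first 512 MB.
fn kseg1_alias(paddr: u32) -> u32 {
    assert!(paddr < 0x2000_0000, "kseg1 only covers the first 512 MB");
    paddr | 0xA000_0000
}

fn main() {
    let reg = kseg1_alias(0x1000_0000); // illustrative physical address
    println!("{:#010x} is in {:?}", reg, segment(reg));
}
```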
There is just one problem with the TLB: emulators hate it when you use it, because you're shuffling memory around underneath their feet and they have to invalidate their internal caches (or otherwise assume that your game never touches the TLB and behave incorrectly).
I see, thanks for the crash course. So the decision you mentioned earlier would be whether to add support for it now, add support at a much later stage, or ignore it outright? What options are there, and what are their pros?
I see two big advantages of having the infrastructure for proper virtual memory:
malloc must fail, and since Rust APIs generally panic on that condition, the code grinds to a halt. With the TLB, we can take fragmented pieces of RAM and stitch them together into a contiguous block of memory, which lets malloc succeed here. Sure, there are ways of managing memory that do not use a heap, but writing code in that style raises the barrier to entry for using Prussia. Personally, I can see memory fragmentation being quite annoying to people: after running for a while, they eventually hit excessive fragmentation and get an out-of-memory error. You can imagine fun worst-case scenarios like every other 4K block of memory being in use, so allocating 8K fails with 16M of free RAM.
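The worst case above can be checked with a few lines of Rust. This is a model, not an allocator: the helper names are hypothetical, and it assumes 32 MiB of RAM split into 4 KiB pages with every other page in use. Finding two physically adjacent free pages fails, while a page-mapped allocator only needs any two free pages.

```rust
const PAGE_SIZE: usize = 4096;

/// Start index of the first run of `n` physically contiguous free pages.
fn find_contiguous(free: &[bool], n: usize) -> Option<usize> {
    if n == 0 {
        return Some(0);
    }
    let mut run = 0;
    for (i, &is_free) in free.iter().enumerate() {
        run = if is_free { run + 1 } else { 0 };
        if run == n {
            return Some(i + 1 - n);
        }
    }
    None
}

/// With TLB mapping, any `n` free pages can be stitched into one virtual block.
fn find_scattered(free: &[bool], n: usize) -> Option<Vec<usize>> {
    let pages: Vec<usize> = free
        .iter()
        .enumerate()
        .filter(|&(_, &f)| f)
        .map(|(i, _)| i)
        .take(n)
        .collect();
    if pages.len() == n { Some(pages) } else { None }
}

fn main() {
    // ASSUMPTION: 32 MiB of RAM, every other 4 KiB page in use.
    let total_pages = 32 * 1024 * 1024 / PAGE_SIZE;
    let free: Vec<bool> = (0..total_pages).map(|i| i % 2 == 1).collect();
    let free_mib = free.iter().filter(|&&f| f).count() * PAGE_SIZE / (1024 * 1024);
    let want = 8 * 1024 / PAGE_SIZE; // an 8 KiB allocation
    println!("{} MiB free", free_mib);
    println!("contiguous: {:?}", find_contiguous(&free, want));
    println!("via TLB:    {:?}", find_scattered(&free, want));
}
```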
That does sound very compelling. So TLB support should be our target, then. Shall I create an issue for adding basic access routines for it?
On the one hand, I would highly appreciate that, but the MVP is drawing a triangle, and we shouldn't get too distracted from that. Let's file an issue referencing this discussion so we don't forget about it, but not focus right this minute.
Fair enough. I'll create the associated ticket after I finish work, reference the points here, and write up a DoD for it.
Besides that, I guess the next step would be to figure out how to handle the exceptions reported by the now-available registers? Would they be able to work alongside a custom panic handler?
That seems like a reasonable idea to continue with; at the very least dump Cop0.Cause somewhere.
Great. I'll create another ticket for that and link it in the MVP issue. I'll aim for a custom panic handler that dumps the registers to EEOut.
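As a starting point for dumping Cop0.Cause, here is a sketch of decoding it into something readable before writing it to EEOut. The `Cause` wrapper is hypothetical. The field layout (BD in bit 31, IP in bits 15:8, ExcCode in bits 6:2) and the exception-code names follow the generic R4000-style Cause register; the EE Core may define additional fields for its level 2 exceptions that this sketch does not decode, so the EE Core User's Manual remains the authority.

```rust
use std::fmt;

/// Hypothetical wrapper around a raw Cop0 Cause value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Cause(u32);

impl Cause {
    fn exc_code(self) -> u32 {
        (self.0 >> 2) & 0x1F // bits 6:2
    }
    fn branch_delay(self) -> bool {
        (self.0 & (1 << 31)) != 0 // faulting instruction was in a delay slot
    }
    fn pending_interrupts(self) -> u32 {
        (self.0 >> 8) & 0xFF // IP bits 15:8
    }
    fn exc_name(self) -> &'static str {
        match self.exc_code() {
            0 => "Int",
            1 => "Mod",
            2 => "TLBL",
            3 => "TLBS",
            4 => "AdEL",
            5 => "AdES",
            6 => "IBE",
            7 => "DBE",
            8 => "Sys",
            9 => "Bp",
            10 => "RI",
            11 => "CpU",
            12 => "Ov",
            13 => "Tr",
            _ => "unknown",
        }
    }
}

impl fmt::Display for Cause {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "Cause={:#010x} ExcCode={} ({}) BD={} IP={:#04x}",
            self.0,
            self.exc_code(),
            self.exc_name(),
            self.branch_delay() as u8,
            self.pending_interrupts()
        )
    }
}

fn main() {
    // Example: an address error on load, raised from a branch delay slot.
    let cause = Cause((1 << 31) | (4 << 2));
    println!("{}", cause);
}
```

In a panic handler, the same formatting could be pointed at whatever writer prussia_rt uses for EEOut instead of stdout.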
To begin work on exception-handling, there would first need to be support for the EE exception registers. These are listed in Chapter 3.2 of the EE Core User's Manual.
Definition of Done:
prussia_rt/src/routines.S.