frno7 / linux

Linux 2.2, 2.6, 3.x, 4.x and 5.x kernels for the PlayStation 2.

Save and restore the R5900 specific register extensions #5

Open frno7 opened 5 years ago

frno7 commented 5 years ago

The R5900 extends the MIPS general purpose registers (GPRs) from 64 bits to 128 bits. The upper 64 bits of the GPRs are used by the R5900 specific quad load/store (LQ/SQ) and multimedia instructions (MMIs).

The special HI1 and LO1 registers, which are the upper 64 bits of each of the 128-bit HI and LO registers, are used by the R5900 specific multiply and divide instructions, such as MULT1, MULTU1, DIV1, DIVU1, MADD1 and MADDU1, as well as MFHI1, MFLO1, MTHI1 and MTLO1.

Finally, an R5900 specific shift amount (SA) register contains the shift amount for the 256-bit funnel shift instruction QFSRV.
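To make the role of the SA register concrete, here is a minimal sketch in C of a funnel shift of this kind. It is a model for illustration, not the hardware definition: the type `r5900_q128` and the function `qfsrv_model` are hypothetical names, and the shift amount is assumed to be given in whole bytes (0 to 15). See the TX79 manual for the authoritative semantics.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit GPR as 16 little-endian bytes. */
typedef struct {
    uint8_t b[16];
} r5900_q128;

/*
 * Model of a funnel shift right: concatenate rs (upper 128 bits) and
 * rt (lower 128 bits) into 256 bits, shift right by sa_bytes bytes and
 * keep the lower 128 bits of the result. sa_bytes stands in for the SA
 * register and is assumed to be in the range 0..15.
 */
r5900_q128 qfsrv_model(r5900_q128 rs, r5900_q128 rt, unsigned sa_bytes)
{
    uint8_t concat[32];
    r5900_q128 rd;

    memcpy(&concat[0], rt.b, 16);   /* bits 0..127 of the 256-bit value */
    memcpy(&concat[16], rs.b, 16);  /* bits 128..255 of the 256-bit value */
    memcpy(rd.b, &concat[sa_bytes & 15], 16);

    return rd;
}
```

With rt holding the bytes 0..15 and rs holding the bytes 16..31, a shift of 3 bytes yields the bytes 3..18. The result depends on SA, so a process that is preempted between setting SA and executing QFSRV gets a wrong result unless SA is part of its saved context.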

To use the instructions mentioned above in user space, these R5900 specific registers must be saved and restored appropriately by the kernel when switching contexts. If the registers are not saved and restored, they ought to be cleared to avoid leaking information between processes.
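As a rough sketch of the two options (save and restore, or clear), the state involved could be modelled in C as follows. The struct layout and all names (`r5900_reg128`, `r5900_context`, `r5900_save`, `r5900_restore`, `r5900_clear_extensions`) are hypothetical and only for illustration. This is not the layout or code used by the kernel.

```c
#include <stdint.h>

/* Hypothetical 128-bit register, split into two 64-bit halves. */
struct r5900_reg128 {
    uint64_t lo64;  /* the ordinary 64-bit MIPS part */
    uint64_t hi64;  /* the R5900 specific upper part */
};

/* Hypothetical per-thread save area; not the kernel's actual layout. */
struct r5900_context {
    struct r5900_reg128 gpr[32];
    uint64_t hi, lo;    /* lower 64 bits of the 128-bit HI and LO */
    uint64_t hi1, lo1;  /* upper 64 bits of the 128-bit HI and LO */
    uint32_t sa;        /* shift amount used by QFSRV */
};

/* Option 1: save the complete state when switching away from a thread. */
void r5900_save(struct r5900_context *saved, const struct r5900_context *cpu)
{
    *saved = *cpu;
}

/* Option 1: restore the complete state when switching back. */
void r5900_restore(struct r5900_context *cpu, const struct r5900_context *saved)
{
    *cpu = *saved;
}

/*
 * Option 2: if the extensions are not saved, zero them so that the
 * next process cannot read values left behind by the previous one.
 */
void r5900_clear_extensions(struct r5900_context *cpu)
{
    for (int i = 0; i < 32; i++)
        cpu->gpr[i].hi64 = 0;
    cpu->hi1 = 0;
    cpu->lo1 = 0;
    cpu->sa = 0;
}
```

Clearing only prevents information leaks. A process that uses the extensions would still lose its values across a context switch, so only the first option makes the instructions usable in user space.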

Will these registers be saved and restored for both the o32 and the n32 Linux ABIs? It would be most useful and least surprising to support both ABIs. Of particular note is that the MMIs cannot be disabled by the kernel. The MMIs would therefore appear to mostly work, even when the kernel does not actually save and restore the registers. Problems would manifest as random register corruption, which can be very difficult and frustrating to track down.

See appendix B in the TX79 manual.

sp193 commented 5 years ago

If I remember correctly, the R5900 support which Jurgen (Mega Man) committed uses n32 ABI. As of today, support for the R5900 within GCC is incomplete. Support for the FPU did not work as of 2015 (configuration problem within libgcc) and GCC does not support any of the non-standard functions (2nd integer pipeline & MMI).

As for dealing with the custom 128-bit registers & MMI: the recommendation I got from the GCC folks and agreed with, was to use the 128-bit MMI as a hardware vector mode implementation. A number of instructions fit nicely with GCC vector support and GCC would recognize that hardware registers can only be used for either integer arithmetic or vector computations. It makes sense anyway, since there are no MMI for doing some basic things like loading 128-bit integer literals, shifting, 256-bit multiplication or division, which GCC requires for treating the GPRs as truly 128-bit.

However, I could not put together a working system.

As for the 2nd integer pipeline: there may be a need to implement some scheduler to determine which integer pipeline to use. I'm not sure why, but the homebrew GCC v3.2.2 port did not have such a mechanism and putting the predicates for the two HILO pairs seems to cause GCC to automatically choose one. It seems to be random, but I haven't seen it make conflicting choices yet. Since there are two pipelines, there is also a need to make GCC track the two pipeline hazards individually. I've got no confidence that I did it right, so GCC still doesn't support the two integer pipelines.

If you don't intend to support any of these, I think it should still work and there should be no security risk either (you generally need to copy values to the upper 64 bits for there to be some value there)... but you will not be using the full power of the R5900.

frno7 commented 5 years ago

I think assembly code and C intrinsics are likely uses of MMIs and other R5900-specific extensions with Linux. GCC auto-vectorisation would obviously be very nice to have too. GCC does actually generate the R5900-specific three-operand instructions MULT and MULTU, which is why R5900 QEMU supports them in commit 21e8e8b230. See #3 for a separate issue about the R5900 FPU.

mirh commented 5 years ago

@uyjulian @ZirconiumX ¿

uyjulian commented 5 years ago

I've never really worked with PS2 Linux ABI before.
However, it would be nice to have a consistent documented implementation for saving/restoring MMI registers.

frno7 commented 5 years ago

@uyjulian, the 32-bit o32 ABI uses SW/LW instructions to save and restore general purpose registers (GPRs) between context switches. The 64-bit n32 ABI uses SD/LD. For the R5900 Linux kernel, I think we should always use SQ/LQ to handle the full 128-bit multimedia registers (MMRs), which otherwise would be lost. The R5900 specific shift amount (SA) register needs to be handled by the kernel as well.

If done properly, Linux applications need not worry about these details, because the MMRs would retain their values after context switches.
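To illustrate what is lost with the narrower instructions, here is a small C model of restoring one register with LW, LD and LQ after another process has used it. The names (`mmr128`, `restore_lw`, `restore_ld`, `restore_lq`) are hypothetical, and the model assumes that LW and LD only write the lower 64 bits and leave the upper 64 bits of the register untouched.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit multimedia register (MMR). */
struct mmr128 {
    uint64_t lo64;  /* lower 64 bits */
    uint64_t hi64;  /* upper 64 bits, R5900 specific */
};

/*
 * o32 style restore with LW: 32 bits come back from the save area and
 * are sign-extended to 64 bits. The upper 64 bits keep whatever the
 * live register held (assumption, see above).
 */
struct mmr128 restore_lw(struct mmr128 live, uint32_t saved_word)
{
    live.lo64 = (uint64_t)(int64_t)(int32_t)saved_word;
    return live;
}

/* n32 style restore with LD: only the lower 64 bits come back. */
struct mmr128 restore_ld(struct mmr128 live, uint64_t saved_lo64)
{
    live.lo64 = saved_lo64;
    return live;
}

/* Restore with LQ: all 128 bits come back from the save area. */
struct mmr128 restore_lq(struct mmr128 live, struct mmr128 saved)
{
    (void)live;  /* nothing of the previous contents survives */
    return saved;
}
```

In this model, only `restore_lq` gives a process its own upper 64 bits back. With `restore_lw` and `restore_ld`, the upper half is whatever the previously running process left in the register, which is the kind of corruption and leak described at the top of this issue.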

Ravenslofty commented 5 years ago

We could get into all sorts of ABI disputes, but I think N32 is strictly superior to O32.

The FPU state for context switch should also be preserved, including the FPU accumulator, which has no direct instructions for accessing it.


frno7 commented 5 years ago

> We could get into all sorts of ABI disputes, but I think N32 is strictly superior to O32.

MIPS Linux kernel maintainers have suggested that the o32 ABI must be merged before proceeding to the n32 ABI, because the latter is substantially more problematic for the R5900. I hope we can support both ABIs, so that they can be chosen freely.

I fixed the R5900 o32 ABI in Glibc 2.29. Older Glibc versions can be used if commit 8e3c00db16fc (MIPS: Use `.set mips2' to emulate LL/SC for the R5900 too) is applied. Glibc needs another patch to support the R5900 n32 ABI as well, but I have not submitted it yet. Both ABIs can then be emulated with an appropriately patched R5900 QEMU, which is useful for compiling, for example, a modern R5900 Gentoo Linux.

> The FPU state for context switch should also be preserved, including the FPU accumulator, which has no direct instructions for accessing it.

Do you have links to documents describing the details of this? (There are further notes about the R5900 FPU in #3.)

Ravenslofty commented 5 years ago

On Sat, 13 Apr 2019, 19:35 frno7, notifications@github.com wrote:

> We could get into all sorts of ABI disputes, but I think N32 is strictly superior to O32.

> MIPS Linux kernel maintainers have suggested that the o32 ABI must be merged before proceeding to the n32 ABI, because the latter is substantially more problematic for the R5900. I hope we can support both ABIs, so that they can be chosen freely.

That sounds reasonable, although we'll need a toolchain before we get anywhere serious with it.

> I fixed the R5900 o32 ABI in Glibc 2.29. Older Glibc versions can be used if commit 8e3c00db16fc https://sourceware.org/git/?p=glibc.git;a=commit;h=8e3c00db16fc (MIPS: Use `.set mips2' to emulate LL/SC for the R5900 too) is applied. Glibc needs another patch to support the R5900 n32 ABI as well, but I have not submitted it yet. Both ABIs can then be emulated with an appropriately patched R5900 QEMU https://github.com/frno7/qemu, which is useful to compile for example a modern R5900 Gentoo Linux.

I suspect we might not get much upstream support from them though.

> The FPU state for context switch should also be preserved, including the FPU accumulator, which has no direct instructions for accessing it.

> Do you have links to documents describing the details of this? (There are further notes about the R5900 FPU in #3 https://github.com/frno7/linux/issues/3.)

EE Core User's Manual, Chapter 8.

frno7 commented 5 years ago

> That sounds reasonable, although we'll need a toolchain before we get anywhere serious with it.

Modern GAS and GCC are in shape and usable for R5900 o32 and n32, as well as modern R5900 Linux 5.x kernels, and for R5900 Linux distributions such as Gentoo. Toolchain patches were merged upstream last year.

In addition, a new -mfix-r5900 GAS and GCC option is now available to compile generic MIPS Linux programs and libraries that also work with the R5900. Without this option, generic MIPS code is incompatible with the R5900 due to the short loop hardware bug described in #8.

> I suspect we might not get much upstream support from them though.

A Glibc patch for the R5900 n32 ABI is forthcoming. In the meantime I have a workaround patch. I have been busy with the R5900 QEMU and the R5900 Linux kernel. I have discussed the R5900 n32 ABI with Glibc developers, and I actually believe that they will merge an appropriate patch. It is a bit more work compared with the already supported R5900 o32 ABI, though.

sp193 commented 5 years ago

FPU accumulator, as in $ACC? There exists the sa instruction - Save Accumulator. It exists solely for the preservation of thread contexts.

EDIT: Oh, I think I remember what that was. Maybe you could use an instruction like adda, to get the current value of the FPU accumulator.