I've been told about some quirks of the SH4 that affect performance. It's recommended that a function's arguments total no more than 32 bits (4 bytes), which you can do by passing structs by reference. Inside the function, they also recommend using at most 10 variables of up to 32 bits (4 bytes) each (that's 10 separate variables, not a 320-bit pool). If you stay within these limits you're only doing register accesses (5 nanoseconds); go over and you have to read the stack, which is a memory access at 1 millisecond. This only really matters on the rendering side, since at 60 FPS each frame gets about 16.7 milliseconds.
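To show what "passing structs by reference" looks like in practice, here's a minimal sketch in C. The `Vertex` type and both function names are made up for illustration, and the sizes in the comments assume 4-byte floats and no padding; on the SH4 a pointer is 4 bytes, so the by-reference version fits the 32-bit argument guideline while the by-value version does not.

```c
#include <stdint.h>

/* Hypothetical vertex type (not from any real API): 3 floats + 1 colour = 16 bytes. */
typedef struct {
    float x, y, z;
    uint32_t colour;
} Vertex;

/* By value: the whole 16-byte struct has to be handed to the callee. */
float sum_by_value(Vertex v)
{
    return v.x + v.y + v.z;
}

/* By reference: only a pointer is passed (4 bytes on a 32-bit target like the SH4). */
float sum_by_ref(const Vertex *v)
{
    return v->x + v->y + v->z;
}
```

Both functions return the same result; the only difference is how much data crosses the call boundary. Whether this actually helps in a given function is something to check in the compiler's assembly output rather than assume.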
https://www.st.com/content/ccc/resource/technical/document/reference_manual/37/ad/a9/94/a1/8d/43/a1/CD17839242.pdf/files/CD17839242.pdf/jcr:content/translations/en.CD17839242.pdf
https://cdn.discordapp.com/attachments/488332893324836866/657738059214749710/SH4_ABI.PNG