pveentjer closed 2 months ago
Not sure what value it provides to the readers. If there is some implicit point that you're trying to make, then I suggest making it explicit.
Hi Denis,
thank you for the review.
The point I'm trying to make is that the initial section states that modern architectures are load-store. X86 is one of the most widely used ISAs and is not a load-store architecture but a register-memory one. As a consequence, a reader of the book could falsely assume that the X86 ISA is a load-store architecture.
If you think this brings no value, I'll close the PR.
If you think this brings value, I'm happy to rewrite it. Perhaps add it as a footnote?
Regards,
Peter.
Ok, I changed the original paragraph:
register-based, load-store architectures
-> register-memory architectures
Let me know if that's good now.
Hi Denis,
the original text was correct: modern ISAs like RISC-V and ARM are load-store architectures.
The X86 ISA is register-memory, but once instructions are converted to uops, the X86 microarchitecture effectively behaves as a load-store architecture as well.
Ooops, of course, you're right. I was implicitly thinking about x86 again. :) Will fix.
Please check now.
What is still missing is that the X86 microarchitecture behaves as a load/store architecture once instructions have been converted to uops.
Given the following code (pseudo-assembly, for illustration):
add [C],[A],[B] ;; C=A+B
After uops conversion it could look like this:
load r1, [A] ;; load [A] in r1
load r2, [B] ;; load [B] in r2
add r3, r1, r2 ;; add r1 and r2 and write it to r3
store r3, [C] ;; store r3 in [C]
I find it very helpful because I need to think a lot less about the complex addressing modes and it helps me to understand the performance opportunities. In the first example, it isn't immediately clear that the loads of [A] and [B] can be performed out of order, but in the uops version, it is much more obvious.
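The cracking described above can be sketched in C. This is only an illustrative model, not how any real x86 decoder works: the uop kinds, the `struct uop` layout, and the functions `crack_mem_add` and `run_uops` are hypothetical names invented for this sketch, and memory is modeled as a plain `int` array indexed by slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical uop kinds for this sketch only. */
enum uop_kind { UOP_LOAD, UOP_ADD, UOP_STORE };

struct uop {
    enum uop_kind kind;
    int dst;   /* register number, or memory slot for UOP_STORE */
    int src1;  /* memory slot for UOP_LOAD, register otherwise */
    int src2;  /* second source register for UOP_ADD, else -1 */
};

/* Crack the pseudo-instruction "add [c],[a],[b]" into four
   load-store style uops. Returns the number of uops written. */
size_t crack_mem_add(int c, int a, int b, struct uop out[], size_t cap)
{
    assert(cap >= 4);
    out[0] = (struct uop){ UOP_LOAD,  1, a, -1 };  /* load  r1, [a]    */
    out[1] = (struct uop){ UOP_LOAD,  2, b, -1 };  /* load  r2, [b]    */
    out[2] = (struct uop){ UOP_ADD,   3, 1,  2 };  /* add   r3, r1, r2 */
    out[3] = (struct uop){ UOP_STORE, c, 3, -1 };  /* store r3, [c]    */
    return 4;
}

/* Execute uops in the given order against a toy memory and register file.
   regs must have room for at least registers 0..3. */
void run_uops(const struct uop *u, size_t n, int mem[], int regs[])
{
    for (size_t i = 0; i < n; i++) {
        switch (u[i].kind) {
        case UOP_LOAD:
            regs[u[i].dst] = mem[u[i].src1];
            break;
        case UOP_ADD:
            regs[u[i].dst] = regs[u[i].src1] + regs[u[i].src2];
            break;
        case UOP_STORE:
            mem[u[i].dst] = regs[u[i].src1];
            break;
        }
    }
}
```

Because the two `UOP_LOAD` entries write different registers and read only memory, swapping them in the array leaves the stored result unchanged. That is the reordering freedom the uops view makes visible.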
It could also help to prevent people from manually 'optimizing' code like this (C-example):
register int a=A;
register int b=B;
C=a+b;
This is written by a 'smart' engineer who wants to help the CPU by giving it more opportunities for out-of-order execution, because he doesn't understand the uops version of add [C],[A],[B]. But he is just making the code bigger and more complex, and the CPU will already do this for him anyway.
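To make the comparison concrete, here is a minimal sketch of the two styles side by side. The function names `add_direct` and `add_with_temps` are made up for this example. Both compute the same result, and an optimizing compiler will most likely emit the same loads, add, and store for each, though that depends on the compiler and flags.

```c
/* Direct form: one statement, two loads, an add, and a store. */
void add_direct(const int *a, const int *b, int *c)
{
    *c = *a + *b;
}

/* Hand-"optimized" form: the explicit temporaries spell out the
   loads that the direct form already implies. */
void add_with_temps(const int *a, const int *b, int *c)
{
    int ta = *a;  /* explicit load of *a */
    int tb = *b;  /* explicit load of *b */
    *c = ta + tb;
}
```

Comparing the generated assembly of the two functions (for example with `-O2 -S`) is a quick way to check whether the temporaries change anything on a given toolchain.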
Ok, I agree, but I think this section is not the best place to discuss uops. I have a section for this: 4-4 UOP.md
There I talk about uops cracking. Please check it and let me know if you have any comments.
Thank you @pveentjer !
Added note about the duality of load/store and register/memory behavior of the X86.