Kobzol / hardware-effects

Demonstration of various hardware effects.
MIT License

4K aliasing correct perf counter #18

Open skgbanga opened 4 years ago

skgbanga commented 4 years ago

I am curious what the 'correct' perf counter for 4K aliasing is. You mention ld_blocks.store_forward, but I was also wondering about another counter, ld_blocks_partial.address_alias.

Here is the perf list description:

  ld_blocks.store_forward                           
       [loads blocked by overlapping with store buffer that cannot be forwarded]
  ld_blocks_partial.address_alias                   
       [False dependencies in MOB due to partial compare on address]

Here are the perf results on my machine:

$ perf stat -e ld_blocks_partial.address_alias,ld_blocks.store_forward ./a.out 4096
222

 Performance counter stats for './a.out 4096':

             6,852      ld_blocks_partial.address_alias:u                                   
                32      ld_blocks.store_forward:u                                   

       0.224647447 seconds time elapsed
$ perf stat -e ld_blocks_partial.address_alias,ld_blocks.store_forward ./a.out 4092
359

 Performance counter stats for './a.out 4092':

       132,139,399      ld_blocks_partial.address_alias:u                                   
         2,097,093      ld_blocks.store_forward:u                                   

       0.361229917 seconds time elapsed

As you can see, both counters differ by orders of magnitude between 4092 and 4096.

Kobzol commented 3 years ago

Good point. I remember thinking about this too, but it was some time ago :sweat_smile: Judging by the descriptions of ld_blocks_partial.address_alias and ld_blocks.store_forward, I suspect that store_forward reports the actual cases where forwarding was blocked, while address_alias reports cases where the block was due to "false" aliasing (i.e. forwarding would have been possible, but there was an alias). But these two counters are obviously incremented in different situations, because their values are vastly different.

I would have to look into this again in more detail; my knowledge of this area is not that deep :) This is just a (probably wrong) guess.

@travisdowns any hints? :)

travisdowns commented 3 years ago

Yes, I think ld_blocks_partial.address_alias counts cases where there was an initial "hit" in the store buffer loose net (i.e., the CPU thinks a load is going to forward from a store), but then when the full address was compared in the fine net, it was found to be a spurious hit due to 4K aliasing. This question and answers have some details about store forwarding and in particular "fine net" and "loose net".
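As a rough illustration of the two-stage check described above, here is a minimal sketch that models the loose net as a comparison of only the low 12 address bits and the fine net as a full-address comparison. The function names, the 12-bit width, and the example addresses are assumptions made for illustration. The sketch ignores access sizes and says nothing about how the benchmark's argument maps to addresses.

```python
# Toy model of a partial (loose net) vs. full (fine net) address compare.
# Names and the 12-bit mask are illustrative assumptions, not hardware facts.
PAGE_OFFSET_MASK = 0xFFF  # low 12 bits, i.e. the offset within a 4 KiB page


def loose_net_hit(load_addr: int, store_addr: int) -> bool:
    """Partial compare: only the low 12 bits of both addresses are checked."""
    return (load_addr & PAGE_OFFSET_MASK) == (store_addr & PAGE_OFFSET_MASK)


def fine_net_hit(load_addr: int, store_addr: int) -> bool:
    """Full compare: the complete addresses must match."""
    return load_addr == store_addr


def is_4k_alias(load_addr: int, store_addr: int) -> bool:
    """Spurious hit: the partial compare matches but the full compare does not."""
    return loose_net_hit(load_addr, store_addr) and not fine_net_hit(load_addr, store_addr)


if __name__ == "__main__":
    store = 0x7000_0040  # hypothetical store address
    for load in (0x7000_0040, 0x7000_1040, 0x7000_1044):
        print(hex(load), "alias" if is_4k_alias(load, store) else "no alias")
```

In this model only the second address counts as an alias: it shares the page offset with the store but lives on a different page.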

ld_blocks.store_forward measures some other type of block related to store forwarding, although I'm not actually sure what. Maybe when a load is predicted to forward, but the data is not available, or when a load can't be forwarded because it overlaps but is not fully contained within a store (example).
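To make the "overlaps but is not fully contained" case concrete, here is a small sketch that classifies a load against an earlier store by byte range. It is a simplification under stated assumptions: the `classify` helper and its three labels are hypothetical, and real hardware has further forwarding restrictions that this does not model.

```python
# Toy classification of a load against an earlier, still-buffered store.
# The helper names and labels are hypothetical; real forwarding rules are stricter.


def overlaps(load_addr: int, load_size: int, store_addr: int, store_size: int) -> bool:
    """True if the two byte ranges share at least one byte."""
    return load_addr < store_addr + store_size and store_addr < load_addr + load_size


def fully_contained(load_addr: int, load_size: int, store_addr: int, store_size: int) -> bool:
    """True if every byte of the load lies inside the store's byte range."""
    return store_addr <= load_addr and load_addr + load_size <= store_addr + store_size


def classify(load_addr: int, load_size: int, store_addr: int, store_size: int) -> str:
    if not overlaps(load_addr, load_size, store_addr, store_size):
        return "independent"   # the load does not touch the stored bytes
    if fully_contained(load_addr, load_size, store_addr, store_size):
        return "forwardable"   # the store holds all bytes the load needs
    return "blocked"           # partial overlap: the store cannot supply the whole load


if __name__ == "__main__":
    # An 8-byte store at 0x100, followed by three different loads.
    print(classify(0x100, 4, 0x100, 8))
    print(classify(0x104, 8, 0x100, 8))
    print(classify(0x200, 4, 0x100, 8))
```

The "blocked" case is the one that, per the guess above, might be what ld_blocks.store_forward counts.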