Open skgbanga opened 4 years ago
Good point, I remember that I was also thinking about this, but it was some time ago :sweat_smile: From `ld_blocks_partial.address` and `ld_blocks.store_forward`, I suspect that maybe `store_forward` reports the actual cases where forwarding was blocked, and the second counter reports cases where it was due to "false" aliasing (i.e. when forwarding would be possible, but there was an alias). But obviously these two counters are sampled in different situations, because their values are vastly different.
I would have to refresh my memory on the details, since my knowledge of this area is not that deep :) This is just a (probably wrong) guess.
@travisdowns any hints? :)
Yes, I think `ld_blocks_partial.address_alias` counts cases where there was an initial "hit" in the store buffer loose net (i.e., the CPU thinks a load is going to forward from a store), but then when the full address was compared in the fine net, it was found to be a spurious hit due to 4K aliasing. This question and answers have some details about store forwarding and in particular "fine net" and "loose net".
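To make the loose net / fine net distinction concrete, here is a minimal sketch of the two checks as I understand them. It is a toy model, not a description of the real hardware: it assumes the loose net compares only the low 12 address bits (the offset within a 4 KiB page), it ignores access sizes, and the function names are made up for illustration.

```python
# Toy model of the two-stage store-buffer lookup described above.
# Assumption: the loose net compares only the low 12 address bits;
# the fine net compares the full address. Access sizes are ignored.

PAGE_OFFSET_MASK = 0xFFF  # low 12 bits = offset within a 4 KiB page


def loose_net_hit(load_addr: int, store_addr: int) -> bool:
    """Coarse check: do the low 12 bits of both addresses match?"""
    return (load_addr & PAGE_OFFSET_MASK) == (store_addr & PAGE_OFFSET_MASK)


def fine_net_hit(load_addr: int, store_addr: int) -> bool:
    """Precise check: do the full addresses match?"""
    return load_addr == store_addr


def is_4k_alias(load_addr: int, store_addr: int) -> bool:
    """Spurious hit: the loose net matches but the fine net does not."""
    return loose_net_hit(load_addr, store_addr) and not fine_net_hit(load_addr, store_addr)


if __name__ == "__main__":
    store = 0x10000
    for offset in (0, 4092, 4096):
        load = store + offset
        print(offset, is_4k_alias(load, store))
```

In this model, a load exactly 4096 bytes away from an earlier store passes the loose check and fails the fine check, which is the situation the comment above attributes to `ld_blocks_partial.address_alias`. A load 4092 bytes away does not even pass the loose check.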
`ld_blocks.store_forward` measures some other type of block related to store forwarding, although I'm not actually sure what. Maybe when a load is predicted to forward, but the data is not available, or when a load can't be forwarded because it overlaps but is not fully contained within a store (example).
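The "overlaps but is not fully contained" case can be written down as a simple range check. This is only a sketch of that one guess: the helper names are hypothetical, and real CPUs have further restrictions on which contained loads can actually be forwarded.

```python
# Sketch of the "overlaps but is not fully contained" condition.
# Byte ranges are half-open: [addr, addr + size).
# Helper names are hypothetical and exist only for this illustration.


def overlaps(load_addr: int, load_size: int, store_addr: int, store_size: int) -> bool:
    """True if the load and the store touch at least one common byte."""
    return load_addr < store_addr + store_size and store_addr < load_addr + load_size


def contained(load_addr: int, load_size: int, store_addr: int, store_size: int) -> bool:
    """True if every byte of the load lies inside the store."""
    return store_addr <= load_addr and load_addr + load_size <= store_addr + store_size


def partial_overlap(load_addr: int, load_size: int, store_addr: int, store_size: int) -> bool:
    """True if the load overlaps the store without being fully contained in it."""
    args = (load_addr, load_size, store_addr, store_size)
    return overlaps(*args) and not contained(*args)


if __name__ == "__main__":
    # 4-byte load from the upper half of an 8-byte store: fully contained.
    print(partial_overlap(0x1004, 4, 0x1000, 8))
    # 8-byte load that starts at a 4-byte store: the store cannot supply all bytes.
    print(partial_overlap(0x1000, 8, 0x1000, 4))
```

Under this guess, only the second situation would be a candidate for `ld_blocks.store_forward`, because the load needs bytes that the store in the buffer does not hold.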
I am curious what the 'correct' perf counter for 4K aliasing is. You have mentioned `ld_blocks.store_forward`, but I was wondering about the other counter, `ld_blocks_partial.address_alias`, as well.

Here is the `perf list` description:

Here are the perf results on my machine:

As you can see, both of them are hugely different for `4092` and `4096`.