Kobzol / hardware-effects

Demonstration of various hardware effects.
MIT License
2.82k stars 157 forks source link

cache-aliasing example should be "cache-thrashing" #14

Closed NanXiao closed 5 years ago

NanXiao commented 5 years ago

Hi Jakub,

Greeting from me!

I think cache-aliasing example actually does not describe what "cache-aliasing", but "cache-thrashing". According to Wikipedia:

Aliasing: Multiple virtual addresses can map to a single physical address. Most processors guarantee that all updates to that single physical address will happen in program order. To deliver on that guarantee, the processor must ensure that only one copy of a physical address resides in the cache at any given time.

The point of "cache-aliasing" is at the same time, there are more than one copies of physical address resides in the cache. What your program demonstrates is because the conflicts in cache set, the cache evicts a cache line frequently, and this is a typical "cache-thrashing".

Hope you check it, thanks!

Best Regards Nan Xiao

Kobzol commented 5 years ago

Hi, you're right that aliasing may be a bit misleading in this case, although the problem is known as 4K aliasing (this name is also used by VTune), because the addresses of different memory locations alias to the same cache hash slot.

Cache-thrashing seems ok, I would also consider cache conflicts (since the thrashing is caused by conflicts in the cache "hash table"). What do you think?

NanXiao commented 5 years ago

@Kobzol Yes, I think "cache conflicts" is OK, thanks!

BTW, why not just use a vector instead of using memory pointer to dereference, like this:

#include <iostream>
#include <vector>
#include <chrono>
#include <memory>

#define REPETITIONS 10000000

void test_memory(std::vector<int>& memory, size_t increment)
{
    using Clock = std::chrono::steady_clock;
    auto start = Clock::now();

    for (int i = 0; i < REPETITIONS; i++)
    {
        for (size_t j = 0; j < memory.size(); j += increment)
        {
            memory[j] += j;
        }
    }

    std::cerr << std::chrono::duration_cast<std::chrono::microseconds>(Clock::now() - start).count() << std::endl;
}

int main(int argc, char** argv)
{
    if (argc < 3)
    {
        std::cout << "Usage: cache-aliasing <count> <increment>" << std::endl;
        return 1;
    }

    size_t count = static_cast<size_t>(std::stoi(argv[1]));
    size_t increment = static_cast<size_t>(std::stoi(argv[2]));

    std::vector<int> memory(count * increment);
    test_memory(memory, increment);

    return 0;
}

Is there any special concern? Thanks very much in advance!

Kobzol commented 5 years ago

Well the input increment is in bytes, not in multiples of ints (even though I allocate it as such). It's not really necessary but I think it's more natural to calculate the cache addresses in bytes. I changed it to auto increment = static_cast<size_t>(std::stoi(argv[2])) / sizeof(int);. and renamed the example to cache-conflicts.

I put it here: https://github.com/Kobzol/hardware-effects/commit/4ceee1cdd808e75efd41943629f657a2ebbf46a7, if you have any comments, please let me know :)

NanXiao commented 5 years ago

Your code is better! No any comment, you can close this issue now, thanks very much!

Kobzol commented 5 years ago

Thanks, I merged it.