AIFM-sys / AIFM

AIFM: High-Performance, Application-Integrated Far Memory
MIT License
104 stars 34 forks source link

A bug in gc_cache() and CircularBuffer #20

Open PanJason opened 1 year ago

PanJason commented 1 year ago

Hello, I have been developing my application on AIFM for some time. Recently I encountered a bug with gc_cache() and CircularBuffer().

Describe the bug

In gc_cache() a GC worker first checks is_free_cache_high(), which is the size of free_regions_ over the capacity of free_regions, then tries to pick up regions from used_regions_. There is a case where the is_free_cache_high() is not satisfied by "default".

This is because the free_regions_ is constructed with free_regions_count as the argument.

free_regions_ = std::move(CircularBuffer<Region, false>(free_regions_count));

The free_regions_count includes the number of real free regions that can be allocated and the number of per-core pre-allocated free regions. For example, if the kNumSocket1CPUs is set to be 30 and the free_regions_count is 76 (16 + 30 * 2), then only 16 free regions will be in the free_regions_ circular buffer and the is_free_cache_high() is not satisfied by default.

So if GC is triggered, the GC will run into an indefinite loop because the is_free_cache_high() will never be satisfied.

Priority

I think this can be fixed. I can change this line which initialized the free_regions_ from using free_regions_count as the size (capacity) argument to using free_regions_coun - 2 * helpers::kNumSocket1CPUs.

  free_regions_ = std::move(CircularBuffer<Region, false>(free_regions_count - 2* helpers::kNumSocket1CPUs));

I can create a PR for this if other things are fine after the change because there may be some places where I miss.

Screenshots

image The gc_cache() runs into an indefinite loop.

server

I think the environment does not matter much in this case. g++ version: g++ 9.2.0 kernel version: 5.14.0