NVSL / linux-nova

NOVA is a log-structured file system designed for byte-addressable non-volatile memories, developed at the University of California, San Diego.
http://nvsl.ucsd.edu/index.php?path=projects/nova

Do multiple sockets decrease the filebench performance of NOVA? #56

Closed: ayugeak closed this issue 2 years ago

ayugeak commented 6 years ago

I run this kernel in a quad-socket environment with NUMA turned off. I run filebench (varmail, fileserver) on NOVA without latency or bandwidth emulation. However, the results show that NOVA performs much better when filebench is bound to any single socket (which suggests that using multiple sockets decreases performance), even when the number of filebench threads is much larger than the hardware threads available across all sockets.

I also tried different versions of filebench (1.4.9.1 and 1.5-alpha3) and different kernels (Linux 4.3.6, Linux 4.4), but the problem persists. I have no idea whether it comes from filebench, the kernel, or NOVA. Have you ever encountered this problem?
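Binding filebench to one socket essentially means pinning the benchmark's execution (and optionally its memory) to that socket's NUMA node. A minimal libnuma sketch of that kind of pinning, assuming libnuma is installed; the node number is illustrative, and the binding in this report may well have been done with numactl or filebench options instead:

```c
/* Sketch: pin the current process to one NUMA node (socket) with libnuma.
 * Build: gcc bind_node.c -o bind_node -lnuma */
#include <stdio.h>
#include <stdlib.h>
#include <numa.h>

int main(int argc, char **argv)
{
    int node = (argc > 1) ? atoi(argv[1]) : 0;   /* node/socket to bind to */

    if (numa_available() < 0) {
        fprintf(stderr, "libnuma: NUMA API not available\n");
        return 1;
    }

    /* Restrict execution to the CPUs of this node... */
    if (numa_run_on_node(node) != 0) {
        perror("numa_run_on_node");
        return 1;
    }
    /* ...and prefer (but do not force) allocations from the same node. */
    numa_set_preferred(node);

    printf("pinned to NUMA node %d\n", node);
    /* a real harness would exec the benchmark from here */
    return 0;
}
```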

Andiry commented 6 years ago

Hi, thank you for reporting the issue.

  1. Can you try the latest NOVA (this repo, based on 4.13) with filebench 1.5?

  2. If the issue still occurs, can you provide the detailed commands you used so that we can reproduce it?

ayugeak commented 6 years ago

I did try the latest NOVA based on 4.13. Platform configuration: 32 GB DRAM, 96 GB mapped as NVM, quad sockets, E7-4809 v3 @ 2.0 GHz (32 cores / 64 threads in total), Linux 4.13, filebench-1.5-alpha3. Some filebench results are presented here:

[image: filebench results (filebench-github)]

(NUMA is truly off, because filebench performance is the same whether it is bound to socket 1, 2, 3, or 4.) The results show bind-1-socket > bind-2-sockets > no bind, especially when threads >= 64.

Besides, on Linux 4.3.6 + NOVA, the performance difference between bind-1-socket and no-bind already appears once threads >= 2.

I found that EXT4-DAX also has the problem, but it is more obvious with NOVA because of NOVA's higher performance. So is the problem caused by filebench and the kernel?

Andiry commented 6 years ago

I think the issue is related to how your emulated NVM is distributed. If it all resides on NUMA node 1, then threads running on the other sockets are doing remote accesses, and performance is impacted.
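To illustrate the remote-access penalty being described, here is a hedged user-space sketch that times reads of a buffer allocated on the local node versus one allocated on a remote node (assumes libnuma and at least two visible NUMA nodes; it exercises ordinary DRAM, not the emulated NVM itself):

```c
/* Sketch: compare local vs. remote NUMA memory read time with libnuma.
 * Build: gcc numa_latency.c -o numa_latency -lnuma */
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <numa.h>

#define BUF_SIZE (256UL * 1024 * 1024)           /* 256 MB per buffer */

static double scan(const char *buf)
{
    struct timespec t0, t1;
    volatile long sum = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < BUF_SIZE; i += 64)    /* one read per cache line */
        sum += buf[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sum;
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    if (numa_available() < 0 || numa_max_node() < 1) {
        fprintf(stderr, "need a NUMA system with at least 2 nodes\n");
        return 1;
    }

    numa_run_on_node(0);                             /* execute on node 0 */

    char *local  = numa_alloc_onnode(BUF_SIZE, 0);   /* same node   */
    char *remote = numa_alloc_onnode(BUF_SIZE, 1);   /* remote node */
    if (!local || !remote) {
        fprintf(stderr, "numa_alloc_onnode failed\n");
        return 1;
    }
    memset(local, 1, BUF_SIZE);                      /* fault pages in */
    memset(remote, 1, BUF_SIZE);

    printf("local  scan: %.3f s\n", scan(local));
    printf("remote scan: %.3f s\n", scan(remote));

    numa_free(local, BUF_SIZE);
    numa_free(remote, BUF_SIZE);
    return 0;
}
```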

ayugeak commented 6 years ago

Well, the NUMA impact did bother me for a long time... The default configuration has 4 NUMA nodes, covering the 4 sockets and 4 DIMMs. When I boot the kernel with "NUMA=off", the operating system only shows 1 node. However, when I tried different single-threaded micro-benchmarks (IOzone and file creation/deletion) with different combinations of socket and memory location, the NUMA impact still existed.
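A quick user-space cross-check of what the kernel exposes after such a boot; this just reads the same topology information that numactl --hardware or /sys/devices/system/node shows, and nothing here is specific to NOVA:

```c
/* Sketch: print the NUMA topology the kernel currently exposes.
 * Build: gcc numa_info.c -o numa_info -lnuma */
#include <stdio.h>
#include <numa.h>

int main(void)
{
    if (numa_available() < 0) {
        printf("libnuma: NUMA API not available\n");
        return 0;
    }

    printf("configured nodes: %d (max node id %d)\n",
           numa_num_configured_nodes(), numa_max_node());

    for (int n = 0; n <= numa_max_node(); n++) {
        long long free_mem = 0;
        long long size = numa_node_size64(n, &free_mem);
        printf("node %d: %lld MB total, %lld MB free\n",
               n, size >> 20, free_mem >> 20);
    }
    return 0;
}
```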

But when I changed the BIOS "memory interleaving" setting from "2-way Node Interleave" to "8-way Interleaving, inter socket", the NUMA impact seems to be gone completely, because the previous micro-benchmarks now show consistent performance.

In addition, for my previous question, the distribution of the emulated NVM is as follows (32 GB DRAM, 96 GB NVM):

[image: NVM distribution (filebench-github1)]

The filebench (and other benchmark) performance stays the same no matter which socket it is bound to. So I'm (almost) sure that NUMA is turned off now...

Actually, it seems that only multi-threaded filebench has the socket-bind problem; multi-threaded IOzone does not.

I noticed that the PMEP also has 2 NUMA nodes. May I ask how you turned off NUMA?

Andiry commented 6 years ago

I did not turn off NUMA; I access the PMEP remotely and it is difficult to modify the BIOS settings. The emulated NVMM only exists on NUMA node 2, but I also leave some DRAM on NUMA node 2, since NOVA uses DRAM for indexing.
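As a side note, a user-space benchmark can at least keep its own execution and DRAM allocations on the node that holds the emulated NVMM. A hedged libnuma sketch with an illustrative node number; this does not control where NOVA's in-kernel index structures are allocated, which the kernel decides on its own:

```c
/* Sketch: co-locate a benchmark's CPUs and DRAM with the NVMM node.
 * NVMM_NODE is illustrative (node 2 in the setup described above).
 * Build: gcc colocate.c -o colocate -lnuma */
#include <stdio.h>
#include <numa.h>

#define NVMM_NODE 2   /* hypothetical: node holding the emulated NVMM */

int main(void)
{
    if (numa_available() < 0 || numa_max_node() < NVMM_NODE) {
        fprintf(stderr, "node %d not available on this system\n", NVMM_NODE);
        return 1;
    }

    /* Run only on the NVMM node's CPUs... */
    numa_run_on_node(NVMM_NODE);

    /* ...and allocate user-space memory strictly from that node. */
    struct bitmask *nodes = numa_allocate_nodemask();
    numa_bitmask_setbit(nodes, NVMM_NODE);
    numa_set_membind(nodes);
    numa_bitmask_free(nodes);

    printf("benchmark confined to NUMA node %d\n", NVMM_NODE);
    /* the actual workload would run here */
    return 0;
}
```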

In my tests I did not see the impact of NUMA. However, there is a paper that describes NUMA effects on NVMM file systems; you can check whether it matches your problem: https://dl.acm.org/citation.cfm?id=2967379