HewlettPackard / quartz

Quartz: A DRAM-based performance emulator for NVM
https://github.com/HewlettPackard/quartz
Other

Some issues in pure PM mode... #13

Open wenwen412 opened 7 years ago

wenwen412 commented 7 years ago

I set Quartz to pure PM mode by setting physical_nodes = "0" in nvmemul.ini, and set both the read and write latencies to 1000. I then ran a test program that calls malloc() more than 100000 times. Run through runenv.sh, its runtime is about 0.13 s; run without runenv.sh, it is about 0.12 s. If I increase the read/write latencies to 10000 and run it through runenv.sh, the runtime is about 0.22 s.

However, if I replace malloc()/free() with pmalloc()/pfree() in the program, the runtime is about 2.2 s. This means that in pure PM mode there is an obvious performance gap between pmalloc() and malloc(). But based on my understanding of the README file, pmalloc() and malloc() should perform similarly in a pure PM environment. Am I missing something?

guimagalhaes commented 7 years ago

Hi. Please describe what else the program does. PM references are the events that really matter to the latency emulator. I am not aware of any performance problems with the numa_alloc* API, but these results are important to understand.

wenwen412 commented 7 years ago

The program is a tree structure implementation. It inserts more than 100000 records into the tree and destroys it afterward, allocating each node with pmalloc(). I enable/disable pmalloc()/pfree() by commenting/uncommenting the code below:

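#include <stdlib.h>

/* When compiled in, these route pmalloc()/pfree() to the regular heap
   ("disabled" cases); commented out, Quartz's own versions are used. */
void *pmalloc(size_t size) {
    return malloc(size);
}

void pfree(void *ptr, size_t size) {
    (void)size;  /* free() does not need the size */
    free(ptr);
}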
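A compile-time switch is one alternative to commenting code in and out. The sketch below is an assumption, not part of Quartz: the macro names USE_QUARTZ_PMALLOC, NODE_ALLOC and NODE_FREE and the struct node type are hypothetical, and the pmalloc()/pfree() signatures are taken from the wrappers above. Building with -DUSE_QUARTZ_PMALLOC (and linking nvmemul) would select Quartz's allocator; building without it uses malloc()/free().

```c
#include <stdlib.h>

/* Hypothetical switch: define USE_QUARTZ_PMALLOC to allocate tree nodes
   with Quartz's pmalloc()/pfree(), otherwise use the regular heap. */
#ifdef USE_QUARTZ_PMALLOC
#include "pmalloc.h"
#define NODE_ALLOC(sz) pmalloc(sz)
#define NODE_FREE(p, sz) pfree((p), (sz))
#else
#define NODE_ALLOC(sz) malloc(sz)
#define NODE_FREE(p, sz) free(p)
#endif

/* Hypothetical tree node, standing in for the program's real one. */
struct node {
    long key;
    struct node *left, *right;
};

static struct node *node_new(long key) {
    struct node *n = NODE_ALLOC(sizeof *n);
    if (n != NULL) {
        n->key = key;
        n->left = n->right = NULL;
    }
    return n;
}

static void node_delete(struct node *n) {
    NODE_FREE(n, sizeof *n);  /* pfree() takes the size; free() ignores it */
}
```

Both configurations then run the same tree code, so the only difference between the "enabled" and "disabled" cases is where the nodes live.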

In all cases, I set physical_nodes = "0".

case 1: read = 10000, write = 10000, pmalloc()/pfree() enabled: runtime 15.5 s
case 2: read = 10000, write = 10000, pmalloc()/pfree() disabled: runtime 0.2 s
case 3: read = 1000, write = 1000, pmalloc()/pfree() enabled: runtime 2.2 s
case 4: read = 1000, write = 1000, pmalloc()/pfree() disabled: runtime 0.1 s
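The four runtimes can be compared directly. The arithmetic below uses only the numbers reported above, with T_i denoting the runtime of case i:

```latex
\begin{aligned}
\frac{T_1}{T_2} &= \frac{15.5\,\mathrm{s}}{0.2\,\mathrm{s}} = 77.5
  && \text{(read/write = 10000)} \\
\frac{T_3}{T_4} &= \frac{2.2\,\mathrm{s}}{0.1\,\mathrm{s}} = 22
  && \text{(read/write = 1000)} \\
T_1 - T_3 &= 13.3\,\mathrm{s}, \qquad T_2 - T_4 = 0.1\,\mathrm{s}
\end{aligned}
```

So raising the configured latency from 1000 to 10000 adds roughly 13.3 s when nodes come from pmalloc() but only about 0.1 s otherwise. This is a reading of four rounded measurements, not a diagnosis of the cause.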

I half suspect that pure PM mode somehow does not work without explicit pmalloc()/pfree(). But cases 2 and 4 do show a performance difference.

wenwen412 commented 7 years ago

And below is the cpu information:

wpan@camvis02:~/quartz$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
Stepping:              2
CPU MHz:               1332.296
CPU max MHz:           3400.0000
CPU min MHz:           1200.0000
BogoMIPS:              5201.25
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-7,16-23
NUMA node1 CPU(s):     8-15,24-31
flairtone commented 7 years ago

Hi, please excuse this basic question. How do you run C programs that use pmalloc and pfree? I am trying to run the program "test_nvm_remote_dram.c" in the "test" directory with the following command: ../scripts/runenv.sh cc test_nvm_remote_dram.c

Initially I got this error:

fatal error: pmalloc.h: No such file or directory
compilation terminated.

Later, I set the path to pmalloc.h in the code: #include "/home/dilip/cquartz/src/lib/pmalloc.h"

Now, I am getting this error:

/tmp/cctPQKFG.o: In function `iter':
mynvmtest1.c:(.text+0xf): undefined reference to `pmalloc'
mynvmtest1.c:(.text+0x3e): undefined reference to `pmalloc'
collect2: error: ld returned 1 exit status

guimagalhaes commented 7 years ago

Hi, please read the build section in the README file. Once the library and test programs are compiled, you can find the test program binaries, including test_nvm_remote_dram, in the ./build/test folder.

guimagalhaes commented 7 years ago

wenwen412, pmalloc() can be 10x slower than malloc() when the allocation size is small. Please measure the time per allocation (in nanoseconds) for malloc() and pmalloc() and compare them. Then take the total memory allocation time in each case and compare it against the overall application time. Let's see whether the allocation time is the answer.
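The per-allocation measurement suggested above could look like this minimal sketch. The names alloc_fn, free_fn, sys_alloc, sys_free and ns_per_alloc are hypothetical; the hook shapes mirror the pmalloc(size_t) and pfree(void *, size_t) wrappers posted earlier in the thread, so Quartz's own functions could presumably be passed in directly when linking against nvmemul, assuming the library's signatures match. Only the allocation loop is timed; frees happen outside the timed region.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/* Allocator hooks with the same shape as pmalloc()/pfree(). */
typedef void *(*alloc_fn)(size_t size);
typedef void (*free_fn)(void *ptr, size_t size);

/* Baseline hooks backed by the regular heap. */
static void *sys_alloc(size_t size) { return malloc(size); }
static void sys_free(void *ptr, size_t size) { (void)size; free(ptr); }

static double now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Mean nanoseconds per allocation over n allocations of `size` bytes,
   or -1.0 if n is 0 or any allocation fails. */
static double ns_per_alloc(alloc_fn af, free_fn ff, size_t size, size_t n) {
    if (n == 0) return -1.0;
    void **ptrs = malloc(n * sizeof *ptrs);
    if (ptrs == NULL) return -1.0;

    size_t done = 0;
    double t0 = now_ns();
    for (; done < n; done++) {
        ptrs[done] = af(size);
        if (ptrs[done] == NULL) break;
    }
    double t1 = now_ns();

    /* Release everything that was allocated, outside the timed region. */
    for (size_t i = 0; i < done; i++) ff(ptrs[i], size);
    free(ptrs);
    return done == n ? (t1 - t0) / (double)n : -1.0;
}
```

Calling ns_per_alloc(sys_alloc, sys_free, sizeof(struct your_node), 100000) and then the same with the pmalloc()/pfree() hooks gives the two per-allocation figures; multiplying each by the number of allocations shows how much of the total runtime gap allocation alone can explain.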

hvolos commented 7 years ago

pmalloc/pfree was a stopgap solution for allocating regions of emulated NVM, and it does not work well with existing frameworks such as pmem.io that provide their own memory allocator.

To address this limitation, I've been recently working on an alternative scheme. The alternative scheme no longer provides a pmalloc/pfree interface, but instead, exposes emulated NVM as a tmpfs file system. It expects that one will layer a memory allocator on top of the tmpfs filesystem, as done today with the pmem.io framework that is layered on top of DAX file systems. The current scheme is experimental, so it is available under a separate branch: fam.

There is not yet any documentation on how to use it, but I am happy to provide instructions if anyone is interested in giving it a try.

wenwen412 commented 7 years ago

Thank you @guimagalhaes, I believe the performance difference is caused by too many allocations for small-sized nodes.

Hi @hvolos, I am willing to try the new feature. I would appreciate it if you could give me instructions for it.

flairtone commented 7 years ago

Hi @wenwen412 and @guimagalhaes, can you please share the command you used to run your C application with pmalloc and pfree? What steps should I follow to execute programs that use pmalloc and pfree?

guimagalhaes commented 7 years ago

@D-Chief, you need to include "pmalloc.h" from the library source code and link your program with the emulator library (nvmemul). See the README files for details on how to build the library and the test tools.

wenwen412 commented 7 years ago

@D-Chief, I put the source file test.c in quartz/test, modified CMakeLists.txt in quartz/test, and ran make to compile it. Then I simply ran:

xx@xxxxx:~/quartz/test$ ../scripts/runenv.sh ./test

hvolos commented 7 years ago

Here are some rough guidelines on how to use the feature supported by the 'fam' branch. Please note it is still experimental and under active development.

  1. Use the (new) Quartz utility to discover the physical topology of the machine (i.e., NUMA latencies, buses, and throttling values) and save it in an XML file so that emulation runs can reuse it.

$ ./src/util/quartz/quartz discover

2.i. Edit the variable general:physical_topology in nvmemul.ini to point to the above xml file.

2.ii. Edit the nvmemul.ini to configure the virtual topology to be emulated by Quartz. The default topology configures one compute node equipped with local DRAM and one NVM node backed by tmpfs. The compute node binds to physical socket 0 and the NVM node binds to physical socket 1.

  3. Use the quartz utility to create the virtual topology used by emulation runs.

$ ./src/util/quartz/quartz create

After this, you should have a number of tmpfs file systems created according to the virtual topology described in the nvmemul.ini file. You can check this with 'df'.

  4. We still rely on the LD_PRELOAD method to attach Quartz and emulate NVM performance, but programs no longer use pmalloc/pfree to allocate emulated NVM. Instead, they allocate emulated NVM by creating a file in the NVM tmpfs and layering an allocator such as pmem.io on top of that file (caution: I haven't tried layering pmem.io yet, although I expect it to work).
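The last step above (creating a file in the NVM tmpfs to back emulated NVM) can be sketched in C. This is a minimal sketch under assumptions, not code from the fam branch: map_nvm_file is a hypothetical helper, and `dir` stands for one of the NVM tmpfs mount points that 'df' reports after 'quartz create' (the actual path depends on your nvmemul.ini). It only creates, sizes, and maps the backing file; an allocator layered on top would take over from there.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates (or truncates) dir/name, sizes it to len bytes, maps it shared,
   and writes every byte. Returns 0 on success, -1 on any error.
   Assumption: dir is an NVM tmpfs mount created by `quartz create`. */
static int map_nvm_file(const char *dir, const char *name, size_t len) {
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path) return -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    void *base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    if (base == MAP_FAILED) return -1;  /* also covers len == 0 */

    /* Touch the region; these stores land on the file's tmpfs pages. */
    memset(base, 0xAB, len);
    int rc = msync(base, len, MS_SYNC);
    munmap(base, len);
    return rc == 0 ? 0 : -1;
}
```

The program itself would still be launched through runenv.sh so that Quartz is attached via LD_PRELOAD.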
flairtone commented 7 years ago

Thanks a lot @wenwen412 @guimagalhaes.

I added my file nvpmt1.c to the test directory. These are the contents of CMakeLists.txt in the Quartz/test directory:


include_directories(${CMAKE_SOURCE_DIR}/third_party/gtest-1.7.0/include)
include_directories(/home/dilip/cquartz/src/lib)

add_definitions(-g)
add_definitions(-Wall)
add_definitions(-DNDEBUG)

add_executable(test_interpose ${CMAKE_CURRENT_SOURCE_DIR}/test_interpose.cc)
target_link_libraries(test_interpose pthread gtest)

add_executable(test_dev ${CMAKE_CURRENT_SOURCE_DIR}/test_dev.cc)
target_link_libraries(test_dev pthread nvmemul)

add_executable(test_thread ${CMAKE_CURRENT_SOURCE_DIR}/test_thread.cc)
target_link_libraries(test_thread nvmemul pthread)

add_executable(test_mutex ${CMAKE_CURRENT_SOURCE_DIR}/test_mutex.cc)
target_link_libraries(test_mutex nvmemul pthread)

add_executable(test_nvm_remote_dram ${CMAKE_CURRENT_SOURCE_DIR}/test_nvm_remote_dram.c)
target_link_libraries(test_nvm_remote_dram nvmemul)

add_executable(test_nvm ${CMAKE_CURRENT_SOURCE_DIR}/test_nvm.c)
target_link_libraries(test_nvm nvmemul)

# The target name must be unique and match the one passed to target_link_libraries
add_executable(nvpmt1 ${CMAKE_CURRENT_SOURCE_DIR}/nvpmt1.c)
target_link_libraries(nvpmt1 nvmemul)

add_executable(test_multithread ${CMAKE_CURRENT_SOURCE_DIR}/test_multithread.c)
target_link_libraries(test_multithread rt)
target_link_libraries(test_multithread nvmemul pthread)

add_test(NAME interpose COMMAND ${CMAKE_CURRENT_BINARY_DIR}/test_interpose)
set(ENV_COMMON "LD_PRELOAD=${CMAKE_BINARY_DIR}/src/emul/libnvmemul.so")
SET_PROPERTY(TEST interpose PROPERTY ENVIRONMENT ${ENV_COMMON} "ENUM_INI=emul.ini")


I am compiling with the following make command (I have no idea whether it is the right one). Is this how I am supposed to compile?

make -f CMakeLists.txt

This is the error I get:

CMakeLists.txt:2: *** missing separator. Stop.

@wenwen412, could you please share a copy of your modified CMakeLists.txt file here? Sorry for the trouble.

@guimagalhaes, building the test directory is disabled by default (it is commented out in the CMakeLists.txt in the Quartz root directory).

guimagalhaes commented 7 years ago

@D-Chief, the README.md file in the root of the source tree explains the basic instructions for building and running the emulator. Let me copy the basic build commands for you:

Considering you are in the root folder of the source tree:

mkdir build

cd build

cmake ..

make clean all

The binaries will be inside this new build folder.

Once the library is compiled, you can also build a test tool (your pmalloc test) in this simpler way:

gcc -I/src/lib -L/build/src/lib -lnvmemul

where is the full path of your NVM_EMUL source tree.

Run the test tool you just built:

/scripts/runenv.sh

Also see the /scripts/install.sh script to install the dependencies. Note that 'build-essential' (gcc, g++, make) is also required.

flairtone commented 7 years ago

Thanks a TON!! @guimagalhaes This solution works like a charm.