HewlettPackard / quartz

Quartz: A DRAM-based performance emulator for NVM
https://github.com/HewlettPackard/quartz

How to run with numactl? #9

Open Gumi-presentation-by-Dzh opened 7 years ago

Gumi-presentation-by-Dzh commented 7 years ago

When Quartz runs in DRAM+NVM mode, it emulates the NVM on one (remote) node and injects latency (read latency, perhaps?).

So can I assume that memory accesses to the remote node's DRAM behave as NVM accesses?

If so, can I use numactl to bind memory to that node so the app runs in NVM? What should I change in nvmemul.ini?

Gumi-presentation-by-Dzh commented 7 years ago

Besides, when Quartz is in NVM mode, the emulator uses a CPU socket with no sibling node and uses the DRAM available in that socket to emulate NVM. Any DRAM access on this socket has delays injected to emulate the target latency. So we do not need to think about how to run this on a NUMA architecture, because all of the DRAM is emulated as NVM.

In this case, what do we need to do with nvmemul.ini? And how do we run our app? Just use scripts/runenv.sh?

Gumi-presentation-by-Dzh commented 7 years ago

If we do not use runenv.sh, and instead use other libraries (such as pmem.io or our own) or libnuma to simulate heterogeneous memory or NVM, how can we use Quartz?

Do we manually set the LD_PRELOAD and NVMEMUL_INI environment variables instead of using runenv.sh? Can you explain in detail?

Gumi-presentation-by-Dzh commented 7 years ago

I used the following command to try to run my app with libnuma on Quartz:

$ scripts/runenv.sh numactl --physcpubind=0 --membind=1 runspec --config=Example-linux64-amd64-gcc43.cfg --noreportable --iteration=1 403

But I got this error:

ERROR: ld.so: object 'scripts/../build/src/lib/libnvmemul.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.

What does it mean?

guimagalhaes commented 7 years ago

Make sure you have 2 CPUs on different sockets (IvyBridge or newer). Make sure nvmemul.ini points these 2 numa nodes in the 'physical nodes'. Memory accesses to the remote node are then considered NVM, and the emulator will inject delays for these accesses to emulate the target latency. You can use numactl; maybe the best way is "runenv numactl ". That way the emulator library is initialized first and makes some NUMA bindings, and then numactl is executed, which may overwrite some of the NUMA bindings the emulator made. I remember trying this, and applications should be correctly emulated this way (instead of calling pmalloc() explicitly).
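
The first requirement above (at least two NUMA nodes) can be checked before editing nvmemul.ini. This is a minimal sketch, assuming a Linux system that exposes NUMA nodes as `nodeN` directories under `/sys/devices/system/node`. The function name `count_numa_nodes` is hypothetical and not part of quartz.

```shell
#!/usr/bin/env bash
# count_numa_nodes [DIR]: count nodeN entries under DIR
# (default: the Linux sysfs NUMA directory).
count_numa_nodes() {
    local dir="${1:-/sys/devices/system/node}"
    local n=0 d
    for d in "$dir"/node[0-9]*; do
        # If the glob matches nothing it stays literal and fails this test.
        [ -d "$d" ] && n=$((n + 1))
    done
    echo "$n"
}

nodes=$(count_numa_nodes)
if [ "$nodes" -ge 2 ]; then
    echo "found $nodes NUMA nodes: DRAM+NVM mode is possible"
else
    echo "found $nodes NUMA node(s): only NVM only mode is possible"
fi
```

`numactl --hardware` reports the same node list, along with per-node memory sizes, if numactl is installed.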

guimagalhaes commented 7 years ago

To run in NVM only mode, change nvmemul.ini to know just one numa node in the physical nodes section.

guimagalhaes commented 7 years ago

Check what the runenv.sh script does in detail; all those commands are required to properly configure the environment for the emulator. You can run those commands in a new shell, but keep in mind that any command you run in a shell with LD_PRELOAD set will be emulated.
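
The manual alternative to runenv.sh asked about earlier might look like the wrapper below. It is only a sketch. It assumes that the two variables named in this thread (LD_PRELOAD and NVMEMUL_INI) are the relevant ones, that the library sits at build/src/lib/libnvmemul.so under the quartz checkout (the path seen in the error message above), and that nvmemul.ini sits at the top of the checkout. runenv.sh may perform additional setup, so read it as advised. The name `quartz_run` is hypothetical. Absolute paths are used because a relative LD_PRELOAD entry is resolved against each process's current directory, which is one plausible cause of the "cannot be preloaded" error when a tool such as runspec changes directory.

```shell
#!/usr/bin/env bash
# quartz_run ROOT CMD [ARGS...]: run CMD with the emulator preloaded.
# ROOT is the quartz checkout (assumed layout: library under
# ROOT/build/src/lib, config file at ROOT/nvmemul.ini).
quartz_run() {
    local root
    root=$(cd -- "$1" > /dev/null && pwd) || return 1   # make the path absolute
    shift
    # The variables are set only for this one command, so the rest of
    # the shell session is not emulated.
    LD_PRELOAD="$root/build/src/lib/libnvmemul.so" \
    NVMEMUL_INI="$root/nvmemul.ini" \
        "$@"
}
```

For example, `quartz_run ~/quartz numactl --membind=1 ./my_app`, where both paths are placeholders.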

guimagalhaes commented 7 years ago

Use simple commands first to make sure you can run applications with just the emulator, with numactl and then with both the emulator and numactl.
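
The staged approach above can be scripted so that the first failing stage is obvious. This is a sketch under assumptions: it is run from the quartz checkout so that scripts/runenv.sh resolves, node 1 is the remote node, and the helper names `run_stage` and `staged_check` are hypothetical. With `DRY_RUN=1` the commands are only printed.

```shell
#!/usr/bin/env bash
# run_stage LABEL CMD [ARGS...]: run one stage and report the result.
# With DRY_RUN=1 the command is printed instead of executed.
run_stage() {
    local label="$1"; shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "[$label] $*"
        return 0
    fi
    if "$@" > /dev/null; then
        echo "[$label] ok"
    else
        echo "[$label] FAILED: $*"
        return 1
    fi
}

# staged_check APP [ARGS...]: emulator only, numactl only, then both.
# Stops at the first stage that fails.
staged_check() {
    run_stage "emulator only" scripts/runenv.sh "$@" &&
    run_stage "numactl only" numactl --membind=1 "$@" &&
    run_stage "emulator + numactl" scripts/runenv.sh numactl --membind=1 "$@"
}

# Show the three commands for a simple program without running them.
DRY_RUN=1 staged_check ls
```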

Gumi-presentation-by-Dzh commented 7 years ago

I am trying to understand what you mean by "To run in NVM only mode, change nvmemul.ini to know just one numa node in the physical nodes section." and "Make sure nvmemul.ini points these 2 numa nodes in the 'physical nodes'."

Does it mean that nvmemul.ini : topology : physical_nodes should be set to "0,1"? That would point to 2 NUMA nodes in the "physical nodes". But what does "change nvmemul.ini to know just one numa node in the physical nodes section." mean? Is that numactl --physcpubind?

guimagalhaes commented 7 years ago

If you set physical_nodes="0,1", then the emulator will use NUMA node 0 as the DRAM and node 1 as the NVM. The application threads will be bound to NUMA node 0, the application/process data and heap memory will also be allocated on node 0. The application may use the NVM API (pmalloc/pfree) to allocate memory from the NVM (NUMA node 1).

If you set physical_nodes="0", the emulator will know just this NUMA node; even if you use pmalloc, the allocated memory will come from NUMA node 0. The application threads will be bound to this NUMA node, as you would expect. Every memory access to node 0 will be taken into account when delays are injected. This is the NVM only mode.
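
The rule above (two nodes listed means DRAM+NVM, one node means NVM only) can be checked against a config file with a small helper. This is a sketch: the function name `quartz_mode` is hypothetical, and the surrounding `topology: { ... };` syntax in the sample file is an assumption based on the "topology : physical_nodes" path mentioned in this thread. A real nvmemul.ini has more entries.

```shell
#!/usr/bin/env bash
# quartz_mode FILE: report which mode the physical_nodes entry selects.
quartz_mode() {
    local nodes
    # Extract the quoted value of the first physical_nodes assignment.
    nodes=$(sed -n 's/^[[:space:]]*physical_nodes[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1)
    case "$nodes" in
        "")  echo "physical_nodes not found"; return 1 ;;
        *,*) echo "DRAM+NVM (nodes $nodes)" ;;
        *)   echo "NVM only (node $nodes)" ;;
    esac
}

# Minimal sample fragment for illustration only.
ini=$(mktemp)
cat > "$ini" <<'EOF'
topology:
{
    physical_nodes = "0,1";
};
EOF
quartz_mode "$ini"   # prints: DRAM+NVM (nodes 0,1)
```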

guimagalhaes commented 7 years ago

Please see more details on the README file, let me know what is not clear yet.

Gumi-presentation-by-Dzh commented 7 years ago

Thank you for the detailed explanation; I needed to understand how to use it. I did not find this part in the README, so I was a bit confused.

Let's talk about the DRAM+NVM mode. "The application threads will be bound to NUMA node 0, the application/process data and heap memory will also be allocated on node 0." Is this already done by runenv.sh? So can I take it to be similar to "numactl --physcpubind=0 --membind=1 "?

If I don't use the NVM API in the physical_nodes="0,1" DRAM+NVM mode and just run my application, what will happen? Does my app still use NVM, only without your memory management ("pmalloc/pfree")? Do I have to use DRAM+NVM mode and "pmalloc/pfree" together?

guimagalhaes commented 7 years ago

runenv.sh sets the LD_PRELOAD environment variable, which makes the emulator library load automatically with the application. The emulator library will track all application threads and bind them to the proper NUMA node. It is not the same as "numactl --physcpubind=0 --membind=1", since the emulator will bind processor and heap memory to the same NUMA node, and pseudo NVM allocations (pmalloc) to the remote NUMA node (when in DRAM+NVM mode). The numactl command above will bind every memory allocation to the 'remote' NUMA node.

When emulating DRAM+NVM mode (with runenv.sh, or with LD_PRELOAD set directly, and nvmemul.ini configured as discussed), you also need to define a strategy for which data your application should allocate as persistent memory (typically higher latency than DRAM) and run experiments to see the performance impact. For this type of application data, you need to change the application to allocate it using pmalloc/pfree (which will automatically bind the memory to the 'remote' NUMA node).

Another experiment you can do is to use numactl to force all memory allocation to the 'remote' NUMA node, but you still need to preload the emulator library so it will inject delays for the memory accesses to the emulated NVM. When I say 'remote' NUMA node: typically the library is configured with node 0 as local memory (used as the DRAM node) and node 1 as its remote sibling (used as the NVM node), so you can use numactl to bind memory with 'membind=1', which will be seen as NVM by the emulator.

So, first decide on the test strategy you want and which data should be allocated in emulated NVM. Then consider changing the application to call pmalloc/pfree for that particular data, and link the emulator library (as the first library) when compiling the application (without numactl in this case). This is the better approach if the application code is simple to change.
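
The advice to link the emulator library first could translate into a build command like the one below. This is a sketch and rests on several assumptions not confirmed in this thread: gcc is the compiler, the library is at build/src/lib/libnvmemul.so under the quartz checkout, and the pmalloc/pfree declarations are in a header under src/lib. The helper name `link_with_quartz` is hypothetical. With `DRY_RUN=1` the command is printed instead of executed.

```shell
#!/usr/bin/env bash
# link_with_quartz ROOT SRC OUT: build SRC against the emulator library.
link_with_quartz() {
    local root="$1" src="$2" out="$3"
    local libdir="$root/build/src/lib"      # assumed library location
    # -lnvmemul is placed before any other -l option so that it is the
    # first library on the link line; -rpath lets the binary find it at
    # run time without LD_LIBRARY_PATH.
    set -- gcc -o "$out" "$src" -I"$root/src/lib" -L"$libdir" -lnvmemul -Wl,-rpath,"$libdir"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Print the command for a placeholder checkout and source file.
DRY_RUN=1 link_with_quartz /opt/quartz app.c app
```

Add any other libraries the application needs after `-lnvmemul`.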

Gumi-presentation-by-Dzh commented 7 years ago

Thank you so much, but I still have one last question.

You mentioned "the emulator will bind processor and heap memory to the same NUMA node, and pseudo NVM allocations (pmalloc) to the remote NUMA node (when in DRAM + NVM mode)." Can I take it that your implementation is similar to glibc? Does "pmalloc" require specific kernel support? And does the emulator still allocate part of the memory (the "heap", as you mentioned) on the same NUMA node (the local node, which is also the DRAM)?

guimagalhaes commented 7 years ago

pmalloc() is part of the emulator API; if you look at the implementation (pmalloc.c), it is a call to numa_alloc_onnode(), which is part of libnuma. So when the application calls pmalloc(), the emulator takes care of allocating memory from the NUMA node that serves as the emulated NVM. There is no other requirement. If the application calls standard malloc (libc), the memory is allocated by the OS, but again the emulator will bind this allocation to the DRAM node.

Gumi-presentation-by-Dzh commented 7 years ago

Gotcha, Thank you for teaching me that.

flairtone commented 7 years ago

Hi,

$ scripts/runenv.sh numactl --membind=0 [app]

If we run the application as indicated above, shouldn't the emulator make memory accesses only to local DRAM, or will it be considered NVM only mode?

I ask because when I run C code that loops 10000 times performing malloc and free operations, the statistics report NVM accesses irrespective of what membind is set to (0 or 1). However, fewer NVM accesses are made when membind=0.

guimagalhaes commented 7 years ago

@D-Chief, the NVM only or DRAM+NVM mode is selected in the configuration file, regardless of whether you use numactl. Actually, numactl can be used in some specific cases, but usually you don't need it. Select the desired mode by indicating the NUMA nodes in the "physical_nodes" entry of the nvmemul.ini file. If you have at least two CPU sockets (2 NUMA nodes), you will be able to run in DRAM+NVM mode. See the README.md file for details.

flairtone commented 7 years ago

Got it! Thank you, @guimagalhaes.

Gumi-presentation-by-Dzh commented 6 years ago

Hi,

$ scripts/runenv.sh numactl --membind=0 [app]
$ scripts/runenv.sh numactl --membind=1 [app]

I used DRAM+NVM mode to run these two commands, but I found that the runtime is the same. With numactl, the first command should run my app in DRAM (node 0), so I expected its runtime to be lower than that of the second one, but they are the same. Why does this happen?

guimagalhaes commented 6 years ago

The emulator binds the application memory regardless of whether you use numactl. So the numactl decision is probably overwritten by the emulator library. Use either malloc() or pmalloc() in your application to allocate memory in the expected virtual node (DRAM or NVM, respectively). pmalloc() is provided by the emulator library, so make sure you link the library with your application.