fengggli / comanche


mcas memory register in nvmestore #6

Open fengggli opened 5 years ago

fengggli commented 5 years ago
  1. The second mmap uses MAP_PRIVATE, so don't memset the region; otherwise the write creates another private copy (backed by some unaligned physical address).
  2. I verified that both the virtual address and phys_addr are 2M-aligned, but registration still failed.
  3. The previous cuda-dma approach seems to work, so what is different about this hugepage memory?
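
Points 1 and 2 can be illustrated with a small sketch. It is not the nvmestore code: it uses an ordinary temporary file instead of a hugetlbfs file, and the helper names `is_2m_aligned` and `private_write_visible_in_shared` are made up for illustration. It only shows that a write through a MAP_PRIVATE mapping lands in a private copy that a MAP_SHARED mapping of the same file does not see, and how a 2M alignment check might look.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define TWO_MB (2ULL * 1024 * 1024)

/* Returns 1 if addr is a multiple of 2 MiB, else 0. */
int is_2m_aligned(uint64_t addr) {
    return (addr & (TWO_MB - 1)) == 0;
}

/*
 * Maps the same scratch file twice (MAP_SHARED and MAP_PRIVATE), writes
 * through the private mapping, and reports whether the shared mapping
 * observes the write. Returns 1 if visible, 0 if not, -1 on setup failure.
 */
int private_write_visible_in_shared(void) {
    char path[] = "/tmp/cow-demo-XXXXXX";   /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);                            /* file disappears once closed */

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (ftruncate(fd, (off_t)page) != 0) { close(fd); return -1; }

    unsigned char *shared = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (shared == MAP_FAILED) { close(fd); return -1; }
    unsigned char *priv = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (priv == MAP_FAILED) { munmap(shared, page); close(fd); return -1; }

    memset(priv, 0xAB, page);                /* the write triggers copy-on-write */
    int visible = (shared[0] == 0xAB);       /* shared view still reads zeros */

    munmap(priv, page);
    munmap(shared, page);
    close(fd);
    return visible;
}
```

Calling `private_write_visible_in_shared()` is expected to report that the write is not visible, which is why a memset through the private mapping ends up on different physical pages than the ones that were set up for DMA. `is_2m_aligned(0x700000000)` holds for the address that appears in the registration call later in this thread.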

Previously I tried a newer dpdk (https://github.com/fengggli/dpdk/issues/1), but that failed.

fengggli commented 5 years ago
(py36) lifen@sievert(:):~/Workspace/comanche$pmap -X 7471
7471:   /home/lifen/Workspace/vagrantvm/vagrant-ubuntu18-spdk1810/comanche/build/src/components/store/nvmestore/testing/mcas-nvmestore/test-mcas-nvmestore --pci 20:00.0
pmap: ERROR: inconsistent detail field in smaps file, line:
 Size:              18856 kB
fengggli commented 5 years ago

https://github.com/fengggli/spdk/issues/1 This is also how SPDK updates the DMA mapping in the kernel through vfio:

157             ret = ioctl(g_vfio.fd, VFIO_IOMMU_MAP_DMA, &dma_map->map);
(gdb) bt
#0  vtophys_iommu_map_dma (size=2097152, iova=30064771072, vaddr=30064771072) at vtophys.c:157
#1  spdk_vtophys_notify (cb_ctx=<optimized out>, map=0x7fdfe75e7010, action=SPDK_MEM_MAP_NOTIFY_REGISTER,
    vaddr=0x700000000, len=2097152) at vtophys.c:372
#2  0x00007ffff74d478c in spdk_mem_register (vaddr=0x700000000, len=<optimized out>) at memory.c:360
#3  0x00000000004054c4 in Main::run (this=0x627420)
    at /home/lifen/Workspace/comanche/src/components/store/nvmestore/testing/mcas-nvmestore/testcli.cc:209
#4  0x0000000000405909 in main (argc=3, argv=0x7fffffffe618)
    at /home/lifen/Workspace/comanche/src/components/store/nvmestore/testing/mcas-nvmestore/testcli.cc:262
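
For reference, the ioctl in frame #0 takes a `struct vfio_iommu_type1_dma_map` from `<linux/vfio.h>`. The following is a minimal sketch of such a call, not SPDK's actual code: the helper names `fill_dma_map` and `map_dma` are hypothetical, and the container fd is assumed to be an already configured VFIO container with a type1 IOMMU set.

```c
#include <linux/vfio.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Fills a type1 DMA-map request for a read/write mapping. */
void fill_dma_map(struct vfio_iommu_type1_dma_map *m,
                  uint64_t vaddr, uint64_t iova, uint64_t size) {
    memset(m, 0, sizeof(*m));
    m->argsz = sizeof(*m);
    m->flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    m->vaddr = vaddr;   /* process virtual address of the buffer */
    m->iova  = iova;    /* address the device will use for DMA */
    m->size  = size;    /* length of the region in bytes */
}

/*
 * Issues VFIO_IOMMU_MAP_DMA on a VFIO container fd (assumed to be set up
 * elsewhere). Returns 0 on success, -1 with errno set on failure.
 */
int map_dma(int container_fd, uint64_t vaddr, uint64_t iova, uint64_t size) {
    struct vfio_iommu_type1_dma_map m;
    fill_dma_map(&m, vaddr, iova, size);
    return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &m);
}
```

With the values from the backtrace this would be `map_dma(fd, 0x700000000, 0x700000000, 2097152)`, i.e. the IOVA equals the virtual address. When the call fails, checking `errno` right after it is probably the quickest way to narrow down the cause.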

Some docs about vfio: https://lwn.net/Articles/474088/. When vfio is enabled, do different processes have the same view of "physical addresses"? If not, filling the page table with physical addresses is meaningless!

~~I shall also read this (https://github.com/torvalds/linux/blob/master/Documentation/vfio.txt) Also a video(https://www.youtube.com/watch?v=WFkdTFTOTpA) and discussion(https://www.redhat.com/archives/vfio-users/2018-February/msg00010.html)~~ See https://github.com/fengggli/comanche/issues/8 for more details of vfio

fengggli commented 5 years ago

The virtual address is not registered in the IOMMU; I need to look into why the ioctl doesn't work. Search using "spdk_mem_register site:https://lists.01.org/pipermail/spdk/".

SPDK can be rebuilt and reinstalled with:

make DPDK_DIR=/home/lifen/Workspace/comanche/build/dist//share/dpdk/x86_64-native-linuxapp-gcc/ CONFIG_RDMA=y
make install DESTDIR=/home/lifen/Workspace/comanche/build/dist/ CONFIG_PREFIX=""
fengggli commented 5 years ago

From https://github.com/libhugetlbfs/libhugetlbfs/blob/master/HOWTO: "By default, when libhugetlbfs uses anonymous, unlinked hugetlbfs files to store remapped program segment data. This means that if the same program is started multiple times using hugepage segments, multiple huge pages will be used to store the same program data."

fengggli commented 5 years ago


VM_RESERVED was changed to VM_DONTEXPAND | VM_DONTDUMP:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=547b1e81afe3119f7daf702cc03b158495535a25
fengggli commented 5 years ago

Trying:

  1. Reproduce in minimal code and send to the vfio group
    • Code finished
  2. Try DAX memory and see whether the PG_reserved flag is set
    • Nope

From https://www.kernel.org/doc/gorman/html/understand/understand005.html:

PG_reserved: This is set for pages that can never be swapped out. It is set by the boot memory allocator (See Chapter 5) for pages allocated during system startup. Later it is used to flag empty pages or ones that do not even exist.
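
One way to check the flag from user space is to look up the page frame number in /proc/self/pagemap and then read the matching record in /proc/kpageflags. The sketch below assumes the layout in the kernel's pagemap documentation (PFN in bits 0-54, present bit 63, RESERVED at bit 32 of kpageflags). All helper names are made up for illustration, and this is not necessarily how the check above was done.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PM_PRESENT        (1ULL << 63)        /* page is present in RAM */
#define PM_PFN_MASK       ((1ULL << 55) - 1)  /* bits 0-54 hold the PFN */
#define KPF_RESERVED_BIT  32                  /* bit index in /proc/kpageflags */

/* Extracts the PFN from a pagemap entry; returns 0 if the page is not present. */
uint64_t pagemap_pfn(uint64_t entry) {
    return (entry & PM_PRESENT) ? (entry & PM_PFN_MASK) : 0;
}

/* Returns 1 if the kpageflags word has the RESERVED bit set, else 0. */
int kpageflags_reserved(uint64_t flags) {
    return (int)((flags >> KPF_RESERVED_BIT) & 1);
}

/* Reads the 64-bit record at index idx from a /proc file; 0 on success. */
int read_u64_record(const char *path, uint64_t idx, uint64_t *out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    ssize_t n = pread(fd, out, sizeof(*out), (off_t)(idx * sizeof(*out)));
    close(fd);
    return n == (ssize_t)sizeof(*out) ? 0 : -1;
}

/* Returns 1/0 if PG_reserved is set/clear for the page backing vaddr, -1 on failure. */
int page_is_reserved(const void *vaddr) {
    uint64_t entry = 0, flags = 0;
    uint64_t page = (uint64_t)sysconf(_SC_PAGESIZE);
    if (read_u64_record("/proc/self/pagemap", (uintptr_t)vaddr / page, &entry) != 0)
        return -1;
    uint64_t pfn = pagemap_pfn(entry);
    if (pfn == 0) return -1;  /* not present, or PFN hidden from unprivileged callers */
    if (read_u64_record("/proc/kpageflags", pfn, &flags) != 0) return -1;
    return kpageflags_reserved(flags);
}
```

The page has to be touched before the lookup so that it is present, and the process needs CAP_SYS_ADMIN (typically root); otherwise recent kernels report a zero PFN and `page_is_reserved` returns -1.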
fengggli commented 5 years ago

mlx kernel rdma driver:

  1. https://www.spinics.net/lists/linux-rdma/msg33298.html
  2. https://community.mellanox.com/s/article/howto-implement-peerdirect-client-using-mlnx-ofed
fengggli commented 5 years ago

The problem is posted here: https://www.redhat.com/archives/vfio-users/2019-July/msg00005.html. I will come back when I get more useful info from them. The current status is:

  1. Not sure how to pass part of the DMA memory to another process so that it is also DMA-able there.