SoftRoCE / rxe-dev

Development Repository for RXE

VM freezes when running the ibv_rc_pingpong command #67

Closed zhangguoqing closed 7 years ago

zhangguoqing commented 7 years ago

I followed [1] to install ib_rxe and librxe in a virtual machine running in my OpenStack environment; the VM's operating system is CentOS 7.

[1] https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home

Everything seems OK:

rxe_cfg status
  Name  Link  Driver      Speed  NMTU  IPv4_addr  RDEV  RMTU
  eth0  yes   virtio_net                          rxe0  1024  (3)

ibv_devices
    device                 node GUID
    rxe0                f8163efffeed6199

But if I run the command 'ibv_rc_pingpong -d rxe0 -g 0', the VM hangs: it freezes and I cannot do anything. I traced it with strace as follows:

... .... .... .... ....
open("/lib64/libi40iw-rdmav2.so", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\300\23\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=32672, ...}) = 0
mmap(NULL, 2126280, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f82ea541000
mprotect(0x7f82ea547000, 2097152, PROT_NONE) = 0
mmap(0x7f82ea747000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7f82ea747000
close(3) = 0
mprotect(0x7f82ea747000, 4096, PROT_READ) = 0
munmap(0x7f82ec55d000, 34170) = 0
open("/sys/class/infiniband_verbs/uverbs0/ibdev", O_RDONLY|O_CLOEXEC) = 3
read(3, "rxe0\n", 16) = 5
close(3) = 0
open("/sys/class/infiniband/rxe0/node_type", O_RDONLY|O_CLOEXEC) = 3
read(3, "1: CA\n", 8) = 6
close(3) = 0
futex(0x7f82ec347534, FUTEX_WAKE_PRIVATE, 2147483647) = 0
open("/dev/infiniband/uverbs0", O_RDWR|O_CLOEXEC) = 3
write(3, "\0\0\0\0\4\0\2\0\340a/A\375\177\0\0", 16) = 16
write(3, "\3\0\0\0\4\0\1\0\20b/A\375\177\0\0", 16) = 16
write(3, "\t\0\0\0\f\0\3\0\300a/A\375\177\0\0\0P\362\0\0\0\0\0\0\20\0\0\0\0\0\0"..., 48

The VM freezes here and the SSH session disconnects:

packet_write_wait: Connection to 172.25.11.223 port 22: Broken pipe
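To see which request was in flight when the trace stops, the byte strings passed to write() on /dev/infiniband/uverbs0 can be decoded. The following is a minimal sketch, assuming the legacy uverbs write() ABI in which each request starts with an 8-byte little-endian header (u32 command, u16 in_words, u16 out_words, lengths counted in 4-byte words) and assuming the command numbers from the kernel's ib_user_verbs.h. The helper names `unescape_strace` and `decode_uverbs_header` are made up for this example, and the command table is only a partial subset.

```python
import re
import struct

# Partial command table; numbers assumed from the kernel's ib_user_verbs.h.
UVERBS_CMDS = {
    0: "GET_CONTEXT", 1: "QUERY_DEVICE", 2: "QUERY_PORT", 3: "ALLOC_PD",
    9: "REG_MR", 18: "CREATE_CQ", 24: "CREATE_QP", 25: "QUERY_QP",
    26: "MODIFY_QP",
}

# Single-character escapes that strace prints in string arguments.
_SIMPLE = {"n": 10, "t": 9, "r": 13, "f": 12, "v": 11, "\\": 92, '"': 34}


def unescape_strace(s: str) -> bytes:
    """Turn a strace-escaped string (octal and C-style escapes) into bytes."""
    out = bytearray()
    i = 0
    while i < len(s):
        if s[i] != "\\":
            out.append(ord(s[i]))      # plain printable character
            i += 1
            continue
        i += 1                         # skip the backslash
        m = re.match(r"[0-7]{1,3}", s[i:])
        if m:                          # octal escape such as \0 or \340
            out.append(int(m.group(), 8))
            i += len(m.group())
        else:                          # \t, \n, \f, ...
            out.append(_SIMPLE[s[i]])
            i += 1
    return bytes(out)


def decode_uverbs_header(escaped: str):
    """Return (command name, request bytes, response bytes) from the header."""
    data = unescape_strace(escaped)
    if len(data) < 8:
        raise ValueError("need at least 8 header bytes")
    command, in_words, out_words = struct.unpack_from("<IHH", data)
    name = UVERBS_CMDS.get(command, f"UNKNOWN({command})")
    return name, in_words * 4, out_words * 4


if __name__ == "__main__":
    # The last write() in the trace from the frozen VM (string as strace printed it).
    last = r"\t\0\0\0\f\0\3\0\300a/A\375\177\0\0\0P\362\0\0\0\0\0\0\20\0\0\0\0\0\0"
    print(decode_uverbs_header(last))
```

Under these assumptions the last write decodes to REG_MR (memory registration) with a 48-byte request, which matches the length strace shows. This only identifies the request that never returned; it does not by itself explain the freeze.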

Also, I have another VM that does not show this behaviour. I ran strace on the same command there as well, for your comparison:

... .... .... .... ....
open("/lib64/libi40iw-rdmav2.so", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\300\23\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=32672, ...}) = 0
mmap(NULL, 2126280, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f75d40bc000
mprotect(0x7f75d40c2000, 2097152, PROT_NONE) = 0
mmap(0x7f75d42c2000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7f75d42c2000
close(3) = 0
mprotect(0x7f75d42c2000, 4096, PROT_READ) = 0
munmap(0x7f75d689c000, 93822) = 0
open("/sys/class/infiniband_verbs/uverbs0/ibdev", O_RDONLY|O_CLOEXEC) = 3
read(3, "rxe0\n", 16) = 5
close(3) = 0
open("/sys/class/infiniband/rxe0/node_type", O_RDONLY|O_CLOEXEC) = 3
read(3, "1: CA\n", 8) = 6
close(3) = 0
futex(0x7f75d6694534, FUTEX_WAKE_PRIVATE, 2147483647) = 0
open("/dev/infiniband/uverbs0", O_RDWR|O_CLOEXEC) = 3
write(3, "\0\0\0\0\4\0\2\0\260\345\232\220\375\177\0\0", 16) = 16
write(3, "\3\0\0\0\4\0\1\0\340\345\232\220\375\177\0\0", 16) = 16
write(3, "\t\0\0\0\f\0\3\0\220\345\232\220\375\177\0\0\0\200\302\1\0\0\0\0\0\20\0\0\0\0\0\0"..., 48) = 48
write(3, "\22\0\0\0\n\0\6\0p\345\232\220\375\177\0\0\300|\302\1\0\0\0\0\365\1\0\0\0\0\0\0"..., 40) = 40
mmap(NULL, 36864, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0x27a000) = 0x7f75d68aa000
write(3, "\30\0\0\0\20\0\20\0\240\345\232\220\375\177\0\0p}\302\1\0\0\0\0\0\0\0\0\0\0\0\0"..., 64) = 64
mmap(NULL, 36864, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0x284000) = 0x7f75d68a1000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0x283000) = 0x7f75d68a0000
write(3, "\31\0\0\0\6\0 \0 \345\232\220\375\177\0\0\0\0\0\0\0\0\10\0", 24) = 24
write(3, "\32\0\0\0\36\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 120) = 120
write(3, "\2\0\0\0\6\0\n\0p\345\232\220\375\177\0\0\1\0\0\0\0\0\0\0", 24) = 24

xxxxxxxxxxxxxxxxxxxx Separator, for your attention xxxxxxxxxxxxxxxxxxxx

open("/sys/class/infiniband/rxe0/ports/1/gids/0", O_RDONLY|O_CLOEXEC) = 5
read(5, "fe80:0000:0000:0000:f816:3eff:fe"..., 41) = 40
close(5) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f75d689f000
write(1, " local address: LID 0x0000, QP"..., 88 local address: LID 0x0000, QPN 0x000012, PSN 0x5078f2, GID fe80::f816:3eff:feed:6199
) = 88
open("/etc/gai.conf", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
futex(0x7f75d5bd0460, FUTEX_WAKE_PRIVATE, 2147483647) = 0
socket(PF_NETLINK, SOCK_RAW, 0) = 5
bind(5, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
getsockname(5, {sa_family=AF_NETLINK, pid=4833, groups=00000000}, [12]) = 0
sendto(5, "\24\0\0\0\26\0\1\3(s)Y\0\0\0\0\0\0\0\0", 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
recvmsg(5, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"L\0\0\0\24\0\2\0(s)Y\341\22\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 252
recvmsg(5, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"H\0\0\0\24\0\2\0(s)Y\341\22\0\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 144
recvmsg(5, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\24\0\0\0\3\0\2\0(s)Y\341\22\0\0\0\0\0\0", 4096}], msg_controllen=0, msg_flags=0}, 0) = 20
socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 6
connect(6, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
close(6) = 0
close(5) = 0
socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP) = 5
connect(5, {sa_family=AF_INET6, sin6_port=htons(18515), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(5, {sa_family=AF_INET6, sin6_port=htons(44811), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
connect(5, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(5, {sa_family=AF_INET, sin_port=htons(18515), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
getsockname(5, {sa_family=AF_INET6, sin6_port=htons(45984), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
close(5) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 5
setsockopt(5, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(5, {sa_family=AF_INET, sin_port=htons(18515), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(5, 1) = 0
accept(5,

zhangguoqing commented 7 years ago

It seems that if the VM has 4 GB of memory or more, everything works fine. Thanks. :)
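For anyone comparing VMs against this observation, the memory the guest actually sees can be read from /proc/meminfo. This is a small sketch only; the function name `mem_total_gib` is made up for the example, and the 4 GB figure is the observation reported above, not a documented requirement of rxe.

```python
def mem_total_gib(meminfo_text: str) -> float:
    """Return MemTotal from /proc/meminfo-style text, converted to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])   # the kernel reports this value in kB (KiB)
            return kib / (1024 * 1024)
    raise ValueError("MemTotal line not found")
```

On a Linux guest it could be called as `mem_total_gib(open("/proc/meminfo").read())`. Note that MemTotal is usually somewhat lower than the RAM configured for the VM, because the kernel reserves part of it, so a VM created with 4 GB will typically report a little less than 4.0 here.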