CCrainys closed this issue 5 years ago.
The NIC port can be changed using the last argument to the Rpc constructor (see the docs).
The server failure handling feature isn't ready yet. I've disabled its test for now.
Can you try multi_process_test with a clean initial state (i.e., without server_failure_test having first failed)?
Hi Anuj, thanks for your quick reply. First, I pulled your latest commit (which disables server_failure_test). Then I ran ctest and found that multi_process_test sometimes passes and sometimes fails:
100% tests passed, 0 tests failed out of 16
Total Test time (real) = 44.86 sec
or
Total Test time (real) = 66.34 sec
The following tests FAILED:
8 - multi_process_test (OTHER_FAULT)
Errors while running CTest
When multi_process_test failed, I ran build/multi_process_test on its own. The error output was:
Process 4: All sessions connected
Process 13: All sessions connected
Process 22: All sessions connected
multi_process_test: /root/eRPC/tests/client_tests/multi_process_test.cc:60: void process_proxy_thread_func(size_t, size_t): Assertion `c.num_rpc_resps == num_processes - 1' failed.
Aborted
Hi Anuj,
I read the code for multi_process_test. I think the problem is caused by the interaction between kMaxNumERpcProcesses and kTestMaxEventLoopMs: after increasing kTestMaxEventLoopMs or decreasing kMaxNumERpcProcesses, multi_process_test passes.
My cluster has 28 cores, 16 of which were in use by other tasks. I think it might be a multi-process scheduling problem: when the number of free CPU cores is less than kMaxNumERpcProcesses, the processes have to share cores and may not finish within the fixed kTestMaxEventLoopMs budget.
This is just my guess. Do you agree? Looking forward to your reply.
Best regards,
Thomas
Sounds right. The hardcoded value of allowed test time (kTestMaxEventLoopMs) isn't good, and I will move to more flexible timing in the future.
OK, got it. Thanks!
Best regards,
Thomas
From: anujkaliaiitd notifications@github.com Sent: Tuesday, February 26, 2019 1:41:58 AM To: erpc-io/eRPC Cc: Thomas CC; Author Subject: Re: [erpc-io/eRPC] test failed in server_failure_test&multi_process_test (#21)
Hi Anuj,
My cluster has 4 Mellanox ConnectX-4 NICs: ib0 and ib1 are InfiniBand NICs, and p6p1 and p6p2 are Ethernet NICs.
OFED version:
Operating system:
I have two questions:
2. I compiled with the command "cmake . -DPERF=OFF -DTRANSPORT=raw" and then ran ctest. However, server_failure_test and multi_process_test failed with the following output:
I then ran build/server_failure_test and build/multi_process_test individually. The error output was:
server_failure_test:
multi_process_test:
Looking forward to your reply, and thanks in advance.
Best regards,
Thomas