Open Quuxplusone opened 4 years ago
I ran this unit test through valgrind and observed that, when TMPDIR is set to one of the file systems that cause the program to hang as I described originally, valgrind's helgrind threading tool reports the following:
==12182== Thread #1: pthread_cond_destroy: destruction of condition variable being waited upon
==12182== at 0x4C34882: ??? (in /usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
==12182== by 0x40B665: (anonymous namespace)::VerifyingConsumer::~VerifyingConsumer() (llvm-project/clang/unittests/DirectoryWatcher/DirectoryWatcherTest.cpp:100)
==12182== by 0x40D629: DirectoryWatcherTest_DeleteWatchedDir_Test::TestBody() (llvm-project/clang/unittests/DirectoryWatcher/DirectoryWatcherTest.cpp:430)
==12182== by 0x54A633: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2402)
==12182== by 0x537A81: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2455)
==12182== by 0x525A25: testing::Test::Run() (llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2474)
==12182== by 0x5262BA: testing::TestInfo::Run() (llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2656)
==12182== by 0x526853: testing::TestCase::Run() (llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2774)
==12182== by 0x52BE24: testing::internal::UnitTestImpl::RunAllTests() (llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:4649)
==12182== by 0x54DC33: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2402)
==12182== by 0x5396E1: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2455)
==12182== by 0x52BB29: testing::UnitTest::Run() (llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:4257)
==12182==
It does not report that error when TMPDIR is set to a file system which allows the program to complete without hanging.
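For readers unfamiliar with the helgrind message above: it says a condition variable was destroyed while some thread was still blocked waiting on it. The sketch below is not the real VerifyingConsumer; `EventWaiter` and `runScenario` are hypothetical names used only for illustration. It shows one common way to avoid that situation, assuming the waiter can tolerate a timeout: wait with `wait_for` instead of waiting indefinitely, and join the waiting thread before the object that owns the condition variable goes out of scope.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a test consumer; NOT the real VerifyingConsumer.
class EventWaiter {
public:
  // Called by the producer side when the expected event has arrived.
  void notifyEvent() {
    {
      std::lock_guard<std::mutex> Lock(Mtx);
      Received = true;
    }
    Cond.notify_all();
  }

  // Blocks until the event arrives or Timeout elapses.
  // Returns true if the event was received, false on timeout.
  bool waitForEvent(std::chrono::milliseconds Timeout) {
    std::unique_lock<std::mutex> Lock(Mtx);
    return Cond.wait_for(Lock, Timeout, [this] { return Received; });
  }

private:
  std::mutex Mtx;
  std::condition_variable Cond;
  bool Received = false;
};

// Runs one waiter thread. The event is delivered only if Deliver is true,
// which mimics a file system that does or does not produce the notification.
bool runScenario(bool Deliver) {
  EventWaiter Waiter;
  bool Result = false;
  std::thread T([&] {
    Result = Waiter.waitForEvent(std::chrono::milliseconds(200));
  });
  if (Deliver)
    Waiter.notifyEvent();
  // Join before Waiter is destroyed, so no thread is still waiting on Cond
  // when the condition variable's destructor runs.
  T.join();
  return Result;
}
```

With this shape, a missing event shows up as `runScenario(false)` returning false after the timeout rather than as a hang or as a condition variable destroyed under a waiter.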
Greetings,
I am very close to having a working x86 flang buildbot - everything about the bot's workflow succeeds except for a single lit test, 'DirectoryWatcherTest'. This test reads and writes some temporary files/directories in TMPDIR.
On the system where this buildbot is running (a Cray XC40), I have observed the following behaviors (all results tested on commit 724bf4ee23a of llvm-project):
On "login nodes", the test runs without any issues. TMPDIR is set to /tmp, which is a ramdisk:
However, the same test on an XC "compute node" hangs:
The program hangs on that last line and never returns. lldb shows the following:
I have been told that the /tmp ramdisk on "compute nodes" is configured slightly differently from the /tmp ramdisk on "login nodes," which may explain the difference in behavior. But I don't know what the difference in configuration actually is.
I also tried setting TMPDIR to different file systems, including Lustre and GPFS. I found that when TMPDIR is a GPFS file system, the test succeeds on any kind of node (login and compute), but when TMPDIR is a Lustre file system, it hangs on both login nodes and compute nodes, in the same way as when TMPDIR is set to the /tmp ramdisk on compute nodes.
So to summarize:
/tmp ramdisk on XC login nodes: PASS
Lustre on XC login nodes: (hangs)
GPFS on XC login nodes: PASS
/tmp ramdisk on XC compute nodes: (hangs)
Lustre on XC compute nodes: (hangs)
GPFS on XC compute nodes: PASS
Any ideas how to make this test less sensitive to the kind of file system it's running on?
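One way to narrow this down might be to check, outside of the test, whether a given file system delivers the notification at all. On Linux, clang's DirectoryWatcher is built on inotify, so the sketch below creates a scratch directory, watches it, removes it, and waits a bounded time for `IN_DELETE_SELF`. The names `probeDeleteSelf` and `ProbeResult` are hypothetical and not part of LLVM, and this is a diagnostic sketch under the assumption that the missing event is what the failing test is waiting for, not a fix.

```cpp
#include <poll.h>
#include <stdlib.h>
#include <string>
#include <sys/inotify.h>
#include <unistd.h>
#include <vector>

enum class ProbeResult { Delivered, TimedOut, SetupFailed };

// Hypothetical helper (not part of LLVM): reports whether removing a watched
// directory under BaseDir produces an IN_DELETE_SELF inotify event.
ProbeResult probeDeleteSelf(const std::string &BaseDir, int TimeoutMs) {
  // mkdtemp needs a writable, NUL-terminated template ending in XXXXXX.
  std::string Template = BaseDir + "/inotify-probe-XXXXXX";
  std::vector<char> Path(Template.begin(), Template.end());
  Path.push_back('\0');
  if (!mkdtemp(Path.data()))
    return ProbeResult::SetupFailed;

  int Fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
  if (Fd < 0) {
    rmdir(Path.data());
    return ProbeResult::SetupFailed;
  }
  if (inotify_add_watch(Fd, Path.data(), IN_DELETE_SELF) < 0) {
    close(Fd);
    rmdir(Path.data());
    return ProbeResult::SetupFailed;
  }
  // Remove the watched directory; this is the action being probed.
  if (rmdir(Path.data()) != 0) {
    close(Fd);
    return ProbeResult::SetupFailed;
  }

  ProbeResult Result = ProbeResult::TimedOut;
  pollfd Pfd;
  Pfd.fd = Fd;
  Pfd.events = POLLIN;
  Pfd.revents = 0;
  // Each poll waits at most TimeoutMs, so the probe cannot block forever.
  // It may loop more than once if an unrelated event (IN_IGNORED) comes first.
  while (poll(&Pfd, 1, TimeoutMs) > 0) {
    alignas(inotify_event) char Buf[4096];
    ssize_t Len = read(Fd, Buf, sizeof(Buf));
    if (Len <= 0)
      break;
    for (char *P = Buf; P < Buf + Len;) {
      auto *Ev = reinterpret_cast<inotify_event *>(P);
      if (Ev->mask & IN_DELETE_SELF)
        Result = ProbeResult::Delivered;
      P += sizeof(inotify_event) + Ev->len;
    }
    if (Result == ProbeResult::Delivered)
      break;
  }
  close(Fd);
  return Result;
}
```

Calling `probeDeleteSelf(getenv("TMPDIR"), 1000)` on each of the six configurations above would show whether the PASS/hang split lines up with `Delivered` versus `TimedOut`.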
Thanks.