Closed seanm closed 2 weeks ago
I had asked about RUNS_LONG
during the hackathon yesterday, and understood that that doesn't actually make a test run alone by itself.
The difficulty is that some tests themselves are highly multithreaded and if run with other tests via set(CTEST_TEST_ARGS PARALLEL_LEVEL 10)
then everything gets slow, and timeouts are hit.
I'm open to other ways to get these passing https://open.cdash.org/viewTest.php?onlyfailed&buildid=10165388 instead of timing out...
The difficulty is that some tests themselves are highly multithreaded and if run with other tests via
set(CTEST_TEST_ARGS PARALLEL_LEVEL 10)
then everything gets slow, and timeouts are hit.
Agreed. That is why I set ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS to 2 for the valgrind build.
I deal with this the same issues on clusters concurrently batch processing images. It is more efficient CPU utilization to process more tasks (tests) with fewer threads, than a few tasks(tests) with many threads. They are different approaches which require different approaches. Bad things happen when you launch 48 tasks which each try to get 48 threads.
As I said these test are already marked long, and this tunning will not effect the already tuned valgrind build.
FYI: My informal testing of ITK algorithms with MRI brain imaging (i.e. 40mb images of brains for registration and filtering operations). This indicated that multithreading for that data size improved significantly up to 4 threads, improved some up to 6 threads, and then only marginally improved to about 10 threads when performance often started to decline.
The correct number of threads is heavily data and algorithm dependant. The "rules of thumb" above have served me well for getting most of the value out of multithreading without doing extensive or definitive optimizations.
FYI: I have used test fixtures to address these types of issues in other projects:
https://crascit.com/2016/10/18/test-fixtures-with-cmake-ctest/
Agreed. That is why I set ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS to 2 for the valgrind build.
Oh that's a good idea. I can try that. Is ITK_DEFAULT_MAX_THREADS
basically the same? One is a cmake setting, the other is an env var?
ITK_DEFAULT_MAX_THREADS
Oh that's a good idea. I can try that. Is
ITK_DEFAULT_MAX_THREADS
basically the same? One is a cmake setting, the other is an env var?
For the purpose of testing I think the effect will be the same, but with the cmake variable set and built in, the binaries will never be able to use more threads than that. ( Historically this was the size of an array which kept track of the threads as I recall)
Yes. EnvVar has lowest priority.
OK I set ITK_DEFAULT_MAX_THREADS=2
on Rogue19. I've left CTEST_TEST_ARGS PARALLEL_LEVEL 8
. It is an 8 core Apple M1. We'll see cdash tomorrow...
Unexpectedly, Rogue19 has more test failures today: https://open.cdash.org/viewTest.php?onlyfailed&buildid=10167705 Example:
593: Test command: /Users/builder/external/ITK-AppleClang-dbg-Universal/bin/ITKCommon2TestDriver "--remove-env" "ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS" "itkMultiThreadingEnvironmentTest" "88"
593: Working Directory: /Users/builder/external/ITK-AppleClang-dbg-Universal/Modules/Core/Common/test
593: Environment variables:
593: NSLOTS=88
593: Test timeout computed to be: 900
593: ERROR: Wrong number of maximum number of threads set from environment. 88 != 2
1/1 Test #593: itkMultiThreadingEnvTest88 .......***Failed 0.08 sec
The tests look wrong to me. They seem to assume that ITK_DEFAULT_MAX_THREADS
can't be small.
WIP patch: https://github.com/InsightSoftwareConsortium/ITK/pull/5191
For reference here is the run for the nightly Valgrind, that is publicly managed in the SimpleITK account: https://github.com/SimpleITK/SimpleITK/actions/runs/12944566489/job/36105670713 https://github.com/SimpleITK/SimpleITK/blob/master/.github/workflows/Nightly.yml#L111-L145
I believe these tests are already excluded by the above script as they are marked as "RUNS_LONG". For the valgrind I explicitly set the ITK number of threads to only 2, and use a many core system to try to run valgrind as efficiently as possible. The changes do look fine, but in general I have a concern about to many serial/resource lock tests decreasing the efficiency of a many cored system.