eth-cscs / DLA-Future

DLA-Future
https://eth-cscs.github.io/DLA-Future/master/
BSD 3-Clause "New" or "Revised" License
64 stars 14 forks source link

Fix wrong bounds of and concurrent access to counter on pipeline test #1072

Closed msimberg closed 9 months ago

msimberg commented 9 months ago

In the particular test three read-only accesses to a pipeline are started concurrently using start_detached. Each access increments a counter, and the order in which they are incremented depends on when tasks are scheduled (since read-only access may happen concurrently). Thus the expected value of the counter in each of the three tasks is between 27 and 29. One of the tasks incorrectly had the bounds as 27 to 27 (i.e. exactly 27). Since this was the first task this would frequently be true, but is not guaranteed to be true.

Additionally, the modification of the counter in read-only sections was unprotected (and the counter was not atomic). This adds a lock to protect access to the counter.

msimberg commented 9 months ago

cscs-ci run

msimberg commented 9 months ago

cscs-ci run

rasolca commented 9 months ago

Note as you are already fixing test_pipeline.cpp:

test/unit/common/test_pipeline.cpp:764:24: warning: implicit conversion loses integer precision: 'std::size_t' (aka 'unsigned long') to 'std::mersenne_twister_engine<unsigned int, 32, 624, 397, 31, 2567483615, 11, 4294967295, 7, 2636928640, 15, 4022730752, 18, 1812433253>::result_type' (aka 'unsigned int') [-Wshorten-64-to-32]
std::mt19937 gen(seed);

(with clang14)

msimberg commented 9 months ago

cscs-ci run

msimberg commented 9 months ago

Note as you are already fixing test_pipeline.cpp:

test/unit/common/test_pipeline.cpp:764:24: warning: implicit conversion loses integer precision: 'std::size_t' (aka 'unsigned long') to 'std::mersenne_twister_engine<unsigned int, 32, 624, 397, 31, 2567483615, 11, 4294967295, 7, 2636928640, 15, 4022730752, 18, 1812433253>::result_type' (aka 'unsigned int') [-Wshorten-64-to-32]
std::mt19937 gen(seed);

(with clang14)

Do you see that in CI? If yes, are we again missing warnings-as-errors?

msimberg commented 9 months ago

The test seems to fail a bit more reliably with the sleeps introduced in https://github.com/eth-cscs/DLA-Future/commit/daa86807705d8f725d1c256a09b90afe1607ef7f: https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/4700071344751697/7514005670787789/-/pipelines/1132292952.

@rasolca I've also changed the seeding of the generator to something that I think will silence the warning.

msimberg commented 9 months ago

cscs-ci run

msimberg commented 9 months ago

There was an unrelated failure again waiting for a job to finish: https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/4700071344751697/7514005670787789/-/jobs/5894331404. I've restarted the job.

msimberg commented 9 months ago

cscs-ci run

msimberg commented 9 months ago

@rasolca I've changed some, but not all, thens to transforms. In some of the tests I think it's still useful to rely on the inline execution of the continuations, and like this we hopefully get a happy mix of both inline and concurrent execution with tasks.

rasolca commented 9 months ago

Note as you are already fixing test_pipeline.cpp:

test/unit/common/test_pipeline.cpp:764:24: warning: implicit conversion loses integer precision: 'std::size_t' (aka 'unsigned long') to 'std::mersenne_twister_engine<unsigned int, 32, 624, 397, 31, 2567483615, 11, 4294967295, 7, 2636928640, 15, 4022730752, 18, 1812433253>::result_type' (aka 'unsigned int') [-Wshorten-64-to-32]
std::mt19937 gen(seed);

(with clang14)

Do you see that in CI? If yes, are we again missing warnings-as-errors?

No, I was getting it on my laptop.