ComputationalRadiationPhysics / redGrapes

Resource-based, Declarative task-Graphs for Parallel, Event-driven Scheduling :grapes:
https://redgrapes.rtfd.io
Mozilla Public License 2.0
20 stars 5 forks source link

redGrapes is currently broken :-( #49

Closed psychocoderHPC closed 9 months ago

psychocoderHPC commented 9 months ago

The dev branch and upcoming fixes #48 fail the random graph test with slide changes.

Changin: https://github.com/ComputationalRadiationPhysics/redGrapes/blob/5f71af63d5cbf2bc51ae795784ad42e133aa9f25/test/random_graph.cpp#L39-L45 to

std::chrono::microseconds task_duration(5);
unsigned n_resources = 16;
unsigned n_tasks = 128;
unsigned n_threads = 4;
unsigned min_dependencies = 1;
unsigned max_dependencies = 10;
std::mt19937 gen;

will result into:

1: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1: redGrapes_test is a Catch2 v3.3.2 host application.
1: Run with -? for options
1: 
1: --------------------------------[thread 15476] [info] steal task for worker 3
1: -----------------------------------------------
1: RandomGraph
1: -----------------------------------[thread 15476] [info] steal task for worker 3
1: --------------------------------------------
1: /home/widera/workspace/redGrapes/test/random_graph.cpp:95
1: ...............................................................................
1: 
1: /home/widera/workspace/redGrapes/test/random_graph.cpp:186: FAILED:
1:   REQUIRE( *resources[i] == expected_hash[i] )
1: [thread 15474] [info] steal task for worker 1
1: with expansion:
1:   { 563 (0x233), 0, 0, 0, 0, 0, 0, 0 }
1:   ==
1:   { 2632 (0xa48), 0, 0, 0, 0, 0, 0, 0 }
1: 
1: ===============================================================================
1: test cases: 1 | 1 failed
1: assertions: 1 | 1 failed
1: 
1: [thread 15472] [warning] BumpAllocChunk: 1 allocations remaining not deallocated.
1: [thread 15472] [warning] BumpAllocChunk: 1 allocations remaining not deallocated.
1: [thread 15472] [warning] BumpAllocChunk: 1 allocations remaining not deallocated.
1: [thread 15472] [warning] BumpAllocChunk: 1 allocations remaining not deallocated.
1/1 Test #1: unittest .........................***Failed    0.02 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) =   0.02 sec
psychocoderHPC commented 9 months ago

Another more simple config which is broken and uses only one thread

std::chrono::microseconds task_duration(2);
unsigned n_resources = 2;
unsigned n_tasks = 128;
unsigned n_threads = 1;
unsigned min_dependencies = 0;
unsigned max_dependencies = 10;

Increasing max_dependencies is starting to trigger the error. The problem should be debug-able because 2 tasks already trigger the issue.

psychocoderHPC commented 9 months ago

Could be that random graph test only supports up to 5 dependencies and that this is the reason for the error.

https://github.com/ComputationalRadiationPhysics/redGrapes/blob/5f71af63d5cbf2bc51ae795784ad42e133aa9f25/test/random_graph.cpp#L109-L181

psychocoderHPC commented 9 months ago

The test does not support more than 5 resources.

michaelsippel commented 9 months ago

The test does not support more than 5 resources.

Yes thats why the verification fails. generate_access_pattern() will create a task graph with up to 10 dependencies but all tasks with >5 dependencies are simply ignored by the execution of this access pattern.

At least we should add asserts, but otherwise I don't see any other way to allow more dependencies , except we would need some macro magic to generate all cases up to a compile-time-defined number of max-dependencies..

psychocoderHPC commented 9 months ago

At least we should add asserts, but otherwise I don't see any other way to allow more dependencies , except we would need some macro magic to generate all cases up to a compile-time-defined number of max-dependencies..

Macros are not required as soon as we switch to C++17 we can use generator expressions for this.