DARMA-tasking / vt

DARMA/vt => Virtual Transport

#410: Dependent Epochs rewritten #2204

Open lifflander opened 10 months ago

lifflander commented 10 months ago

Fixes #410

github-actions[bot] commented 10 months ago

Pipelines results

PR tests (gcc-8, ubuntu, mpich, address sanitizer)

Build for 48f7a8c85c44240934ffc23842b64a808497ead1 (2023-10-31 19:56:21 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-9, ubuntu, mpich)

Build for 48f7a8c85c44240934ffc23842b64a808497ead1 (2023-10-31 19:56:21 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-11, ubuntu, mpich)

Build for 48f7a8c85c44240934ffc23842b64a808497ead1 (2023-10-31 19:56:21 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (gcc-11, ubuntu, mpich, trace runtime, coverage)

Build for 48f7a8c85c44240934ffc23842b64a808497ead1 (2023-10-31 19:56:21 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (gcc-10, ubuntu, openmpi, no LB)

Build for 48f7a8c85c44240934ffc23842b64a808497ead1 (2023-10-31 19:56:21 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-13, ubuntu, mpich)

Build for 48f7a8c85c44240934ffc23842b64a808497ead1 (2023-10-31 19:56:21 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (gcc-9, ubuntu, mpich, zoltan, json schema test)

Build for 48f7a8c85c44240934ffc23842b64a808497ead1 (2023-10-31 19:56:21 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-12, ubuntu, mpich)

Build for 48f7a8c85c44240934ffc23842b64a808497ead1 (2023-10-31 19:56:21 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-14, ubuntu, mpich)

Build for 48f7a8c85c44240934ffc23842b64a808497ead1 (2023-10-31 19:56:21 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-10, ubuntu, mpich)

Build for 48f7a8c85c44240934ffc23842b64a808497ead1 (2023-10-31 19:56:21 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (gcc-12, ubuntu, mpich)

Build for 48f7a8c85c44240934ffc23842b64a808497ead1 (2023-10-31 19:56:21 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (intel icpc, ubuntu, mpich)

Build for 48f7a8c85c44240934ffc23842b64a808497ead1 (2023-10-31 19:56:21 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (nvidia cuda 11.2, gcc-9, ubuntu, mpich)

Build for 48f7a8c85c44240934ffc23842b64a808497ead1 (2023-10-31 19:56:21 UTC)

/vt/src/vt/pipe/pipe_manager.impl.h(133): warning: missing return statement at end of non-void function "vt::pipe::PipeManager::makeSend<f,Target>(Target) [with f=&vt::vrt::collection::lb::GreedyLB::collectHandler, Target=vt::objgroup::proxy::ProxyElm<vt::vrt::collection::lb::GreedyLB>]"
          detected during:
            instantiation of "auto vt::pipe::PipeManager::makeSend<f,Target>(Target) [with f=&vt::vrt::collection::lb::GreedyLB::collectHandler, Target=vt::objgroup::proxy::ProxyElm<vt::vrt::collection::lb::GreedyLB>]" 
/vt/src/vt/objgroup/proxy/proxy_objgroup.impl.h(154): here
            instantiation of "vt::objgroup::proxy::Proxy<ObjT>::PendingSendType vt::objgroup::proxy::Proxy<ObjT>::reduce<f,Op,Target,Args...>(Target, Args &&...) const [with ObjT=vt::vrt::collection::lb::GreedyLB, f=&vt::vrt::collection::lb::GreedyLB::collectHandler, Op=vt::collective::PlusOp, Target=vt::objgroup::proxy::ProxyElm<vt::vrt::collection::lb::GreedyLB>, Args=<vt::vrt::collection::lb::GreedyPayload>]" 
/vt/src/vt/vrt/collection/balance/greedylb/greedylb.cc(222): here

/vt/src/vt/pipe/pipe_manager.impl.h(133): warning: missing return statement at end of non-void function "vt::pipe::PipeManager::makeSend<f,Target>(Target) [with f=&MyObj::handler, Target=vt::objgroup::proxy::ProxyElm<MyObj>]"
          detected during instantiation of "auto vt::pipe::PipeManager::makeSend<f,Target>(Target) [with f=&MyObj::handler, Target=vt::objgroup::proxy::ProxyElm<MyObj>]" 
/vt/examples/callback/callback.cc(147): here

/vt/src/vt/pipe/pipe_manager.impl.h(133): warning: missing return statement at end of non-void function "vt::pipe::PipeManager::makeSend<f,Target>(Target) [with f=&colHan, Target=vt::vrt::collection::VrtElmProxy<MyCol, vt::Index1D>]"
          detected during instantiation of "auto vt::pipe::PipeManager::makeSend<f,Target>(Target) [with f=&colHan, Target=vt::vrt::collection::VrtElmProxy<MyCol, vt::Index1D>]" 
/vt/examples/callback/callback.cc(153): here

/vt/src/vt/pipe/pipe_manager.impl.h(133): warning: missing return statement at end of non-void function "vt::pipe::PipeManager::makeSend<f,Target>(Target) [with f=&MyObj::handler, Target=vt::objgroup::proxy::ProxyElm<MyObj>]"
          detected during instantiation of "auto vt::pipe::PipeManager::makeSend<f,Target>(Target) [with f=&MyObj::handler, Target=vt::objgroup::proxy::ProxyElm<MyObj>]" 
/vt/examples/callback/callback.cc(147): here

/vt/src/vt/pipe/pipe_manager.impl.h(133): warning: missing return statement at end of non-void function "vt::pipe::PipeManager::makeSend<f,Target>(Target) [with f=&colHan, Target=vt::vrt::collection::VrtElmProxy<MyCol, vt::Index1D>]"
          detected during instantiation of "auto vt::pipe::PipeManager::makeSend<f,Target>(Target) [with f=&colHan, Target=vt::vrt::collection::VrtElmProxy<MyCol, vt::Index1D>]" 
/vt/examples/callback/callback.cc(153

==> And there is more. Read log. <==

Build log


PhilMiller commented 10 months ago

So, the one big question I have in reviewing this is: why the move to releasing epochs per-collection/objgroup/object?

Everything else I've marked in comments is pretty minor, and can/should be addressed after writing up an answer for the big question.

Also, squashing fixup commits into their progenitors would be good. Or, even rewriting the history entirely to provide a clearer to understand narrative sequence of commits.

Matthew-Whitlock commented 10 months ago

It looks like creating an epoch within a dependent epoch circumvents waiting for release on all of the messages within the nested epoch. As is, I don't think we could make any assertions about the behavior of general code like this:

auto epoch = vt::theTerm()->makeEpochCollective(term::ParentEpochCapture{}, true);
vt::theMsg()->pushEpoch(epoch);

some::library::function();

vt::theMsg()->popEpoch(epoch);
vt::theTerm()->finishedEpoch(epoch);

lifflander commented 10 months ago

> So, the one big question I have in reviewing this is: why the move to releasing epochs per-collection/objgroup/object?

The short answer to this question is that it will allow us to express dependencies per-object. An object can release work when it's ready, which is a per-object state.
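To make the per-object release idea concrete, here is a minimal sketch in plain C++. It is a toy model under stated assumptions, not vt's API: `ToyScheduler`, `send`, `release`, `numHeld`, and `runDemo` are hypothetical names invented for illustration, and epochs and objects are reduced to plain integers. The only point it shows is that work tagged with a dependent epoch can be held separately for each object and released when that one object is ready.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT vt types.
using Epoch = std::uint64_t;
using ObjectId = int;
using Task = std::function<void()>;

// Toy scheduler: work sent under a dependent epoch is buffered per
// (object, epoch) pair until that particular object releases the epoch.
class ToyScheduler {
public:
  // Deliver work to `obj` under `epoch`. It runs immediately only if
  // `obj` has already released `epoch`; otherwise it is held.
  void send(ObjectId obj, Epoch epoch, Task task) {
    auto const key = std::make_pair(obj, epoch);
    if (released_.count(key) != 0) {
      task();
    } else {
      held_[key].push_back(std::move(task));
    }
  }

  // Release `epoch` for a single object. Work held for other objects
  // under the same epoch stays blocked.
  void release(ObjectId obj, Epoch epoch) {
    auto const key = std::make_pair(obj, epoch);
    released_.insert(key);
    auto it = held_.find(key);
    if (it == held_.end()) {
      return;
    }
    // Move the tasks out first so a task may safely call send() again.
    std::vector<Task> tasks = std::move(it->second);
    held_.erase(it);
    for (auto& t : tasks) {
      t();
    }
  }

  // Number of tasks currently held for `obj` under `epoch`.
  std::size_t numHeld(ObjectId obj, Epoch epoch) const {
    auto it = held_.find(std::make_pair(obj, epoch));
    return it == held_.end() ? 0 : it->second.size();
  }

private:
  std::map<std::pair<ObjectId, Epoch>, std::vector<Task>> held_;
  std::set<std::pair<ObjectId, Epoch>> released_;
};

// Usage sketch: two objects receive work under the same dependent epoch,
// but only object 0 is ready and releases it.
std::vector<int> runDemo() {
  ToyScheduler sched;
  std::vector<int> log;
  Epoch const ep = 1;

  sched.send(0, ep, [&log] { log.push_back(0); });
  sched.send(1, ep, [&log] { log.push_back(1); });
  assert(log.empty());   // nothing runs before a release

  sched.release(0, ep);  // object 0 becomes ready
  return log;            // object 1's work is still held
}
```

In this sketch the release state is keyed by the (object, epoch) pair, so readiness is a per-object property rather than a property of the epoch as a whole. A real runtime would also have to handle termination detection and nested epochs, which this toy deliberately ignores.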

> Everything else I've marked in comments is pretty minor, and can/should be addressed after writing up an answer for the big question.

> Also, squashing fixup commits into their progenitors would be good. Or, even rewriting the history entirely to provide a clearer to understand narrative sequence of commits.

Yes, I am planning on re-writing the history.

lifflander commented 10 months ago

@PhilMiller @Matthew-Whitlock Please take a look at the follow-on PR #2206. The new abstraction built on top of dependent epochs, called taskCollective, is nearly working. Here is the example Jacobi program, which might clarify why the dependencies are per-object.

https://github.com/DARMA-tasking/vt/blob/99a63d99a7d156cafae33d157e56c2aea3145986/examples/collection/jacobi1d_vt.cc#L233-L284

PhilMiller commented 10 months ago

Ok, I'll take a look. That maybe answers my follow-up question of what it means to release an epoch for some object/collection, but not for others.

lifflander commented 10 months ago

> Ok, I'll take a look. That maybe answers my follow-up question of what it means to release an epoch for some object/collection, but not for others.

@Matthew-Whitlock I have the 2-D Jacobi correctly running now with asynchronous iterations. It runs faster (even with the dependency resolution) than the synchronized version.

https://github.com/DARMA-tasking/vt/blob/eb972fb579c21ce5f3b463c6e2b68ac014337288/examples/collection/jacobi2d_vt.cc#L359-L412