DARMA-tasking / vt

DARMA/vt => Virtual Transport

Meeting Agenda [do not close] #925

Open lifflander opened 3 years ago

lifflander commented 3 years ago

This issue shall be used to log (weekly) meeting agendas and the corresponding resolution of the topics addressed (minutes). This will help us maintain a record of topics discussed and the resolution of these issues. Each comment shall contain a meeting agenda for a given week, edited later with the resolution for each point of order.

Other impromptu meetings that are relevant to the whole group can also be logged here for posterity and members of the team (or people following the project) who couldn't join.

Rendered template for each meeting:

Descriptor Information
Date [xx/xx/xxxx]
Attendees [list-of-attendees]
Description [longer-description]

Agenda:

Template:

| Descriptor | Information |
| --: | --------- |
| Date | [xx/xx/xxxx] |
| Attendees | [list-of-attendees] |
| Description | [longer-description] |

### Agenda:
- Item 1
- Item 2
lifflander commented 3 years ago
Descriptor Information
Date 07/14/2020 @ 11am
Attendees @lifflander, @ppebay, @pnstickne, @cz4rs, @nlslatt, @PhilMiller
Description Weekly Meeting

Agenda:

Minutes

lifflander commented 3 years ago
Descriptor Information
Date 07/21/2020 @ 11am
Attendees @lifflander, @ppebay, @pnstickne, @cz4rs, @nlslatt, @PhilMiller, @nmm0
Description Weekly Meeting

Agenda:

Minutes

lifflander commented 3 years ago
Descriptor Information
Date 07/28/2020 @ 11am
Attendees @lifflander, @ppebay, @pnstickne, @cz4rs, @nlslatt, @PhilMiller, @JacobDomagala, @nmm0
Description Weekly Meeting

Agenda:

Minutes

lifflander commented 3 years ago
Descriptor Information
Date 08/04/2020 @ 11am
Attendees @lifflander,
Description Weekly Meeting

Agenda:

Minutes

lifflander commented 3 years ago
Descriptor Information
Date 08/18/2020 @ 11am
Attendees @lifflander, @nlslatt, @nmm0, @PhilMiller, @cz4rs, @JacobDomagala, @jstrzebonski, @pnstickne
Description Weekly Meeting

Agenda:

Minutes

- Phil: reviewing PRs, bugs found from addAction changes, another fix needed for 1.0.0-beta.10
- Nic: working on PR, addressing Phil's comments
- Cezary: finished #984, working on #989
- Jakub D: finishing #955, starting #959
- Jakub S: just did dev checklist, starting on #880
- Paul: working on #969

lifflander commented 3 years ago
Descriptor Information
Date 08/25/2020 @ 11am
Attendees @lifflander, @nlslatt, @nmm0, @PhilMiller, @cz4rs, @JacobDomagala, @pnstickne, @ppebay
Description Weekly Meeting

Agenda:

Minutes

- Phil: working on paper, #995
- Nic: finishing up #714, going to work on cmake next
- Cezary: just merged #996, working on #1000
- Jakub D: working on testing load models #1001
- Paul: both #998 and #969 are ready to merge

lifflander commented 3 years ago
Descriptor Information
Date 09/01/2020 @ 11am
Attendees @lifflander, @nlslatt, @nmm0, @PhilMiller, @cz4rs, @JacobDomagala, @pnstickne, @ppebay
Description Weekly Meeting

Agenda:

Minutes

- Nic: going to work on paper, NimbleSM section
- Phil: memory issues with load models
- Cezary: working on #1000 and #1013
- Jakub D: finishing load modeling PR
- Paul: MsgThief is complete, work next on broadcast to sender

lifflander commented 3 years ago
Descriptor Information
Date 09/08/2020 @ 11am
Attendees @lifflander, @nlslatt, @nmm0, @PhilMiller, @cz4rs, @JacobDomagala, @pnstickne, @ppebay, @bradybray
Description Weekly Meeting

Agenda:

Minutes

- Phil: working on paper all week
- Nic: working on paper all week
- Nicole: nothing on VT
- Cezary: working on #1000 and #1013
- Jakub D: adding signal handler #1027, finished #1012 ready for review
- Jakub S: working on #1014, almost done, some questions about it
- Paul: added lock to the envelope #1016
- Braden: changed nomenclature on #1026

lifflander commented 3 years ago
Descriptor Information
Date 09/15/2020 @ 11am
Attendees @lifflander, @nlslatt, @nmm0, @PhilMiller, @cz4rs, @JacobDomagala, @pnstickne, @ppebay
Description Weekly Meeting

Agenda:

Minutes

- Phil: working on improving LB infrastructure
- Nic: working on paper with comparison for NimbleSM, LB next
- Nicole: nothing on VT
- Cezary: working on checkpoint #123 and #1013
- Jakub D: adding signal handler #1027, finished #1012 ready for review
- Jakub S: finishing #1014, nearly finished with #1031
- Paul: added lock to the envelope #1016
- Braden: changed nomenclature on #1026

lifflander commented 3 years ago
Descriptor Information
Date 09/22/2020 @ 11am
Attendees @lifflander, @nlslatt, @nmm0, @PhilMiller, @cz4rs, @JacobDomagala, @pnstickne, @ppebay, @jstrzebonski, @bradybray
Description Weekly Meeting

Agenda:

Minutes

- Phil: planned work on GEMMA, finishing up LB #1055
- Nic: newer implementation is slower, cmake Git is causing re-runs #1005
- Nicole: VT calls MPI_Init multiple times, opening issue for this
- Cezary: new issue in checkpoint https://github.com/DARMA-tasking/checkpoint/issues/127, in support #1009
- Jakub D: finishing up #1024
- Jakub S: small change to PR check: https://github.com/DARMA-tasking/check-pr-fixes-issue/pull/3
- Paul: working on local invocation #1051
- Braden: finished training! starting #941

lifflander commented 3 years ago
Descriptor Information
Date 09/29/2020 @ 11am
Attendees @lifflander, @nlslatt, @nmm0, @PhilMiller, @cz4rs, @JacobDomagala, @pnstickne, @ppebay, @jstrzebonski, @bradybray
Description Weekly Meeting

Agenda:

Minutes

Phil: Nic: Nicole: Cezary: Jakub D: Jakub S: Paul: Braden:

lifflander commented 3 years ago
Descriptor Information
Date 10/13/2020 @ 11am
Attendees @lifflander, @nlslatt, @nmm0, @PhilMiller, @cz4rs, @JacobDomagala, @jstrzebonski, @bradybray
Description Weekly Meeting

Agenda:

Minutes

- @PhilMiller: working on checkpoint https://github.com/DARMA-tasking/checkpoint/issues/138, finished https://github.com/DARMA-tasking/checkpoint/issues/132
- @nmm0: working on cmake; on vacation
- @nlslatt: invalid count in EMPIRE
- @cz4rs: footprinting in checkpoint is done; working on component serialize https://github.com/DARMA-tasking/vt/pull/1013
- @JacobDomagala: mostly fixing assertions enabled in release mode for CI builds (https://github.com/DARMA-tasking/vt/pull/1107 and https://github.com/DARMA-tasking/vt/pull/1109); working on removing old LB instrumentation (https://github.com/DARMA-tasking/vt/pull/1108)
- @jstrzebonski: working on encoding member bits in handler field (https://github.com/DARMA-tasking/vt/issues/941)
- @bradybray: working on https://github.com/DARMA-tasking/vt/pull/1081 with Phil, need to fix rebase

lifflander commented 3 years ago
Descriptor Information
Date 10/20/2020 @ 11am
Attendees @lifflander, @nmm0, @PhilMiller, @cz4rs, @JacobDomagala, @jstrzebonski, @bradybray, @ppebay, @nlslatt
Description Weekly Meeting

Agenda:

Minutes

- @PhilMiller: working on new functionality in EMPIRE for DA, found bug in checkpoint; working on presentations
- @nmm0: nothing to report on VT; optimizing NimbleSM
- @nlslatt: fixed bug in checkpoint
- @cz4rs: fix in checkpoint integrated https://github.com/DARMA-tasking/checkpoint/pull/144; working on component serialize https://github.com/DARMA-tasking/vt/pull/1013
- @JacobDomagala: working on #1112, PR #1119
- @jstrzebonski: working on encoding member bits in handler field #872 (https://github.com/DARMA-tasking/vt/issues/941)
- @bradybray: #941 is merged

lifflander commented 3 years ago
Descriptor Information
Date 10/27/2020 @ 11am
Attendees @lifflander, @nmm0, @PhilMiller, @cz4rs, @JacobDomagala, @jstrzebonski, @bradybray, @nlslatt, @fnrizzi
Description Weekly Meeting

Agenda:

Minutes

- @PhilMiller: helped fix type traits bug in checkpoint, improvements to virtual serialize, kokkos view fixes
- @nmm0: finishing up cmake re-run issues, should have PR soon
- @nlslatt: reviewing VT PRs, learning about VT code; working on LB stats
- @cz4rs: some ideas in https://github.com/DARMA-tasking/checkpoint/issues/125; memory footprinting ready to review; need help serializing templated containers
- @JacobDomagala: working on #1119 for callbacks overload, pushed update; started on #1051
- @jstrzebonski: finishing encoding member bits in handler field #872 (https://github.com/DARMA-tasking/vt/issues/941)
- @bradybray: pull request coming for name changes

lifflander commented 3 years ago
Descriptor Information
Date 11/03/2020 @ 11am
Attendees @lifflander, @nmm0, @PhilMiller, @cz4rs, @JacobDomagala, @jstrzebonski, @bradybray, @nlslatt, @fnrizzi
Description Weekly Meeting

Agenda:

Minutes

- @PhilMiller: focused on https://github.com/DARMA-tasking/checkpoint/issues/151
- @nmm0: finishing up cmake re-run issues, should have PR today on that
- @nlslatt: working on #1131 (reduce ElementStats data storage), thus #1122
- @cz4rs: working on footprinting #1013; virtual serialization causing failures in runs; finishing up #1125
- @JacobDomagala: working mostly on #1051; going to start on #883
- @jstrzebonski: finished encoding member bits in handler field #872 (https://github.com/DARMA-tasking/vt/issues/941)
- @bradybray: not much new to report

lifflander commented 3 years ago
Descriptor Information
Date 11/10/2020 @ 11am
Attendees @lifflander, @nmm0, @PhilMiller, @cz4rs, @JacobDomagala, @jstrzebonski, @bradybray, @nlslatt, @fnrizzi
Description Weekly Meeting

Agenda:

Minutes

- @PhilMiller: focused on checkpoint, virtual serialization support in-place de-serialization: https://github.com/DARMA-tasking/checkpoint/issues/151; created new issue for checks https://github.com/DARMA-tasking/checkpoint/issues/154
- @nmm0: finishing up https://github.com/DARMA-tasking/vt/pull/1144 to optimize cmake; tests for cmake running multiple times?
- @nlslatt: working on #1137 (memory growth in ElementStats); #1141 for fixing temporary IDs for subphases
- @cz4rs: #1155 merged; footprinting should be ready for merging #1013
- @JacobDomagala: working on #1052; message-based are working
- @jstrzebonski: encoding member bits in handler field is merged #872 (https://github.com/DARMA-tasking/vt/issues/941); working on #973
- @bradybray: working on interface for MemoryPool/Manager #871

lifflander commented 3 years ago
Descriptor Information
Date 11/17/2020 @ 11am
Attendees @lifflander, @nmm0, @PhilMiller, @cz4rs, @JacobDomagala, @jstrzebonski, @bradybray, @nlslatt, @fnrizzi
Description Weekly Meeting

Agenda:

Minutes

- @cz4rs: corrections for footprinting are done, ready for merging #1013
- @JacobDomagala: working on #1132, works for multiple parameters; started working on #883
- @jstrzebonski: finished https://github.com/DARMA-tasking/checkpoint/pull/155; working on #973
- @PhilMiller: focused on EMPIRE work this week; will work on reviews
- @nmm0: focused on NimbleSM
- @nlslatt: working on #1127; found a new bug due to missing data
- @bradybray: working on interface for MemoryPool/Manager #871

lifflander commented 3 years ago
Descriptor Information
Date 11/24/2020 @ 11am
Attendees @lifflander, @nmm0, @PhilMiller, @cz4rs, @JacobDomagala, @jstrzebonski, @nlslatt, @fnrizzi
Description Weekly Meeting

Agenda:

Minutes

- @cz4rs: working in checkpoint on https://github.com/DARMA-tasking/checkpoint/issues/156 & https://github.com/DARMA-tasking/checkpoint/issues/161; VT https://github.com/DARMA-tasking/vt/pull/1013 for footprinting is nearly ready to merge.
- @JacobDomagala: finished #1132, starting on trace loading for #900
- @jstrzebonski: working on #973 (sendmsg implemented, broadcast almost done)
- @PhilMiller: helping @cz4rs with footprinting in checkpoint
- @nmm0: focused mostly on NimbleSM improvements
- @nlslatt: looking into #868 (temp vs. perm IDs) to gradually switch over

lifflander commented 3 years ago
Descriptor Information
Date 12/01/2020 @ 11am
Attendees @lifflander, @nmm0, @PhilMiller, @cz4rs, @JacobDomagala, @jstrzebonski, @fnrizzi, @bradybray
Description Weekly Meeting

Agenda:

Minutes

- @cz4rs: merged https://github.com/DARMA-tasking/checkpoint/pull/166; fixed #1163; working on https://github.com/DARMA-tasking/checkpoint/issues/161
- @JacobDomagala: working on local invoke #1132, ready to be merged; starting on #1171; worked on traces #903
- @jstrzebonski: working on default proxy #1146, left to use in examples and unit tests
- @PhilMiller: isolated the build failure on Intel 19 issue #728 (reduced test case that exposes compiler bug)
- @nmm0: working on getting NimbleSM; trying to identify bug
- @bradybray: working on finishing #871

lifflander commented 3 years ago
Descriptor Information
Date 12/08/2020 @ 11am
Attendees @lifflander, @nmm0, @PhilMiller, @JacobDomagala, @jstrzebonski, @bradybray
Description Weekly Meeting

Agenda:

Minutes

- @cz4rs: on vacation
- @JacobDomagala: local invoke #1132 ready for review; working on MPI traces #903
- @jstrzebonski: working on default proxy #1146, left to use in examples and unit tests; working on template repository #1177
- @PhilMiller: finishing fix for build failure on Intel 19 issue #728
- @nmm0: working on identifying bug in NimbleSM

lifflander commented 3 years ago
Descriptor Information
Date 12/15/2020 @ 11am
Attendees @lifflander, @nmm0, @PhilMiller, @JacobDomagala, @jstrzebonski, @cz4rs, @bradybray
Description Weekly Meeting

Agenda:

Minutes

- @bradybray: working on #871
- @cz4rs: working on #1175; working on checkpoint construction: https://github.com/DARMA-tasking/checkpoint/issues/161
- @JacobDomagala: local invoke #1132 ready for review; working on MPI traces #903, #1182
- @jstrzebonski: default proxy #1146, #973 merged; #1177 template repository is ready; envelope #842 should be closed
- @PhilMiller: working on benchmarking for LB paper
- @nmm0: working on identifying bug in NimbleSM; nothing to report

lifflander commented 3 years ago
Descriptor Information
Date 12/22/2020 @ 11am
Attendees @lifflander, @PhilMiller, @JacobDomagala, @jstrzebonski, @cz4rs
Description Weekly Meeting

Agenda:

Minutes

- @bradybray: working on #871
- @cz4rs: working on release for #1187
- @JacobDomagala: working on trace-only mode #903, #1182
- @jstrzebonski: template repository is done, added workflows to serialization sanitizer
- @PhilMiller: working on getting build resolved

Spack package?

lifflander commented 3 years ago
Descriptor Information
Date 1/05/2021 @ 11am
Attendees @lifflander, @PhilMiller, @JacobDomagala, @jstrzebonski, @cz4rs, @bradybray, @ppebay
Description Weekly Meeting

Agenda:

Minutes

- @bradybray: working on #871
- @cz4rs: working on release: #1199 (args) and release for 10.4
- @JacobDomagala: finished trace-only mode #903, #1182; working on #883
- @jstrzebonski: lots of progress on exporting warnings/errors #1184
- @PhilMiller: working on getting build resolved

lifflander commented 3 years ago
Descriptor Information
Date 1/12/2021 @ 11am
Attendees @lifflander, @PhilMiller, @JacobDomagala, @jstrzebonski, @cz4rs, @ppebay, @nlslatt
Description Weekly Meeting

Agenda:

Minutes

- @cz4rs: finished release: #1199; working on warning for errors
- @JacobDomagala: will make some changes to trace-only mode #903, working on #883
- @jstrzebonski: finished exporting warnings/errors #1184
- @PhilMiller: mostly working on EMPIRE stuff this week
- @nmm0: going to move space stuff to new repo; working on paper for contact
- @nlslatt: working on removing temporary IDs

lifflander commented 3 years ago
Descriptor Information
Date 1/19/2021 @ 11am
Attendees @lifflander, @PhilMiller, @JacobDomagala, @jstrzebonski, @cz4rs, @ppebay, @nlslatt
Description Weekly Meeting

Agenda:

Minutes

- @cz4rs: working on #1217; will work on checkpoint documentation bug
- @JacobDomagala: working on broadcast issue #883, #1225; trace-only mode needs to be tested
- @jstrzebonski: done with #1284 and #1184; working on dumping stats failure
- @PhilMiller: mostly working on EMPIRE stuff this week
- @nmm0: pushed initial spack package; working on bvh code
- @nlslatt: finished removing temporary IDs from final test (needs ZoltanLB work); working on stats files (commenting out single line on release); working on LB experiments

lifflander commented 3 years ago
Descriptor Information
Date 1/26/2021 @ 11am
Attendees @lifflander, @JacobDomagala, @jstrzebonski, @cz4rs, @ppebay, @nlslatt
Description Weekly Meeting

Agenda:

Minutes

- @cz4rs: working on #1221 to keep last collection element; checkpoint https://github.com/DARMA-tasking/checkpoint/pull/176; OMP lock issue https://github.com/DARMA-tasking/vt/pull/1214 ready to merge; open issue for omp+clang in the container and std::thread and gcc; started working on serialization sanitizer for clang plugin
- @JacobDomagala: working on broadcast issue #883, #1225; fixes to trace-only
- @jstrzebonski: make CI stop running tests for compilation fails; working on sample project install test #1230; work on spack next
- @PhilMiller: out this week
- @nmm0: pushed initial spack package; working on bvh code
- @nlslatt: working on LB experiments and GossipLB

lifflander commented 3 years ago
Descriptor Information
Date 2/2/2021 @ 11am
Attendees @lifflander, @JacobDomagala, @jstrzebonski, @cz4rs, @ppebay, @nlslatt
Description Weekly Meeting

Agenda:

Minutes

- @cz4rs: #1173 ready; #1223 merged; working on the sanitizer
- @JacobDomagala: working on broadcast issue #883, #1225; fixes to trace-only waiting for @nmm0; working on new argument for flushing traces
- @jstrzebonski: working on #1174
- @PhilMiller: working on Kokkos::parallel_for
- @nmm0: working on paper
- @nlslatt: subphase alignment and naming

lifflander commented 3 years ago
Descriptor Information
Date 2/9/2021 @ 11am
Attendees @lifflander, @JacobDomagala, @jstrzebonski, @cz4rs, @ppebay, @nlslatt, @nmm0, @fnrizzi, @PhilMiller
Description Weekly Meeting

Agenda:

Minutes

- @cz4rs: #1253 ready to merge; working on sanitizer PR https://github.com/DARMA-tasking/serialization-sanitizer/pull/18
- @JacobDomagala: working on trace-only mode #1182; working on #1174 and #1225
- @jstrzebonski: closed #1174; #1168 fixed; working on #1239
- @PhilMiller: working on CUDA streams and improvements to Kokkos
- @nmm0: working on NimbleSM paper; and reviewing trace-only
- @nlslatt: working on debugging GossipLB; subphase string names resolution protocol

lifflander commented 3 years ago
Descriptor Information
Date 2/16/2021 @ 11am
Attendees @lifflander, @JacobDomagala, @jstrzebonski, @cz4rs, @ppebay, @nlslatt, @PhilMiller, @nmm0
Description Weekly Meeting

Agenda:

Minutes

- @cz4rs: working mainly on the serialization sanitizer; working on #1258; adding skip to checkpoint https://github.com/DARMA-tasking/checkpoint/pull/179
- @JacobDomagala: working on #1257; broadcast #1225; #1261 is merged; working on #1216, should have PR soon
- @jstrzebonski: working on #1239 (updating comments w/warnings and errors)
- @PhilMiller: working on Kokkos::parallel_for changes for CUDA streams; ULTs issues
- @nmm0: nothing to report
- @nlslatt: working on GossipLB; abstract driver test program

lifflander commented 3 years ago
Descriptor Information
Date 2/23/2021 @ 11am
Attendees @lifflander, @JacobDomagala, @jstrzebonski, @cz4rs, @ppebay, @nlslatt, @PhilMiller, @nmm0
Description Weekly Meeting

Agenda:

Minutes

- @cz4rs:
  - Issue #178 @ checkpoint: added serialization skip for sanitizer, PR #179 was merged, issue closed
  - Issue #1258 @ vt: fields missed during serialization / footprinting: updated and marked skipped variables explicitly
  - Issue #12 @ serialization-sanitizer: converted runtime sanitizer into a clang plugin, working on integration with checkpoint and vt
- @JacobDomagala:
  - #883: looks like it's ready to be merged
  - #1215: close to being done, will need to do some testing of new docker image
  - #1216: also almost done (will have to move the action repository to DARMA-tasking, and the graph requires a few tweaks)
  - #1266: PR is open for review
  - Both #1272 and #1260 are on my TODO list
- @jstrzebonski:
  - vt #1239 (Explore updating or deleting old comments with errors/warnings) is finally done. No more warnings/errors spam in the PR thread. One remark: if the compilation itself was successful but other pipeline steps (like tests) failed, the comment saying that the build for the given commit was successful will still be posted. If that's confusing, the message could be changed to something better; I'm open to suggestions.
  - I got back to spack-package #1 (Test current package and expand parameters to building VT/checkpoint)
- @PhilMiller: working on Kokkos::parallel_for changes for CUDA streams; ULTs issues
- @nmm0: done with paper, resuming work with bvh optimization and load balancing, working on NimbleSM paper
- @nlslatt:
  - Posted PR #1273: fixes a failed assert when empire runs with tracing enabled.
  - Posted PR #1274: makes building the release branch easy like develop.
  - Posted PR #1277: makes stats files generated by the release branch usable (in my case, for testing load balancers).
  - Issue #1279: I'm removing bugs from GossipLB, as discussed last week.
  - Issue #1265: I'm polishing the stats-reading driver that Jonathan created.
  - LBAF issue #72: I found a bug in LBAF’s CMF that led to new ideas about getting better imbalance.
  - Bus error: I have not yet identified the cause of the bus error during LB seen on Stria for empire+vt runs using multiple LBs.

lifflander commented 3 years ago
Descriptor Information
Date 3/2/2021 @ 11am
Attendees @lifflander, @JacobDomagala, @jstrzebonski, @cz4rs, @ppebay, @nlslatt, @PhilMiller, @nmm0
Description Weekly Meeting

Agenda:

Minutes

- @cz4rs:
  - vt #1258: add fields omitted during serialization - fixed
  - vt #1229: enable debug prints and assertions in release builds - prints and assertions are now on by default, use vt_production_build_enabled flag to disable them at compile time; I am fixing minor issues and waiting for more feedback; TestTermDepSendChain.test_term_dep_send_chain_merge is failing regularly (only on macosx build) - investigating
  - serialization-sanitizer #12: convert sanitizer into a clang plugin - not much time spent here, still working on integration with vt and CI; it seems that it's not possible to edit and compile files in a single run (sticking with PartialSpecializationGenerator for now)
  - Misc:
    - #1229 removes a couple of obsolete scripts, vt-auto-build repository can be removed (archived?) now (admin rights required)
    - #1229 is supposed to both change the mechanism and categorize existing prints (terse, normal, verbose levels); this can be done in separate PRs if debug prints in release are needed faster
- @JacobDomagala:
  - Pending PRs:
    - #1215 (Using prebuilt docker image for pushdockerimage.yml) PR waiting for reviews
    - #1216 (graph build times of pushdockerimage.yml) Is open for reviews - mostly waiting for graph-repo PR to be merged
    - #1293 - remove dead code in vrt-collection
  - WIP:
    - #1272 - Use runSchedulerWhile instead of runScheduler when applicable -> I'm close to creating PR for this one
    - #1260 - Not much to update on this one, hopefully will progress on it this week
  - Done:
    - #1266 - Use checkpoint reconstruct for migrated vrt-collection elems
    - #1286 - Fix issues after merging #883
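
The idea behind #1272 (driving the scheduler with a condition instead of an unconditional call) can be illustrated with a toy work queue. This is only a sketch under assumptions: `ToyScheduler`, `enqueue`, `runOnce`, `runWhile`, and `pending` are hypothetical names, not vt's API, and a real distributed scheduler would keep polling for incoming messages rather than stop when its local queue drains.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Toy work queue for illustration only; this is not vt's scheduler.
class ToyScheduler {
public:
  void enqueue(std::function<void()> work) {
    assert(work && "work item must be callable");
    queue_.push_back(std::move(work));
  }

  // Run a single work unit; returns false if nothing was pending.
  bool runOnce() {
    if (queue_.empty()) {
      return false;
    }
    auto work = std::move(queue_.front());
    queue_.pop_front();
    work();
    return true;
  }

  // Keep making progress only while cond() holds.
  // Assumption: the toy version also stops once the queue is empty so that
  // a single-process example cannot spin forever.
  void runWhile(std::function<bool()> const& cond) {
    while (cond() && runOnce()) {
    }
  }

  std::size_t pending() const { return queue_.size(); }

private:
  std::deque<std::function<void()>> queue_;
};
```

With three queued increments of a counter and the condition "counter is less than 2", `runWhile` stops after two work units and leaves one pending, which is the property that makes a conditional loop easier to reason about in tests than an open-ended scheduler call.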

- @jstrzebonski:
  - spack-package #1 (Test current package and expand parameters to building VT/checkpoint) - is done. I reworked the package a bit:
    - vt's configuration parameters can be used for build
    - package names are unified
    - simple sanity checks are added
  - vt #1290 (Add Azure pipeline to test vt spack package) - That's almost done. I'm currently finishing testing the build script locally on my machine.
- @PhilMiller: Little bits of code review, otherwise focused on Kokkos/Cuda/EMPIRE stuff
- @nmm0: done with paper, resuming work with bvh optimization and load balancing, working on NimbleSM paper
- @nlslatt: I made the LB testing/tuning driver a lot more powerful. It only reads release stats files right now but I will add support for develop, probably after we change the stats format soon. I also added asynchronous informs to GossipLB and made an option to switch between that and the synchronous informs. I'm hoping to play with these two branches together tomorrow. I'm getting close to posting PRs.

lifflander commented 3 years ago
Descriptor Information
Date 3/9/2021 @ 11am
Attendees @lifflander, @JacobDomagala, @jstrzebonski, @cz4rs, @ppebay, @nlslatt, @PhilMiller, @nmm0
Description Weekly Meeting

Agenda:

Minutes

- @cz4rs:
  - checkpoint #180 - memory explosion for empty std::unordered_map (fixed)
  - checkpoint #182 - std::unordered_map serializer code still abusing memory and causing icc ICEs (fixed)
  - vt #1304 - ElementStats unordered_map capacity keeps growing (fixed with checkpoint #182)
  - vt #1302 - Intel compiler ICE - initial verification and reduction (currently handled by Phil)
  - vt #1229 - enabling debug prints and assertions in release builds - ready for review (second round)
  - vt #1307 - fix warning on release branch - ready to merge (already fixed on develop)
  - Misc:
    - release 1.0.0-beta.10.4.1 - 3 PRs merged, 2 open
    - release 1.0.1 - 5 issues completed, 2 open
    - checkpoint #184 created to keep track of preserving std::unordered_map's capacity
- @JacobDomagala:
  - Pending PRs:
    - #1215 (Using prebuilt docker image for pushdockerimage.yml) PR waiting for reviews
    - #1216 (graph build times of pushdockerimage.yml) Is open for reviews - mostly waiting for graph-repo PR to be merged
    - #1293 - remove dead code in vrt-collection
  - WIP:
    - #1272 - Use runSchedulerWhile instead of runScheduler when applicable -> I'm close to creating PR for this one
    - #1260 - Not much to update on this one, hopefully will progress on it this week
  - Done:
    - #1266 - Use checkpoint reconstruct for migrated vrt-collection elems
    - #1286 - Fix issues after merging #883

- @jstrzebonski: I'm working on two things at the moment:
  - vt #1290 (Add Azure pipeline to test vt spack package) - I prepared an Azure pipeline based on azure-gcc-10-ubuntu-openmpi but vt cannot be compiled on it because of the resource limit: /usr/bin/ld: final link failed: No space left on device. Here's a link to logs: https://dev.azure.com/DARMA-tasking/DARMA/_build/results?buildId=8850&view=logs&j=3[…]-5a92-293e-d53cefc8c4b3&t=28db5144-7e5d-5c90-2820-8676d630d9d2
  - vt #1291 (Build warnings/errors forwarded don't include test status) - After short experiments this seems to be easy to do. I don't have a working solution yet, but I'm close.
- @PhilMiller:
  - Merged a change to vt’s cmake in EMPIRE https://cee-gitlab.sandia.gov/EM-Plasma/EMPIRE/-/merge_requests/1482 - this should be incorporated in the vt repository as well, especially before any new stable release
  - Reduced Intel ICE seen with rehash usage and reported
  - I wrote up and shared notes on a meaningful object ordering for refinement-based LB strategies like GossipLB
  - On that note, it may be worth considering alternatives that accept larger candidate objects earlier if sufficiently underloaded processes are available to send them to - the marginal object threshold is the smallest object size that will necessarily be migrated to achieve non-overload, but there’s nothing actually saying we can’t go bigger instead, just the heuristic that it will be harder to find spots for bigger objects
  - Work continues on EMPIRE adaptation for CUDA streams
- @nmm0:
  - Working with Jonathan on debugging LB migration crash
  - Adding a compile-time toggle in my code to switch on and off insertable collections to narrow down bug
- @nlslatt: I'm still focused on improving GossipLB (issue 1279).
  - Added options for considering objects to migrate in arbitrary (ordering=0), ID (ordering=1), or marginal (ordering=2) order
  - Added deterministic mode (LB arg deterministic=1), which is not compatible with arbitrary ordering
  - Added different CMF options: original, which excludes recently overloaded processors (cmf=0), keep recently overloaded and normalize by greater of max and avg load (cmf=1), and keep recently overloaded and normalize by own load (cmf=2)
  - Found a bug in GossipLB that made it difficult to iteratively refine the load balance
  - Noticed that post-migration object load statistics race with object migration
  - Started running some 400 rank experiments on Stria to exercise new options
  - GossipLB was only 1.5% slower on bdot vt-capable work (which includes extra reductions for debugging) than HierarchicalLB on 400 rank run
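
One possible reading of the "marginal" ordering discussed above can be sketched as follows. This is an illustration under assumptions, not GossipLB's implementation: `Obj`, `marginalLoad`, and `orderByMarginal` are hypothetical names, the "marginal" load is taken to be the smallest single-object load whose migration alone would bring the rank down to a target load, and the policy of trying objects at or below that load first (largest first) before any bigger ones is a guess at the heuristic described in the notes.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical object record for illustration; not vt's data structures.
struct Obj {
  int id;
  double load;
};

// Smallest single-object load that, if migrated alone, would bring the rank
// from rank_load down to target_load or below. Falls back to the largest
// object load when no single object is big enough.
double marginalLoad(
  std::vector<Obj> const& objs, double rank_load, double target_load
) {
  assert(!objs.empty());
  double const excess = rank_load - target_load;
  double best = -1.0;
  double largest = 0.0;
  for (auto const& o : objs) {
    largest = std::max(largest, o.load);
    if (o.load >= excess && (best < 0.0 || o.load < best)) {
      best = o.load;
    }
  }
  return best >= 0.0 ? best : largest;
}

// Assumed ordering: objects at or below the marginal load come first, from
// largest to smallest; bigger objects follow, from smallest to largest.
// Ties are broken by ID so the result is deterministic.
std::vector<Obj> orderByMarginal(
  std::vector<Obj> objs, double rank_load, double target_load
) {
  if (objs.empty()) {
    return objs;
  }
  double const m = marginalLoad(objs, rank_load, target_load);
  std::sort(objs.begin(), objs.end(), [m](Obj const& a, Obj const& b) {
    bool const a_small = a.load <= m;
    bool const b_small = b.load <= m;
    if (a_small != b_small) {
      return a_small;
    }
    if (a.load != b.load) {
      return a_small ? a.load > b.load : a.load < b.load;
    }
    return a.id < b.id;
  });
  return objs;
}
```

For object loads 1, 4, 2, 6 (IDs 0 to 3) on a rank with load 13 and target 10, the excess is 3, so the marginal load is 4 and the candidates come out in ID order 1, 2, 0, 3. Placing bigger objects earlier, as suggested in the notes, would amount to changing only the comparator.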

lifflander commented 3 years ago
Descriptor Information
Date 3/16/2021 @ 11am
Attendees @lifflander, @JacobDomagala, @jstrzebonski, @cz4rs, @ppebay, @nlslatt, @PhilMiller, @nmm0
Description Weekly Meeting

Agenda:

Minutes

- @cz4rs:
  - vt #1229 - debug prints and assertions in release builds - PR #1288 merged, issue closed
  - vt #1317 - memory leak (asan) - attempted bisect, requires discussion (how should we handle this?)
  - vt #1006 - GroupCollective bug - created a testcase with a simple delay to reproduce the bug, could something akin to CollectionManager::bufferOpOrExecute be used to fix this?
- @JacobDomagala:
  - Pending PRs:
    - checkpoint - 185 A quick PR to checkpoint with build badges added to README
    - VT - 1272 Remove vt::runScheduler and use runSchedulerWhile instead (where applicable)
    - VT - 1328 Add helper macros to skip running tests based on current number of nodes used. Change CMake to also run tests on single node, and disable the tests that shouldn't run on fewer than 2 ranks
  - Almost ready to review:
    - VT - 1297 Increase the number of nodes for our pipelines (After VT - 1328 gets merged, this should be ready for review)
  - WIP:
    - VT - 1260 Sending a message to a functor with the ActiveMessenger requires the message type. Nothing much for this one, I'm looking whether it's possible to have it working without going back to using MsgPtr instead of MsgPtrThief
- @jstrzebonski:
  - I'm working on:
    - vt #1320 - Put a direct link to Azure builds in PR comment
  - Done:
    - vt #1318 - Silence nvcc builds warning
    - vt #1291 - Build warnings/errors forwarded don't include test status
    - vt #1323 - Don't report test status if compilation failed
- @PhilMiller:
  - I’m working on wiring EMPIRE up to the new AsyncOpCUDA and ULT infrastructure
  - Porting Charm++ DistributedLB (the inspiration of our GossipLB)
- @nmm0:
  - Running LB runs with BVH -- GreedyLB run failed to complete
  - Currently re-running two runs with HierarchicalLB that scales better
- @nlslatt: I've been comparing GossipLB with full information to GossipLB with partial information, finding that partial information still does very well. I also found and fixed a bug in GossipLB that was causing a failed assertion or segfault when only one underloaded rank was known. I'm waiting on a Trilinos PR merge and rebuild on Stria before running ZoltanLB experiments for comparison.
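
The helper macros mentioned under VT - 1328 (skipping tests based on the current number of nodes) might look roughly like the sketch below. Everything here is hypothetical: `shouldSkipForNodeCount`, `SKIP_IF_FEWER_NODES_THAN`, and `exampleTwoRankTest` are illustrative names rather than the macros added in that PR, and the plain early `return` stands in for a test framework's skip facility (for example GoogleTest's `GTEST_SKIP()`).

```cpp
#include <cassert>

// Hypothetical predicate: should a test that needs at least min_nodes ranks
// be skipped when only num_nodes ranks are available?
inline bool shouldSkipForNodeCount(int num_nodes, int min_nodes) {
  assert(num_nodes >= 1 && min_nodes >= 1);
  return num_nodes < min_nodes;
}

// Hypothetical helper macro: leave the enclosing void test body early when
// too few ranks are available. A real suite would report the test as
// skipped instead of silently returning.
#define SKIP_IF_FEWER_NODES_THAN(num_nodes, min_nodes)      \
  do {                                                      \
    if (shouldSkipForNodeCount((num_nodes), (min_nodes))) { \
      return;                                               \
    }                                                       \
  } while (0)

// Example test body that only makes sense on two or more ranks.
inline void exampleTwoRankTest(int num_nodes, bool& ran) {
  SKIP_IF_FEWER_NODES_THAN(num_nodes, 2);
  ran = true;  // stands in for the real test logic
}
```

Keeping the decision in a small predicate makes the rule easy to check on its own, while the macro keeps each test body down to a single guarding line.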

lifflander commented 3 years ago
Descriptor Information
Date 3/23/2021 @ 11am
Attendees @lifflander, @JacobDomagala, @jstrzebonski, @cz4rs, @ppebay, @nlslatt, @PhilMiller, @nmm0
Description Weekly Meeting

Agenda:

Minutes

- @cz4rs:
  - vt #1317 - memory leak exposed by asan - PR #1332 merged, issue closed
  - vt #1006 - GroupCollective setup bug - work in progress
  - vt #1331 - increased compilation time - time measured for various configurations, some experiments with fmt
  - Misc:
    - enabling address sanitizer in openmpi build shows multiple leaks - https://github.com/DARMA-tasking/vt/files/6166853/openmpi-asan-build.log
    - gcc-7 build is failing with No space left on device error from time to time
- @JacobDomagala:
  - WIP [main prio]:
    - https://github.com/DARMA-tasking/vt/issues/1333 - Use time output from running cmake build instead of GitHub API. Hopefully this will produce more accurate build time statistics
  - WIP [in queue]:
    - https://github.com/DARMA-tasking/vt/pull/1306 - oversubscribe tests on CI. This one is waiting for PR #1329
    - https://github.com/DARMA-tasking/vt/pull/1330 - fix functor in ActiveMessenger. Postponed for a while
  - Pending PR:
    - https://github.com/DARMA-tasking/vt/pull/1329 - Running tests also on single node and add helper macros to skip running tests based on current number of nodes
  - Merged:
    - https://github.com/DARMA-tasking/vt/issues/1272 Update Scheduler usage in our codebase
- @jstrzebonski:
  - I'm working on:
    - checkpoint #174 - Add typeid name/type registry to pack/unpack to confirm class correctness
  - Done:
    - vt #1290 - Add Azure pipeline to test vt spack package
    - vt #1320 - Put a direct link to Azure builds in PR comment
- @PhilMiller: I’m fairly hour-limited this month, and that time was front-loaded before the limit went into effect. So, I’ll be scarce this coming week. My work:
  - vt#413 Reviewing change to provide support for suspendable threaded execution of handlers, including refactoring to improve context access
  - Discussions on load balancing improvements
- @nmm0:
  - Mostly contact work; ran several runs with the load balancer. Current results show very large performance improvements (~6x) with the load balancer on some problems
  - TODOs: run tests without insertable collections and see if there is any difference. Finish work on patch caching on the application side
- @nlslatt:
  - Found and fixed a bug that allowed receiving messages about underloaded processors too early within GossipLB, but the fix interacts badly with the way empire calls the load balancer.
  - Added an option to redefine the terms underloaded and overloaded based on long-pole object sizes (that exceed the processor-average load), but haven’t been able to experiment with it yet due to the above issue.
  - Started experimenting with ZoltanLB.

lifflander commented 3 years ago
Descriptor Information
Date 3/30/2021 @ 11am
Attendees @lifflander, @JacobDomagala, @jstrzebonski, @cz4rs, @ppebay, @nlslatt, @PhilMiller, @nmm0
Description Weekly Meeting

Agenda:

Minutes

@cz4rs: Limited hours here as well. Update:

• vt #1331 - failing macosx-clang build - PR #1343 ready for review / merge
• vt #1006 - back to GroupCollective setup bug

Misc:
• I took a look at some of the oldest open tickets in vt, possible duplicates linked in their respective modern counterparts.

@JacobDomagala: My update:

Pending-PR
• https://github.com/DARMA-tasking/vt/pull/1329 - Running tests also on single node. Here I spent some time trying to fix tests after recent changes to develop. I added the "fix" which changes the logic slightly, so I would like someone to take a look at it
• https://github.com/DARMA-tasking/vt/issues/1333 - Use time output from running cmake build instead of GitHub API. Did some testing, it looks promising so I'm waiting for https://github.com/DARMA-tasking/graph-build-times/pull/6 to be merged and then this one should also be done

WIP [in queue]
• https://github.com/DARMA-tasking/vt/pull/1306 - oversubscribe tests on CI. This one is waiting for PR #1329
• https://github.com/DARMA-tasking/vt/pull/1330 - fix functor in ActiveMessenger. Postponed for a while

@jstrzebonski:

I'm working on:
• checkpoint #174 - Add typeid name/type registry to pack/unpack to confirm class correctness

@PhilMiller: Update:

I’ve been hour-limited this past week. My work has mostly been in discussions and reviews. I’m going to be on a short vacation Friday through next Tuesday, so I don’t expect to be present for next week’s Tuesday meetings

@nmm0: Update:

• Working on trying to track down BVH bug with higher overdecomposition factors
• Worked on tracking down vt build system bug where MPI wrappers aren't generated -- vt_mpi_guards does not seem to be defined the first time cmake is run from a clang context

@nlslatt: My update:

• #1279: got gossiplb running correctly and submitted PRs #1348 (for develop) and #1349 (for the release branch)
• thought about how to isolate long pole objects when load balancing with GossipLB and decided to save this for after the PR
• #1338: started implementing new trace spec options that can be programmatically controlled
• trying to debug a termination hang that affects runs with ZoltanLB

lifflander commented 3 years ago
Descriptor Information
Date 4/06/2021 @ 11am
Attendees @lifflander, @JacobDomagala, @jstrzebonski, @cz4rs, @ppebay, @nlslatt, @PhilMiller, @nmm0
Description Weekly Meeting

Agenda:

Minutes

@cz4rs: Update:

• checkpoint #175 - make detector library required - fixed
• vt #1355 - make detector library required - fixed
• vt #1006 - hang detected while trying to fix race in GroupCollective

@JacobDomagala: My update:

• #1306 running CI tests&examples on 4 nodes - after #1329 got merged, I've updated this PR and it's waiting for reviews
• #1352 and #1333 - I've changed the graph-build-times github action (#8) to now build vt and generate some build data (including the updated graph and badge). I've been testing it on PR #1335
• #1247 create clang9/clang10 workflows - I will need someone with admin rights to create new azure pipelines for vt

@jstrzebonski:

I took almost the whole week off, so not much changed. I'm still working on checkpoint #174 - Add typeid name/type registry to pack/unpack to confirm class correctness. Also, there was a change in the GitHub token format, so I updated the token for the comment-on-pr action, as GitHub advised.

@PhilMiller:

Got home just now, so I’ll be signed into today’s meeting. Probably mostly be listening, since I’ve done nil work since last week

@nmm0: update:

• #1350: moving cmake options before pmpi generation
• debugging memory corruption issue

@nlslatt:

• About to merge improved ease of vt use within empire
• About to merge vt release 1.0.1 into empire

lifflander commented 3 years ago
Descriptor Information
Date 4/13/2021 @ 10am
Attendees @lifflander, @JacobDomagala, @jstrzebonski, @cz4rs, @ppebay, @nlslatt, @PhilMiller, @nmm0
Description Weekly Meeting

Agenda:

Minutes

@cz4rs: Update:

• vt #1361 - version 1.0.1 released
• vt #1331 - increased compilation time - posted PR #1374 (use fmt as a static library), still working on fixing vt-trace
• vt #1006 - GroupCollective setup bug - adding synchronization after registering the continuations introduced hang in tutorial (https://github.com/DARMA-tasking/vt/pull/1325#issuecomment-816752072)

Misc:
• Some non-essential issues were pushed to 1.0.2 in order to streamline 1.0.1 release.
• Sketching some ideas for measuring runtime performance (#1294) to complement compilation time work.

@JacobDomagala: My update:

• #1306 - oversubscribe tests on CI -> waiting for Jonathan's approval
• Merged #1365 - Add clang 9 and 10 builds for our CI
• #8 and #1335 - updated the github action to also generate bar graph for most expensive template instantiation (the length of some templated function names didn't make it easy).

@jstrzebonski:

I'm working on DARMA-tasking/checkpoint#174 - I put up a draft PR with a rough idea of how to tackle the problem and added some unit tests to see if that actually works.

@PhilMiller:

I’ve been working on LB-related stuff in general, including review of #1349. I also helped with debugging some checkpoint/restart issues. I’m currently working on the LB paper, while waiting for various changes to land in EMPIRE and its dependencies

@nmm0:

• fixed gpg issue 1360 ready for review
• bvh problem seems to be incrementing references -- may be related to load balancer

@nlslatt: My update:

• Merged PR #1348 with GossipLB changes
• Posted PR #1370 adding Phil’s SmallestObjects ordering for GossipLB - approved
• Posted PR #1373 skipping phase 0 even with interval of 1 - approved
• Posted PR #1371 validating LB strategy name instead of defaulting to NoLB when no match to a known LB - approved but Cezary recommended adding a death test before merging
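
For the PR #1371 item above (validating the LB strategy name instead of defaulting to NoLB), the behaviour can be sketched roughly as follows. This is not vt's code: `LBKind`, `lookupLB`, `requireLB` and `checkValidation` are hypothetical names, and the table simply lists the balancers mentioned in these minutes.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical enum; the entries are the balancer names that appear in the minutes.
enum class LBKind { NoLB, GreedyLB, HierarchicalLB, GossipLB, RotateLB, RandomLB, ZoltanLB };

// Exact-match lookup. Returns false (and leaves `out` untouched) for unknown names.
inline bool lookupLB(std::string const& name, LBKind& out) {
  static std::map<std::string, LBKind> const known = {
    {"NoLB", LBKind::NoLB},
    {"GreedyLB", LBKind::GreedyLB},
    {"HierarchicalLB", LBKind::HierarchicalLB},
    {"GossipLB", LBKind::GossipLB},
    {"RotateLB", LBKind::RotateLB},
    {"RandomLB", LBKind::RandomLB},
    {"ZoltanLB", LBKind::ZoltanLB},
  };
  auto it = known.find(name);
  if (it == known.end()) return false;
  out = it->second;
  return true;
}

// Validate the name instead of silently falling back to NoLB.
inline LBKind requireLB(std::string const& name) {
  LBKind kind = LBKind::NoLB;
  if (!lookupLB(name, kind)) {
    throw std::invalid_argument("unknown LB strategy name: " + name);
  }
  return kind;
}

// Small in-process check: a known name resolves, a misspelled one is rejected.
inline void checkValidation() {
  assert(requireLB("GossipLB") == LBKind::GossipLB);
  bool rejected = false;
  try {
    requireLB("GosipLB");  // deliberate typo
  } catch (std::invalid_argument const&) {
    rejected = true;
  }
  assert(rejected);
  (void)rejected;  // avoid an unused-variable warning when NDEBUG is set
}
```

The sketch throws so the rejection can be checked in-process. If the real check aborts instead, a death test (as Cezary suggested) would be the matching kind of test.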

lifflander commented 3 years ago
Descriptor Information
Date 4/20/2021 @ 10am
Attendees @lifflander, @JacobDomagala, @jstrzebonski, @cz4rs, @ppebay, @nlslatt, @PhilMiller, @nmm0
Description Weekly Meeting

Agenda:

Minutes

@cz4rs: Update:

• vt #1006 - GroupCollective setup bug - merged
• vt #1331 - PR #1374 (use fmt as a static library) - ready for review
• vt #1383 - Add variants of runInEpoch with a label - merged
• checkpoint #192 - add clang 9/10 workflows - in progress (could be extended with adding ASan and UBSan if needed)

Misc:
• vt #1366 - analysis of ASSERT_DEATH unexpected failures - in this case GoogleTest does not recognize an exit correctly (vtAbort uses MPI_Abort and std::_Exit) https://github.com/DARMA-tasking/vt/blob/381f376defa8f202889d3c49290ddd00612875cd/src/vt/runtime/runtime.cc#L517-L529
• random test failures spotted: vt:/TestLoadBalancer.test_load_balancer_keep_last_elm/_proc_2

@JacobDomagala: My update:

• #1306 - Oversubscribe CI -> Merged
• #1391 - Remove unnecessary vt/transport.h includes from tests -> Merged
• #8 and #1335/#1333 - Updated the github action to generate wiki page https://github.com/DARMA-tasking/vt/wiki/Build-Stats with all the data

@jstrzebonski: My update:

• still working on DARMA-tasking/checkpoint#174 - waiting for feedback on printing the whole serialization path in case of error
• started to work on DARMA-tasking/vt#1357

@PhilMiller:

Nothing of my own to report - just been doing design/code review on vt stuff lately

@nmm0:

Nothing much DARMA-related this week. Keita and I discovered an issue with the cmake where if checkpoint/detector are in the libs folder and checkpoint_DIR and detector_DIR are specified, cmake fails

@nlslatt: My update:

• Merged PR#1371 – validation of LB strategy name
• Merged PR#1373 – skip LB on phase 0 when interval is 1
• Merged PR#1380 – speed up GossipLB test so it doesn’t time out
• Merged PR#1388 – changing the size of the GossipLB fanout type
• Collected data for paper on load balancing with vt
• Submitted request for collaborative storage space at Sandia for run data
• Started reading and commenting on LB paper draft

lifflander commented 3 years ago
Descriptor Information
Date 4/27/2021 @ 10am
Attendees @lifflander, @JacobDomagala, @jstrzebonski, @cz4rs, @ppebay, @nlslatt, @PhilMiller, @nmm0, @MikolajZuzek
Description Weekly Meeting

Agenda:

Minutes

@cz4rs: Update:

• vt #1331 - Optimize build times - PR#1374 merged
• vt #1385 - CUDA warning on release branch - merged
• vt #1401 - documentation fails to build - ready for review
• vt #1151 - Add partial doc checkpoint - ready for review, depends on #1401
• checkpoint #192 - add more clang workflows - ready for review

Misc:
• created vt #1395 - test_load_balancer_keep_last_elm fails occasionally, detailed log posted

@JacobDomagala: My update:

• #8 and #1335/#1333 - Generate Build Stats wiki page https://github.com/DARMA-tasking/vt/wiki/Build-Stats is finally merged
• #1330 - Fix active messaging functor inference using the FunctorExtractor - Open for review (this PR also includes fix for nvcc warning that started to appear recently)

@jstrzebonski: My update:

lifflander commented 3 years ago
Descriptor Information
Date 5/4/2021 @ 10am
Attendees @lifflander, @JacobDomagala, @jstrzebonski, @cz4rs, @ppebay, @nlslatt, @PhilMiller, @nmm0, @MikolajZuzek
Description Weekly Meeting

Agenda:

Minutes

@cz4rs: Update:

• vt #1401 - documentation fixed, PR #1404 merged
• vt #1151 - documentation improvements, PR #1152 approved, waiting for CI
• checkpoint #192 - add more clang workflows - PR #193 ready for review
• vt #1425 created to work on 1.0.2 release
• vt #1406 - update required CMake version - mostly done, requires confirmation to avoid conflicts with EMPIRE
• vt #970 - improve reduce interface - work in progress

@JacobDomagala: My update:

• VT - 1330 - Updated the PR with separate version for Active send/broadcast with Functor handler, with explicit message type template param and also added the examples that use both (deduced message type and explicit)
• VT - 1424 - Added a CMake option that makes vtAbort throw an exception instead of calling MPI_Abort, enabled it by default for all our workflows
• VT - 1427 - Created new issue for the tests that started to time out on Azure

@jstrzebonski: My update:

@nmm0: My update, not a lot this week, mostly have been running tests and experiments:

• RandomLB doesn't seem to cause issues with bvh
• GossipLB has element inversion -- I don't think this is a bug with GossipLB. Investigating further

@nlslatt: My update:

• Submitted PR #1421: exclude ineligible recipient nodes for GossipLB transfers
• Submitted PRs #1417, #1418, #1419, #1420 for merging with 1.0.2 release branch
• Submitted PR #1422: improve LB args and their defaults for GossipLB
• Added non-interactive gdb to wiki debugging page
• Worked on LB paper
• Merged vt usage details into EMPIRE user guide
• Trying to track down memory corruption that’s only surfacing in EMPIRE with vt enabled
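
The VT - 1424 item above (a CMake option that makes vtAbort throw instead of calling MPI_Abort) ties back to the earlier ASSERT_DEATH observation: a test cannot easily observe an exit made through MPI_Abort and std::_Exit, but it can catch an exception. A minimal sketch of that idea, assuming a compile-time switch. `SKETCH_ABORT_THROWS`, `AbortError`, `sketchAbort` and `abortsWith` are made-up names, and none of this is vt's actual implementation:

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical switch standing in for the CMake option mentioned above.
#ifndef SKETCH_ABORT_THROWS
#define SKETCH_ABORT_THROWS 1
#endif

// Exception that carries the abort code so a test can inspect it.
struct AbortError : std::runtime_error {
  AbortError(std::string const& msg, int code_in)
    : std::runtime_error(msg), code(code_in) {}
  int code;
};

// Stand-in for an abort entry point (not vt's vtAbort).
[[noreturn]] inline void sketchAbort(std::string const& msg, int code) {
#if SKETCH_ABORT_THROWS
  // Test builds: unwind normally so the caller can catch and check the abort.
  throw AbortError(msg, code);
#else
  // Production-like path: a real runtime would call MPI_Abort here.
  // std::_Exit terminates without running destructors or atexit handlers.
  (void)msg;
  std::_Exit(code);
#endif
}

// Test helper: runs fn and reports whether it aborted with the expected code.
template <typename Fn>
bool abortsWith(Fn&& fn, int expected_code) {
  try {
    fn();
  } catch (AbortError const& e) {
    return e.code == expected_code;
  }
  return false;
}

// Usage: the abort path is checked in-process, without a death test.
inline void exampleCheck() {
  assert(abortsWith([] { sketchAbort("bad state", 3); }, 3));
  assert(!abortsWith([] {}, 3));
}
```

With the switch off, the same call site terminates the process, so only the test configuration changes behaviour.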

lifflander commented 3 years ago
Descriptor Information
Date 5/11/2021 @ 10am
Attendees @lifflander, @JacobDomagala, @jstrzebonski, @cz4rs, @ppebay, @nlslatt, @PhilMiller, @nmm0, @MikolajZuzek
Description Weekly Meeting

Agenda:

Minutes

@cz4rs: Update:

• vt #1151 - additional checkpoint documentation in vt, PR #1152 merged
• vt #1406 - bump CMake min. version to 3.17, PR #1411 merged
• vt #1425 - prepare 1.0.2 release - in progress (1.0.2-proposed-update branch)

Misc:
• checkpoint #7 and #172 closed as obsolete, #60 could probably be closed as well (refers to nvcc 9.x)

@JacobDomagala: My update:

• VT - 1330: Fix FunctorExtractor issue -> Merged
• VT - 1434: Fix issues with Azure tests -> Currently the tests that were failing are skipped, but it seems that they no longer fail (perhaps it could've been some kind of resource issue on Azure before?)
• VT - 1424: Throw an exception on vtAbort -> Still WIP, I've had some issues with proper termination of vt after one of the nodes aborts
• VT - 1294: Create test harness for perf tests -> WIP

@jstrzebonski: My update Done:

lifflander commented 3 years ago
Descriptor Information
Date 5/18/2021 @ 10am
Attendees @lifflander, @JacobDomagala, @jstrzebonski, @cz4rs, @ppebay, @nlslatt, @PhilMiller, @nmm0, @MikolajZuzek
Description Weekly Meeting

Agenda:

Minutes

@cz4rs: Update:

• vt #1425 - prepare 1.0.2 release - in progress, PR #1441 posted
• vt #1435 - Print time units and adjust accordingly for LB times - in progress
• vt #1439 - LB: do not convert time into milliseconds - split from #1435

Misc:
• created a draft for 1.0.2 release (https://github.com/DARMA-tasking/vt/releases)
• three issues remain open (https://github.com/DARMA-tasking/vt/issues?q=is%3Aissue+label%3A1.0.2+)
• #1128 can probably be closed (irrelevant on develop, PR #1418 merged on release branch)
• two weeks of vacation incoming (starting on May 24th, I will be back on June 8th)
• in queue: 1.0.2 release -> vt #1435 -> vt #970 -> vt #1439 -> serialization-sanitizer

@JacobDomagala: My update:

• VT - 1434 - Re-enable termination tests (that were commented out due to timeouts) and add CMake lists for test files that will be excluded from test suite (this includes the extensive sequencer test)
• VT - 1424 - Throw an exception on vtAbort for tests -> I had to disable the mpi access failure test, due to the way the scheduler loop is written (traces are not working correctly when exception is thrown from within runWorkUnit inside scheduler)
• VT - 1438 - test harness for perf tests -> I'm working on creating the TestHarness for performance tests, which would contain some helper functions/macros to track execution time and memory consumption. I've also created new branch for build-stats github action to also run the tests there
• VT - 1440 - Use labels for collective (and rooted?) epochs -> PR created

@jstrzebonski: My update Done:

nlslatt commented 3 years ago
Descriptor Information
Date 05/25/2021
Attendees @nmm0 @JacobDomagala @jstrzebonski @MikolajZuzek @PhilMiller @bathmatt @fnrizzi @ppebay @nlslatt
Description Weekly Meeting

Agenda:

Minutes:

@lifflander is on an interview panel today.

@nmm0:

• We implemented a fix for the epoch bug problem (https://github.com/DARMA-tasking/vt/pull/1448) – greater than eager threshold
• This fixed an issue with some bvh test cases. Also fixed the bug with LB! So that is really good news
• Discussed in the meeting that Nic’s not sure if all send cases are being tested yet, so will put in another issue for increased test coverage (#1449)

@JacobDomagala:

• vt - 1424 - throw on vtAbort (and enable this option on our pipelines) -> Merged
• vt - 1448 - bug with serialized messages on self send -> Added test that reproduces the issue
• vt - 1440 - label epochs created by vt -> Should be ready to merge I think
• vt - 1438 - add perf tests to CI -> Added initial test harness with helpful macros (somewhat similar to how gtest works) and helper timers. Currently I'm working on generating meaningful output (both ASCII and file)
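
For the vt - 1438 item above, a helper timer of the kind described could look something like this. It is only a sketch under assumptions: `PerfHarness`, `ScopedTimer` and `PERF_SCOPE` are hypothetical names, not the harness in the PR, and it tracks wall-clock time only.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical harness: accumulates wall-clock time per labelled region.
class PerfHarness {
public:
  using Clock = std::chrono::steady_clock;

  // RAII timer: adds its elapsed time to the label when it goes out of scope.
  class ScopedTimer {
  public:
    ScopedTimer(PerfHarness& harness, std::string label)
      : harness_(harness), label_(std::move(label)), start_(Clock::now()) {}
    ~ScopedTimer() { harness_.add(label_, Clock::now() - start_); }
    ScopedTimer(ScopedTimer const&) = delete;
    ScopedTimer& operator=(ScopedTimer const&) = delete;

  private:
    PerfHarness& harness_;
    std::string label_;
    Clock::time_point start_;
  };

  // Record one sample for a label.
  void add(std::string const& label, Clock::duration elapsed) {
    Entry& entry = entries_[label];
    entry.total += elapsed;
    entry.count += 1;
  }

  // Number of samples recorded for a label (0 if the label is unknown).
  std::size_t samples(std::string const& label) const {
    auto it = entries_.find(label);
    return it == entries_.end() ? 0 : it->second.count;
  }

  // Mean time per sample in seconds; the label must have been recorded.
  double meanSeconds(std::string const& label) const {
    auto it = entries_.find(label);
    assert(it != entries_.end() && it->second.count > 0);
    return std::chrono::duration<double>(it->second.total).count() /
           static_cast<double>(it->second.count);
  }

private:
  struct Entry {
    Clock::duration total{};
    std::size_t count = 0;
  };
  std::map<std::string, Entry> entries_;
};

// Convenience macro in the spirit of gtest-style helpers; use at most one per scope.
#define PERF_SCOPE(harness, label) \
  PerfHarness::ScopedTimer perf_scope_timer_((harness), (label))
```

A test body would wrap the region of interest in a block containing `PERF_SCOPE(harness, "label");` and report `meanSeconds("label")` afterwards. `steady_clock` is used because it is monotonic, which matters for interval measurements.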

@jstrzebonski:

• Done: DARMA-tasking/checkpoint#203 - I'm still waiting for a review here.
• In progress: DARMA-tasking/checkpoint#209 - recursive validation of memory usage during serialization.
• Discussed in the meeting that the nvcc pipeline is failing (memory limitations?). Maybe can split up the test that fails to compile

@MikolajZuzek:

• vt #1344: capture message size in MsgSharedPtr: PR #1426 ready for review.
• Discussed in the meeting and decided not to put the serialized size into a new message pointer because it’s a fundamentally different quantity. Will either do the sizing within makeTraceCreationSend or just pass it as before.

@nlslatt:

• Still need reviews on PR #1422
• We submitted the LB paper (copy on the Slack load balancing channel)
• Also started looking into how I might put the core of my LB testing driver into vt src to serve dual purposes, for automated testing of LBs and for running manually to tune an existing LB for a given problem type or to develop a new LB

@PhilMiller:

• My focus has been EMPIRE stuff, and I was away the latter part of last week. I reviewed the paper draft, and some PRs, but that’s about it

lifflander commented 3 years ago
Descriptor Information
Date 06/01/2021
Attendees @nmm0 @JacobDomagala @jstrzebonski @MikolajZuzek @PhilMiller @fnrizzi @ppebay @nlslatt
Description Weekly Meeting

Agenda:

Minutes:

@lifflander:

@nmm0:

More debugging. Jobs finally ran with RotateLB and RandomLB -- they have correct results. So this is kinda worrying because it may indicate another LB bug

@JacobDomagala:

My update:

@jstrzebonski:

Nothing really changed from last time: Done:

Also, I've got an unexpected situation I need to take care of, so I might be late for the weekly meeting (or even miss it), but I'll do my best to be there anyway. Sorry for the inconvenience.

@MikolajZuzek:

@nlslatt:

@PhilMiller:

I worked on serialization of device-space data structures (checkpoint#197), including opening an issue for missing execution-space overloads (kokkos#4057 / PR kokkos#4059) and for an incidental test failure discovery (kokkos#4058)

lifflander commented 3 years ago
Descriptor Information
Date 06/08/2021
Attendees @nmm0 @JacobDomagala @jstrzebonski @MikolajZuzek @PhilMiller @fnrizzi @ppebay @nlslatt
Description Weekly Meeting

Agenda:

Minutes:

@lifflander:

Discussed new release and structure for the releases

@cz4rs:

@nmm0:

Update:

@JacobDomagala:

My update:

@jstrzebonski:

My update

@MikolajZuzek:

My update:

@nlslatt:

My update:

@PhilMiller:

lifflander commented 3 years ago
Descriptor Information
Date 06/15/2021
Attendees @nmm0 @JacobDomagala @jstrzebonski @MikolajZuzek @PhilMiller @fnrizzi @ppebay @nlslatt
Description Weekly Meeting

Agenda:

Minutes:

@lifflander:

@cz4rs: Update:

@nmm0:

@JacobDomagala:

@jstrzebonski:

@MikolajZuzek:

@nlslatt:

@PhilMiller:

lifflander commented 3 years ago
Descriptor Information
Date 06/22/2021
Attendees @nmm0 @JacobDomagala @jstrzebonski @cz4rs @PhilMiller @fnrizzi @ppebay @nlslatt
Description Weekly Meeting

Agenda:

Minutes:

@lifflander:

@cz4rs:

@nmm0:

@JacobDomagala:

@jstrzebonski:

@nlslatt:

@PhilMiller:

lifflander commented 3 years ago
Descriptor Information
Date 06/29/2021
Attendees @nmm0 @JacobDomagala @cz4rs @fnrizzi @ppebay @nlslatt
Description Weekly Meeting

Agenda:

Minutes:

@lifflander:

@cz4rs: Update:

@nmm0:

@JacobDomagala:

My update: Open PRs:

@jstrzebonski:

@nlslatt:

@PhilMiller:

lifflander commented 3 years ago
Descriptor Information
Date 07/06/2021
Attendees @nmm0 @JacobDomagala @cz4rs @fnrizzi @ppebay @nlslatt
Description Weekly Meeting

Agenda:

Minutes:

@lifflander:

@cz4rs: Update:

• vt #1202 - Write container build for Intel 21 OneAPI - WIP, slimmed down container ready for review
• checkpoint #215 - remove version number from headers, bump version to 1.1.0 - done
• vt #1487 - create 1.1.1 Beta v1 release candidate - ready for review, see draft release notes

Misc:
• release process described on wiki, improvements are welcome - https://github.com/DARMA-tasking/vt/wiki/Releases
• budget restrictions - fewer hours for DARMA in July (I will spend most of my time on Kokkos)
• I'm still available on a daily basis, just ping me as needed

@nmm0: My update: Finally got a lead on the BVH bug. There is a reduction at the beginning of the iteration that seems to be incomplete after load balancing. I'm investigating the circumstances that it's happening under. Some elements of the reduction have been migrated to the same node but I need to double check whether the error hits the first time they get migrated

@JacobDomagala: My update:

Working on:
• vt - 1449 - Regression tests for send combinations
• vt - 1445 - Schedule message instead of doing MPI self-send

Also, for July I will be spending fewer hours on VT and more on Trilinos.

@jstrzebonski:

@nlslatt: My update:

• Mostly on Gemma and EMPIRE work
• We could publish results from Intel builds on Sandia machines to open CDash, e.g., https://my.cdash.org/index.php?project=CIME

@PhilMiller: