deliciouslytyped opened 1 year ago

@boegel was there ever a recorded talk by Ralph Castain about PMIx as mentioned at https://youtu.be/U_xC6L1f3B8?list=PLhnGtSmEGEQhK-qmBEM-tg4AiwBOcm5k1&t=298 ? ("EasyBuild Tech Talk I - The ABCs of Open MPI, part 3 (by Jeff Squyres & Ralph Castain)")
@deliciouslytyped No, not to this point, it never materialized...
cc @rhc54
Totally fell off my radar - I'm in the process of moving, but I'll add it to my list of things to do and try to get to it soon. Thanks for the reminder!
@rhc54 Any updates? :)
Maybe now is a better time even, with the recent release of PMIx v5.0...
Sigh...my apologies. The move has been drawn out and difficult, death in the family, trying to slow down in retirement, etc. I'll try to put something together over the next month.
Appreciate the patience and friendly reminders 😄
@boegel Would you be interested/willing to set up another EasyBuild meeting for me to present this? Would be easier for me if you could record/post it, and make it more interactive. Are there specific topics/questions people would like me to address?
@rhc54 That would certainly be interesting to have an EasyBuild Tech Talk on this.
Do you mind sending me an email about that? I'm bound to overlook replies in issues, since I have notifications disabled for that due to too much noise...
Hey folks, any news on this? I assume it's gone somewhere in the pile... ;)
Not sure if slides or documentation already exist, but if they do, that would already be helpful.
Oh my - yes it did fall off the radar as I began to make a concerted effort to actually (finally) retire. I have been greatly reducing my effort as a result. However, that doesn't change the fact that I promised to do this and have so far failed - which is disturbing.
There is documentation out there, but I can/should provide the talk to supplement it. I probably forgot to send @boegel an email as requested. I'll do that now and try to follow through.
Thanks for the reminder!
Ping? :)
Sorry for the silence - I exchanged some emails with Kenneth et al shortly after your initial "ping". With planned holidays, we agreed to do this talk sometime in 2nd half of Aug or perhaps early Sept. I don't think we settled on an actual date before folks left for vacation, but it shouldn't take long to resolve once they return.
Plan is:
I can talk about the rather large architectural change in OMPI v5 to focus solely on PMIx (dropping PMI-1 and 2 completely) and switching to use PRRTE as their RTE (and dropping their embedded ORTE). I can also talk about the new features in OMPI as they rely on new PMIx/PRRTE capabilities, and how people are using the scheduler integration to study malleable applications. Finally, I can talk a bit about project directions given my "retirement".
Feel free to suggest additional/alternate topics!
Sorry this hasn't materialized - a few communication issues, been swamped by other things, and struggling a bit with a health issue. Looks like I'm going to be unable to give a talk anytime soon (can't speak for extended periods of time, I fear), so let's see if I can provide a bit of info here.
First, just to clarify, I don't have that much involvement in the MPI side of things - and much less since that original series of talks I gave with Jeff, given my retirement. So I cannot address MPI changes in OMPI v5 and above.
However, I can say a few things about the runtime infrastructure in OMPI v5 and beyond. Probably the biggest change you will note is that we dropped all support for PMI-1 and PMI-2. OMPI is now solely committed to PMIx and makes extensive use of PMIx advanced features. Unfortunately, 3rd-party environments have not provided the backend support for those PMIx features, so some (many?) of the new OMPI features are only supported when the job is started by `mpirun`.

Bottom line: if it is just a vanilla MPI code, then using things like `srun` to start the job is fine. If you want MPI-4 or beyond features, you will likely need to use `mpirun`.
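For illustration, a minimal sketch of the two launch modes; the `--mpi=pmix` plugin name assumes your Slurm was built with PMIx support (`srun --mpi=list` shows what is available), and `my_mpi_app` is a placeholder:

```sh
# Direct launch of a vanilla MPI code via Slurm's own launcher:
srun --mpi=pmix -N 2 -n 8 ./my_mpi_app

# Launch via OMPI's own runtime instead - needed when the app relies on
# advanced PMIx features (e.g., dynamic process operations) that the
# host environment's backend does not provide:
mpirun -n 8 ./my_mpi_app
```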
We also ditched the old ORTE runtime embedded in OMPI and instead make use of the PMIx Reference RunTime Environment (PRRTE) to provide the `mpirun` support. I originally developed PRRTE to provide an early platform for people wanting to explore PMIx, and it has seen widespread use - particularly as a "shim" layer to non-PMIx based environments such as you find on Cray. Get an allocation, launch the PRRTE "DVM" (distributed virtual machine), and then execute applications underneath it. PRRTE fully supports dynamic operations (unlike the host environment itself), so this provided people with a means of utilizing comm-spawn and other features.
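As a rough sketch of that DVM workflow (tool names as shipped with recent PRRTE releases; options may differ by version, so check `prte --help`):

```sh
# Inside an allocation (Slurm, PBS, ...), start a persistent DVM across
# the allocated nodes and detach it into the background:
prte --daemonize

# Run applications underneath the DVM - dynamic operations such as
# MPI_Comm_spawn work here even if the host environment lacks them:
prun -n 8 ./my_mpi_app

# Tear the DVM down when done:
pterm
```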
Integrating PRRTE into OMPI provided a bit of a challenge as we had to create a mechanism by which OMPI could customize the PRRTE command line - e.g., to comply with MPI Standard specifications. However, we managed to do so. Some of the OMPI v4 cmd line syntax has changed, so you may need to make some adjustments. OMPI chose to silently translate quite a few of the changes, but not everything may "just work" - you may need to update.
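One concrete example of the kind of syntax change involved (this particular mapping option is just an illustration; `mpirun --help` lists the current syntax for your version):

```sh
# OMPI v4 style - deprecated, though v5 still translates many such forms:
mpirun -npernode 2 ./my_mpi_app

# OMPI v5 / PRRTE style, using the ppr (processes-per-resource) syntax:
mpirun --map-by ppr:2:node ./my_mpi_app
```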
OMPI v5+ continues its tradition of providing an embedded version of both PMIx and PRRTE. While these will work, they tend to lag behind when it comes to bug fixes - just the usual problem of trying to coordinate the release cycles across multiple independent projects so we aren't constantly dropping new versions. So we strongly recommend that you install those two packages separately so you can update them to capture bug fixes.
One note in that regard: PMIx and PRRTE are strongly coupled. In fact, PRRTE makes use of a significant amount of PMIx-internal code. This helps reduce code duplication and thereby makes PRRTE a little easier to maintain. However, it does mean that the two should always be updated together - i.e., if you update one to capture bug fixes, then you really need to update the other one as well. Keeping the two in-sync is important!
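For those going the external-install route, a hedged sketch of the usual sequence (install prefixes and version numbers are placeholders; the `--with-*` options follow each package's standard configure flags):

```sh
# 1. Build and install PMIx first.
cd openpmix-x.y.z && ./configure --prefix=/opt/pmix && make -j install

# 2. Build PRRTE against that same PMIx - keep the two in sync!
cd prrte-x.y.z && ./configure --prefix=/opt/prrte \
    --with-pmix=/opt/pmix && make -j install

# 3. Point OMPI at the external copies instead of its embedded ones.
cd openmpi-x.y.z && ./configure --prefix=/opt/ompi \
    --with-pmix=/opt/pmix --with-prrte=/opt/prrte && make -j install
```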
You do have the option of not building PRRTE if you just want to direct-launch (i.e., launch with something other than `mpirun` - such as `srun`) MPI apps. You cannot skip PMIx, of course - but you can build just the MPI layer with PMIx, HWLOC, and libevent if you don't want/need `mpirun`.
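A sketch of such a direct-launch-only build; note that `--without-prrte` is the standard autoconf negation of `--with-prrte`, and I'm assuming your OMPI version honors it - verify with `./configure --help`:

```sh
# MPI layer only: no embedded PRRTE, hence no mpirun. Jobs get started
# by the host environment (e.g., srun). PMIx is still required.
./configure --prefix=/opt/ompi \
    --without-prrte \
    --with-pmix=/opt/pmix \
    --with-hwloc=/opt/hwloc \
    --with-libevent=/opt/libevent
make -j install
```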
There is one point of "pain" with regards to operating under Slurm. The Slurm folks decided they wanted to regain full control over the `srun` cmd line, but were unable to do so because MPI launchers use `srun` to start their daemons. Long story short, they used a mechanism to inject a cmd line option into our launcher - and that turned out to have some undesirable side-effects. So if you are using Slurm 23.11 or newer, you may run into some unexpected behavior.
We have resolved that in PRRTE, but only starting with the upcoming PRRTE v3.0.7 release. So if you are operating under Slurm, I would strongly advise advancing PRRTE to at least that level (and remember to update PMIx to its latest release at the same time). I'm not sure which OMPI release will include that level of PRRTE - I suspect it will be available in OMPI v5.0.6, but you should check, or just build/use an external version of PRRTE to be safe.
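To check what an installation actually uses (assuming the standard info tools are on your PATH; output formats vary across versions):

```sh
# Runtime pieces an OMPI build was configured against:
ompi_info | grep -i -e pmix -e prrte

# Versions of standalone/external installs:
pmix_info --version
prte --version
```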
Hope that all helps. Feel free to raise questions and I'll do my best to answer them.
Thanks for writing all of this up! The main reason I insisted on this getting recorded somewhere is that at this point in time, I haven't gotten around to really understanding any of this architecture, but perhaps I can later. :)
All the best,