StanfordLegion / gasnet


Build now takes LEGION_GASNET_SYSTEM argument to choose the system to build for #19

Closed by elliottslaughter 1 year ago

elliottslaughter commented 1 year ago

This allows you to run e.g.:

make LEGION_GASNET_CONDUIT=ofi LEGION_GASNET_SYSTEM=slingshot11

Instead of:

make CONDUIT=ofi-slingshot11

(The CONDUIT spelling is still supported for backwards compatibility.)
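As a rough illustration of how the two spellings relate, the sketch below splits a legacy value into the two new settings. The helper name `split_legacy_conduit` is hypothetical and is not taken from this repository's Makefile; it only assumes that a legacy value has the form `conduit` or `conduit-system`.

```shell
# Hypothetical helper (not from the repository): split a legacy CONDUIT
# value such as "ofi-slingshot11" into the two new-style settings.
split_legacy_conduit() {
  case "$1" in
    *-*)
      # Text before the first "-" is the conduit, the rest is the system.
      printf 'LEGION_GASNET_CONDUIT=%s LEGION_GASNET_SYSTEM=%s\n' \
        "${1%%-*}" "${1#*-}"
      ;;
    *)
      # No "-" present: only a conduit was given.
      printf 'LEGION_GASNET_CONDUIT=%s\n' "$1"
      ;;
  esac
}

split_legacy_conduit ofi-slingshot11
split_legacy_conduit ibv
```

The first call prints the two assignments used in the new command line above; the second shows that a value without a system part passes through as a conduit only.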

The intention is that LEGION_GASNET_SYSTEM could be a specific system (e.g., Frontier) or a class of systems (e.g., slingshot11 for all systems that use the Slingshot 11 network).

This is intended to help us address https://github.com/StanfordLegion/legion/issues/1468.

streichler commented 1 year ago

It matches what I was thinking of, but now that I see it, I wonder if SYSTEM is too generic a name? make will initialize variables from the environment, so if anybody has SYSTEM=... in their environment, they won't get what they want here.
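The concern can be reproduced with a throwaway Makefile. This is a minimal sketch that assumes GNU Make is installed as `make`; the function name `show_system_origin` and the demo Makefile are hypothetical and unrelated to this repository's build.

```shell
# Sketch: report the value of SYSTEM and where GNU Make took it from.
show_system_origin() {
  mk=$(mktemp) || return 1
  # Single quotes keep $(...) away from the shell so make sees it verbatim.
  printf '%s\n' \
    'SYSTEM ?= makefile-default' \
    '$(info SYSTEM=$(SYSTEM) origin=$(origin SYSTEM))' \
    'all: ;' > "$mk"
  # "$@" lets the caller pass command-line assignments such as SYSTEM=foo.
  make -s -f "$mk" "$@" | grep '^SYSTEM='
  rm -f "$mk"
}

if command -v make >/dev/null 2>&1; then
  # No SYSTEM in the environment: the Makefile default applies (origin=file).
  (unset SYSTEM; show_system_origin)
  # A stray SYSTEM in the environment wins over "?=" (origin=environment).
  (SYSTEM=cray; export SYSTEM; show_system_origin)
fi
```

A longer, project-specific variable name makes this kind of accidental pickup much less likely.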

elliottslaughter commented 1 year ago

I'm happy to bikeshed as desired. Some possible options:

streichler commented 1 year ago

How about GASNET_CONDUIT and GASNET_SYSTEM, with CONDUIT being a legacy alias for GASNET_CONDUIT?

elliottslaughter commented 1 year ago

@streichler Ok, how does it look now?

bonachea commented 1 year ago

Re: naming choice of $GASNET_SYSTEM:

I don't love the subtle implication that this setting is somehow an entity defined by GASNet (unlike $GASNET_CONDUIT, which DOES correspond to a GASNet concept).

The values of $GASNET_SYSTEM represent canned configurations that are entirely a fabrication of this StanfordLegion/gasnet repository and don't exist in implementation or documentation anywhere else. I understand this name is motivated by LEGION's use of GASNet, but that subtle distinction is likely to be lost on users. As such I'd prefer a name that didn't include GASNET_ to reflect that. From @elliottslaughter's suggestions, any of SUBCONFIG, NETWORK_SYSTEM or CONDUIT_SYSTEM seems like a better choice.

Just my 2c...

bonachea commented 1 year ago

I think it's also worth mentioning that the actual distinction being made here is GASNet's configure --with-ofi-provider setting, which (when omitted) DOES include automatic defaulting logic that works reasonably well on many systems (configure will use fi_info to query for available providers). So it might be worth providing a config flavor that omits that argument and uses auto-detection, requiring this override only for the rare systems where the detection doesn't work (e.g. where the build node lacks the high-speed network hardware).

FWIW we also provide a configure --with-ofi-provider=generic setting that builds the conduit in a portable mode for ANY supported provider, at some additional overhead cost in adaptation to the provider at runtime (instead of statically for a specific provider). So that's also available as an "always works" option, although sub-optimal for any particular system.

CC: @PHHargrove

elliottslaughter commented 1 year ago

Thanks, @bonachea. About the auto-configure option: are there any plausible scenarios where configuration might succeed but then fail to find the correct OFI provider? I.e., can we rely on this logic to either: (a) do the "right" thing (for some definition of "right"), or (b) fail outright? Or are there any scenarios where there is a genuine question of what the "right" answer is (e.g., networks on which there are legitimately two different providers that users might want to use, depending on the circumstances)?

bonachea commented 1 year ago

> are there any plausible scenarios where configuration might succeed but then fail to find the correct OFI provider?

Yes. Some examples include:

  1. An OmniPath cluster where the frontend login node used for building software lacks an OPA hardware adapter, so fi_info at configure time only reports generic TCP-based providers instead of the high-performance PSM2 provider that should be used on the compute nodes.
  2. A few months ago Perlmutter had a mix of Slingshot-10 and Slingshot-11 nodes, so both looked "available" at configure time and the right choice depended on the system partition you planned to request in the compute job.

> I.e., can we rely on this logic to either: (a) do the "right" thing (for some definition of "right"), or (b) fail outright? Or are there any scenarios where there is a genuine question of what the "right" answer is (e.g., networks on which there are legitimately two different providers that users might want to use, depending on the circumstances)?

As I mentioned, there's a "generic" provider setting that does all the provider adaptation at runtime based on the "best" provider it finds at job startup (where "best" is currently defined by this ordered list: "cxi psm2 gni verbs;ofi_rxm efa sockets udp;ofi_rxd tcp;ofi_rxm"). This adds some runtime overhead in the steady state, but is the "safest" option in cases where we really cannot or prefer not to make a decision at configure time.
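To make the ordering concrete, here is a small sketch of "pick the first entry of the priority list that is available". The list is copied from the comment above; the helper `pick_ofi_provider` is hypothetical and takes the available providers as arguments rather than querying fi_info, so it illustrates the ordering only and is not GASNet's actual selection code.

```shell
# Priority order as quoted in the comment above.
OFI_PRIORITY='cxi psm2 gni verbs;ofi_rxm efa sockets udp;ofi_rxd tcp;ofi_rxm'

# Hypothetical helper: print the first provider in OFI_PRIORITY that also
# appears among the arguments (the providers "available" on this node).
pick_ofi_provider() {
  # Unquoted on purpose: split the list on whitespace, one provider per word.
  for want in $OFI_PRIORITY; do
    for have in "$@"; do
      if [ "$want" = "$have" ]; then
        printf '%s\n' "$want"
        return 0
      fi
    done
  done
  return 1   # none of the listed providers is available
}

# Login node without OPA hardware: only generic providers are visible.
pick_ofi_provider 'tcp;ofi_rxm' sockets        # prints: sockets
# Compute node with OPA hardware: psm2 outranks the generic providers.
pick_ofi_provider 'tcp;ofi_rxm' sockets psm2   # prints: psm2
```

The two calls mirror the OmniPath example: the same priority list yields a different answer depending on which node the query runs on.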

Assuming you are NOT using that --with-ofi-provider=generic provider, the remaining options:

  1. auto-detection (no --with-ofi-provider option), which queries fi_info and applies the priority list above to statically compile for the best provider it finds in the configure environment.
  2. --with-ofi-provider=X which demands particular provider X at configure time.

Both select a provider at configure time and apply static optimizations to the conduit code based on that choice. There are at least two ways this might get the "wrong answer" in corner cases:

  1. If a provider was specified at configure time which is not actually available at runtime, the job will die at startup with an explanatory message.
  2. If a sub-optimal provider was specified at configure time (e.g. sockets provider for the OPA example system described above), and that slower provider IS available at runtime, the job should still run, but at least in some cases will print warnings about use of a sub-optimal provider.

Even outside of ofi-conduit, our startup checks in NDEBUG mode include looking for particular hardware (e.g. Mellanox InfiniBand HCAs) and warning if the conduit/provider choice looks sub-optimal. However, these checks are not foolproof, and they might be ignored or silenced by the end-user.

elliottslaughter commented 1 year ago

Thanks. My gut feeling based on my current understanding of the tradeoffs is to continue down the current path and just fine-tune the variable names to accurately reflect what we're trying to accomplish.

streichler commented 1 year ago

This approach allows for GASNET_SYSTEM (discussion on naming to follow below) to be blank, which would mean that a system-agnostic config/config.ofi.release file would probably need that auto-detection goodness to work at all?

For the naming, I understand the concern, but the variables are specific to the GASNet build we're doing, so having GASNET in the name seems reasonable. If we wanted to be explicit that it was relevant (only) to Legion's build of GASNet, then that would suggest LEGION_GASNET_CONDUIT and LEGION_GASNET_SYSTEM. The other way out of this is to tell make to ignore SYSTEM if it came from the environment (i.e. it'd have to be on the command line to have an effect), but that'd probably surprise people too.
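The "ignore SYSTEM from the environment" alternative could look roughly like the following. This is a sketch using GNU Make's `$(origin ...)` function in a throwaway Makefile; the function name `demo_ignore_env_system` is hypothetical, and nothing here is taken from the repository's actual Makefile.

```shell
# Sketch: a Makefile that discards SYSTEM when it came from the environment
# and honors it only when given on the make command line.
demo_ignore_env_system() {
  mk=$(mktemp) || return 1
  printf '%s\n' \
    'ifeq ($(origin SYSTEM),environment)' \
    'SYSTEM :=' \
    'endif' \
    '$(info SYSTEM=[$(SYSTEM)])' \
    'all: ;' > "$mk"
  make -s -f "$mk" "$@" | grep '^SYSTEM='
  rm -f "$mk"
}

if command -v make >/dev/null 2>&1; then
  # Environment value is dropped, so the brackets come out empty.
  (SYSTEM=cray; export SYSTEM; demo_ignore_env_system)
  # A command-line assignment has origin "command line" and is kept.
  (SYSTEM=cray; export SYSTEM; demo_ignore_env_system SYSTEM=slingshot11)
fi
```

This shows the surprise mentioned above: a user who exports SYSTEM on purpose would see it silently ignored.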

bonachea commented 1 year ago

> This approach allows for GASNET_SYSTEM to be blank, which would mean that a system-agnostic config/config.ofi.release file would probably need that auto-detection goodness to work at all?

As currently written I don't think the code literally allows for GASNET_CONDUIT=ofi GASNET_SYSTEM="" input.

However I agree with Sean's suggestion that offering more generic options could be valuable. Generalizing, you could even provide both config/config.ofi-auto.release and config/config.ofi-generic.release, where GASNET_SYSTEM=auto omits --with-ofi-provider to activate configure-time "auto-detection goodness" and GASNET_SYSTEM=generic passes --with-ofi-provider=generic to activate the fully general (but most expensive) runtime adaptation.
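One way to picture the proposed flavors is a mapping from the system value to the configure argument. The sketch below is hypothetical: the function name `ofi_provider_flag` is invented for illustration, and the `slingshot11` to `cxi` pairing is an assumption about which provider that flavor would request, not something stated in this thread.

```shell
# Hypothetical mapping from a system value to GASNet's configure argument.
ofi_provider_flag() {
  case "$1" in
    auto)
      # Omit the flag entirely so configure auto-detects via fi_info.
      ;;
    generic)
      # Fully general runtime adaptation, at some steady-state cost.
      printf '%s\n' '--with-ofi-provider=generic'
      ;;
    slingshot11)
      # Assumption: the Slingshot 11 flavor requests the cxi provider.
      printf '%s\n' '--with-ofi-provider=cxi'
      ;;
    *)
      echo "unknown system: $1" >&2
      return 1
      ;;
  esac
}

ofi_provider_flag auto      # prints nothing
ofi_provider_flag generic   # prints: --with-ofi-provider=generic
```

Rejecting unknown values outright, rather than falling back to auto-detection, keeps a typo in the system name from silently selecting a different provider.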

PHHargrove commented 1 year ago

To add my $0.02 USD regarding naming:

  1. If, as I gather from @elliottslaughter's initial comment, the name fragment SYSTEM is meant to allow the user to convey either "use this network" or "use settings appropriate for Frontier", then I offer PLATFORM and TARGET as possible synonyms.

  2. Please don't use GASNET_ as the prefix. Doing so risks conflict with things in GASNet itself, especially since GNU Make will export all make variable settings to the environment by default. For instance, GASNET_PLATFORM is a shell variable used in our configure.

elliottslaughter commented 1 year ago

I have updated the PR to use the names LEGION_GASNET_CONDUIT and LEGION_GASNET_SYSTEM. The old spelling CONDUIT is still supported for backwards compatibility with existing users.

Let me know if you have any further concerns.