Open bradcray opened 3 months ago
Yes! This definitely resonates with me. I love the idea of introducing a CHPL_INTERCONNECT variable. As Brad says above, this is easy to "know" from a user perspective, because it is easy to look up this info about your HPC system. Plus, if this info can then be used to infer CHPL_COMM* variables that would be great too. To be honest, when I first started, I didn't realize "ibv" stood for Infiniband, and if there was an easier starting point, that would be great!
The proposal can make building oversubscribed Chapel easy as well, but I can't tell how exactly. In the proposed world, what's the way to build Chapel with oversubscription? CHPL_INTERCONNECT=none && CHPL_COMM=gasnet? A new value for CHPL_INTERCONNECT? A completely new variable?
@e-kayrakli: Hmm, good question. My first reaction was to require someone wanting oversubscription to use the lower-level variables, thinking they'd somehow be "more expert" and so should deserve the extra work, but thinking about it more, I think that wanting an oversubscribed Chapel for development purposes is pretty common, suggesting it should be similarly friendly. My thought would be to make it a new value like virtual or local, which would result in defaults like CHPL_COMM=gasnet and CHPL_COMM_SUBSTRATE=udp or smp. I feel least excited about making it a new variable; it feels similar to CHPL_GPU=cpu to me, where we also used a special value rather than a new variable.
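To make the "new value" option concrete, here is a minimal Python sketch of how such a value might expand into lower-level defaults. Everything in it is hypothetical: the value names virtual and local are only the candidates floated above, the function name is made up for illustration, and the choice between udp and smp is left as a parameter because the comment does not settle it.

```python
# Hypothetical sketch only: "virtual" and "local" are candidate names from
# the discussion above, not values Chapel accepts today.
OVERSUBSCRIBED_VALUES = {"virtual", "local"}


def oversubscription_defaults(interconnect, substrate="udp"):
    """Return the lower-level defaults an oversubscription value might imply.

    Returns an empty dict when the interconnect value is not one of the
    hypothetical oversubscription values.
    """
    if substrate not in ("udp", "smp"):
        raise ValueError("substrate must be 'udp' or 'smp'")
    if interconnect not in OVERSUBSCRIBED_VALUES:
        return {}
    return {"CHPL_COMM": "gasnet", "CHPL_COMM_SUBSTRATE": substrate}
```

With this shape, a user would only ever write something like CHPL_INTERCONNECT=virtual, and gasnet would stay an implementation detail.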
> but thinking about it more, I think that wanting an oversubscribed Chapel for development purposes is pretty common, suggesting it should be similarly friendly.
This strongly resonates with me. I don't view this mode to be a power-user mode. It may be so currently, but this proposal could be an excuse to improve the story there.
I like the way that this idea would allow us to hide implementation details (whether it's using gasnet or ofi). This would also address a point of user feedback, where a user requested the ability to simulate multiple locales on a single system without being aware that gasnet exists at all.
I've taken the liberty of adding "user issue" here due to both Michael's connection to the previous issue and Bonnie's response.
Today, we support a CHPL_TARGET_PLATFORM variable that sometimes tells us a lot about the target platform if it's something specific like an HPE Cray EX or Cray XC system, but sometimes tells us little if it's a Linux cluster. In the latter case, the user has to set CHPL_COMM_* variables to specify how Chapel should map itself to the interconnect, using values like gasnet or ofi. In this issue, I'm wondering whether we should introduce a CHPL_INTERCONNECT or CHPL_NETWORK variable that would support values like none, slingshot, infiniband, ethernet, efa, unset, etc. as a higher-level way to describe the target system, one that's likely more known/knowable to a user than the details of how our communication is implemented. From there, we could then (typically) infer reasonable values for the lower-level CHPL_COMM*-related variables (while still permitting a user to set them explicitly, if desired).

For example, I might imagine that setting CHPL_TARGET_PLATFORM=hpe-apollo would cause CHPL_INTERCONNECT to be inferred to be infiniband, which would then cause CHPL_COMM to be inferred to be gasnet and CHPL_COMM_SUBSTRATE to be inferred to be ibv (and so on). Yet on a Linux cluster that doesn't have a more specific platform identifier than linux64, a user could set CHPL_INTERCONNECT=infiniband and get the same lower-level settings. Or on an Apollo system, the user could override the default and set CHPL_COMM=ofi if they wanted to try the ofi-based implementation.

To me, this seems like it would prevent most users from ever having to set CHPL_COMM or its related variables, which feels like a win since that's more about how we implement things than about things a typical user would know, or should need to know.
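The inference chain described here could be sketched roughly as follows, in Python. This is an illustration under stated assumptions, not existing Chapel code: the function and table names are hypothetical, and the tables contain only the mappings named in this issue (hpe-apollo to infiniband, and infiniband to gasnet with the ibv substrate). Anything the user sets explicitly takes precedence over what would be inferred.

```python
# Hypothetical sketch: the table contents come from the examples in this
# issue; the names and structure are assumptions made for illustration.
PLATFORM_TO_INTERCONNECT = {"hpe-apollo": "infiniband"}
INTERCONNECT_TO_COMM = {
    "infiniband": {"CHPL_COMM": "gasnet", "CHPL_COMM_SUBSTRATE": "ibv"},
}


def infer_comm_settings(env):
    """Return a copy of env with inferred values added; explicit values win."""
    result = dict(env)

    # Step 1: infer the interconnect from the platform, unless already set.
    if "CHPL_INTERCONNECT" not in result:
        platform = result.get("CHPL_TARGET_PLATFORM")
        if platform in PLATFORM_TO_INTERCONNECT:
            result["CHPL_INTERCONNECT"] = PLATFORM_TO_INTERCONNECT[platform]

    # Step 2: look up the lower-level defaults implied by the interconnect.
    defaults = INTERCONNECT_TO_COMM.get(result.get("CHPL_INTERCONNECT"), {})

    # If the user chose a different CHPL_COMM (e.g. ofi), leave it alone and
    # do not attach a substrate that belongs to the default comm layer.
    if "CHPL_COMM" in result and result["CHPL_COMM"] != defaults.get("CHPL_COMM"):
        return result

    for name, value in defaults.items():
        result.setdefault(name, value)
    return result
```

Under these assumptions, {"CHPL_TARGET_PLATFORM": "hpe-apollo"} and {"CHPL_TARGET_PLATFORM": "linux64", "CHPL_INTERCONNECT": "infiniband"} end up with the same CHPL_COMM and CHPL_COMM_SUBSTRATE values, while an explicit CHPL_COMM=ofi on an Apollo system is left untouched.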