chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.79k stars 420 forks source link

Should CHPL_REGEXP be called CHPL_RE2? #17306

Closed bradcray closed 3 years ago

bradcray commented 3 years ago

This is somewhat related to issue #17220 and also to #17222: Assuming that our Regex module continues to be a standard (rather than package) module and based on RE2, should we should consider changing the CHPL_REGEXP environment variable to CHPL_RE2 and the values to be bundled vs. none, to be more similar to other third-party packages we rely upon? The current CHPL_REGEXP variable with values re2 and none suggests to me that one could plug in other packages rather than re2, yet I don't know that any of us expect that to be supported or supportable (?).

Options discussed below: :heart: CHPL_RE2 = bundled | none πŸŽ‰ CHPL_REGEX = re2 | none πŸš€ CHPL_REGEX = bundled | none :+1: CHPL_REGEX = re2 | none && CHPL_RE2 = bundled | none

Rationale for the options:

:+1: β€”Β Brad argues that this is the most flexible approach in that:

bradcray commented 3 years ago

I think @vasslitvinov asked a question which either proposed this idea or suggested it to me (not sure which). Tagging @leekillough on this since he's been running with these topics and had already started looking into making CHPL_REGEXP into CHPL_REGEX.

lydia-duncan commented 3 years ago

I'd prefer having it be CHPL_REGEX rather than CHPL_RE2, and I think doing so makes it more obvious to the user that they need it when using the module. If we're going to rename the environment variable, I'd rather it be kept in line with the module name unless we plan on having the Regex module call out to different implementations under the covers (and in that case I'd prefer it to affect the value of the environment variable anyways).

bradcray commented 3 years ago

For the sake of argument:

Variables that take multiple settings, some of which correspond to third-party packages:

Variables that take bundled vs. none settings, some of which correspond to third-party packages:

To me, this choice seems more like the second category than the first.

Note that most users never need to set CHPL_RE* since it defaults to using the bundled version as long as it built properly.

lydia-duncan commented 3 years ago

I would argue that the difference is that regular expressions are a more commonly known language feature than any of the other options in your second section

bradcray commented 3 years ago

I would argue that the difference is that regular expressions are a more commonly known language feature than any of the other options in your second section

What does that matter, though? (given that most users will never have a need to set the variable).

vasslitvinov commented 3 years ago

To me:

bradcray commented 3 years ago

@vasslitvinov: I agree with you (but am not sure whether you're agreeing with me or not). If we expected to support (or be able to support) multiple Regex implementations for the standard Regex feature/module, then I agree it should be CHPL_REGEX. But I believe our implementation is fairly hard-coded to RE2 and that the nature of regex libraries each having their own syntax would make it hard to have multiple options available as we do for CHPL_COMM, CHPL_TASKS, CHPL_MEMβ€”at least without changes to user code (which we don't require for those other options). This is what makes me think this choice is more like GMP, LLVM, etc.

vasslitvinov commented 3 years ago

If we indeed tie the Regex module to RE2, then I can see that CHPL_RE2 makes more sense. CHPL_REGEX would be a binary flag (none or re2) vs. CHPL_RE2 could have more possible values.

lydia-duncan commented 3 years ago

This is what makes me think this choice is more like GMP, LLVM, etc.

Of these, only GMP is also a module. And GMP matches its module name. We seem to think that Regex is a better name for the module than RE2 despite the fact that it is fairly tied to the RE2 implementation (which I agree with). My position is based on the argument that Regex is a better name for the module and the argument that consistency is beneficial. It seems we disagree on what it should be consistent with, not whether consistency is important.

I think arguing that "the user won't have to set it very often" undercuts the value of having it be consistent with other environment variables - if setting the environment variable is unimportant to the user, then why should we prioritize consistency with other environment variables in similar situations over consistency with what the user will interact with more? In the situations where the user does want to set the value explicitly, it will be because they found they couldn't use the module they care about, so making it easier to go from "I can't use the Regex module" to "oh, my environment variable is set incorrectly" should be the priority, in my opinion.

Are there scenarios that I am not thinking of where having the value not be bundled matters? If so, maybe CHPL_REGEX_RE2 is a better name for the environment variable - this would handle my concerns about the user not recognizing the connection to the module name while still being sufficiently tied to RE2 itself that bundled is appropriate.

bradcray commented 3 years ago

It seems we disagree on what it should be consistent with, not whether consistency is important.

I agree with that.

Of these, only GMP is also a module. And GMP matches its module name.

That doesn't carry a lot of weight with me, in part because GMP isn't a module that we'd encourage most users to use in practice. At this point it's more like the underlying (non-Chapeltastic) implementation strategy for the "BigInt" module, in much the same way that RE2 is the underlying implementation of Regex. The difference is that we chose to expose the low-level GMP interface to the user in the "BigInt" case, but not to expose the RE2 interface to the user in the Regex case. I'd say that the environment variable name describes the third-party library name, which happens to match the user-facing module name rather than the other way around.

if setting the environment variable is unimportant to the user, then why should we prioritize consistency with other environment variables in similar situations over consistency with what the user will interact with more?

Because it offers consistency for developers who seem to me to likely be the main people who have to interact with this variable (through scripting, documentation, their testing environments).

Are there scenarios that I am not thinking of where having the value not be bundled matters?

I think it's plausible that we could support a system installation of re2 (potentially with restricted functionality since it wouldn't have Michael's patches unless we successfully get those merged upstream, which would be attractive for other reasons). But it's not something we're likely to prioritize anytime soon. My dissatisfaction with the current situation relates to the fact that there aren't siblings of re2 that I can imagine specifying as alternatives for CHPL_REGEX, so it's essentially a boolean variable whose values seem misleading and inconsistent with other environment variables. If we did have multiple implementations, I'd expect using CHPL_REGEX to switch between these, and then variables like CHPL_RE2, CHPL_PCRE to say where the implementations should come from, similar to how CHPL_MEM can be set to jemalloc and then CHPL_JEMALLOC can be set to none vs. system vs. bundled.

Now, we could imagine supporting both CHPL_REGEX = none | re2 and CHPL_RE2 = none | bundled as a compromise solution. But to me, this adds complexity without value since it uses two variables to express what is currently a binary decision, one of which isn't likely to support additional values over time.

To further illustrate your position, would you outline a scenario that would motivate a user to set the environment variable, explain how they would go about learning that they should set it, and explain why having it match the module name would simplify that experience for them?

lydia-duncan commented 3 years ago

To further illustrate your position, would you outline a scenario that would motivate a user to set the environment variable, explain how they would go about learning that they should set it, and explain why having it match the module name would simplify that experience for them?

A user starts with the quickstart settings (which explicitly sets CHPL_REGEXP=none today) as a way of getting up and running with Chapel. They're experimenting, they're playing around - they haven't gotten to the phase where they feel like turning everything on yet and using the normal setchplenv script. Maybe they're looking at libraries to see what they can do, maybe they just have a toy regular expression program that they like to play with. They try to use the Regex library and get the error message about having the environment variable set to none. In the case where the environment variable matches the module name they think "of course" and just set the environment variable. In the case where the environment variable is CHPL_RE2, now they're thinking about underlying implementation details, are maybe confused about why this setting matters, but probably still set it. If they'd found printchplenv output, maybe they would have thought to explicitly set CHPL_REGEX before experimenting with the library and never encountered the error in the first place (whereas they are unlikely to think of that for CHPL_RE2).

At the end of the day, it's not a major problem and not super important to me, I'm mostly just offering my two cents because I had an opinion and you asked for opinions. At any rate, thanks for caring about where the opinion came from, even if at the end of the day we don't end up following it.

leekillough commented 3 years ago

I prefer CHPL_REGEX instead of CHPL_RE2.

But whatever we change CHPL_REGEXP to, it looks like there will be a lot of difficulty changing it if we want to preserve backward-compatibility, which is why CHPL_REGEXP was not changed in #17276.

bradcray commented 3 years ago

In the case where the environment variable is CHPL_RE2

So to make sure I'm not missing anything: it's not as though they'll have any troubles setting the variable or determining that they need to set it, it's just that the name doesn't match the module which might cause them to puzzle over it.

now they're thinking about underlying implementation details

I'm not sure I completely buy that the variable name is going to be the thing that causes them to think about this. The first sentence of the Regex[p] module's documentation states the strong tie to RE2, and points to the RE2 documentation as the official reference for the syntax: https://chapel-lang.org/docs/master/modules/standard/Regexp.html

it looks like there will be a lot of difficulty changing it

I agree that renaming an environment variable isn't the most fun change to make, which is why I wanted to discuss what we should be changing it to before too much effort was exerted there (to avoid having to change it twice). But when you say "difficulty", is it just a lot of detail to slog through, or are there deep challenges? I think @daviditen had some recent experience with changing and deprecating environment variables with the bundled effort (if not others?), so may have some best practices to share.

leekillough commented 3 years ago

I think CHPL_REGEX sounds better, even if we change its allowed values to be either none or bundled, if we want to de-emphasize the RE2 implementation, and allow the bundled option to change to mean something else later.

CHPL_REGEX=bundled means "I want the bundled Regex implementation, whatever that's based on."

CHPL_RE2 emphasizes that it's based on RE2, and we have the choice of either the bundled RE2 or none.

The name CHPL_RE2 strays away from the Regex module name, and it might not be clear to the user what CHPL_RE2 means, since it's just choosing an under-the-covers Regex implementation.

dlongnecke-cray commented 3 years ago

I voted for CHPL_RE2 = bundled | none. That seems consistent with what we do for other third-party packages (GMP, HWLOC, LLVM all come to mind) where each is a specific software entity and not a general feature.

We could imagine a world where setting CHPL_RE2=none causes us to default to an implementation of regex written entirely in Chapel code. Viewed from that lens, that we depend on re2 to use Regex can be seen as a limitation of the current implementation.


Another option that makes some sense to me is to mimic the structure of CHPL_MEM and say something like CHPL_REGEX = re2 | chapel. Where we can use re2 as an accelerator like we do with jemalloc, or we can fall back on the Chapel based implementation (where we'd just emit an error saying it's not currently supported, and to use re2).

This only makes sense if it's easy to swap out our regex engine (I have no clue how difficult this is, it's probably not as easy as swapping our memory allocator). Our talks earlier made it sound like it was not trivial.

mppf commented 3 years ago

I agree with @leekillough's comment above. In particular I like CHPL_REGEX=bundled because while we use RE2, it's not possible for somebody to provide their own RE2 (since we have custom patches to it). E.g. CHPL_RE2=system does not make sense. Additionally I definitely do not think of RE2 as fundamental to regular expressions in Chapel. It is just how we implemented it. So I would prefer to hide that detail, if we can.

If in the future, we change to some other regex implementation (let's say it's called bogoregex for the discussion), then I would expect it is used from the Regex module instead of RE2. I do not think we would try to support both at once (the way we do with CHPL_COMM say). I would further expect that we bundle it, so that CHPL_REGEX=bundled continues to work. At that time, if it's possible to use a system-wide install of bogoregex, I would advocate for making the adjustments to allow people to write CHPL_REGEX=bogoregex and CHPL_BOGOREGEX=system.

However this is not something I care deeply about.

The argument that carries the most weight (for me) in favor of CHPL_RE2= is that maybe it makes the question of "what am I going to build when I type make" more obviously answerable. However I'm not so sure that is a goal of printchplenv.

bradcray commented 3 years ago

Additionally I definitely do not think of RE2 as fundamental to regular expressions in Chapel. It is just how we implemented it. So I would prefer to hide that detail, if we can.

To me, this seems contrary to the notion of stabilizing our standard libraries. If we use something other than re2 to implement them, the regular expression syntax is likely to change, isn't it? Unless you're suggesting that in the future we have a different implementation than re2 that happens to implement the identical regex syntax as re2. (Is that likely?)

Imagining that it is for a moment, that actually suggests going the other way to me. That is CHPL_REGEX=re2 if we want to support regex with re2 vs. CHPL_REGEX=my_package_that_implements_re2 when we have an alternative? Where for someone selecting the re2 option, we'd then use CHPL_RE2 to indicate where it should come from (e.g., bundled). To me, that's the advantage of the :+1: approach, where I view the :heart: approach as a simpler stepping stone for the time being given that there's no real need for two environment variables currently.

it's not possible for somebody to provide their own RE2 (since we have custom patches to it).

Not currently, but it's conceivable that in the future (when we've taken over the world), we'd get the patches incorporated and be able to use a system version (I definitely wouldn't want to preclude our ability to do so since it'd be attractive for other reasonsβ€”like simplifying the process of upgrading re2 versions). Or that we'd stop relying on the patches for other reasons (stop needing the feature; re2 provides equivalent features that we can use).

mppf commented 3 years ago

If we use something other than re2 to implement them, the regular expression syntax is likely to change, isn't it?

Not necessarily. We can convert regular expression strings to handle minor differences. As I understand it, RE2 is relatively restricted in terms of what it supports, so I don't think we'll run into features we cannot express in another library's syntax. I would expect the minor differences to mostly come up with things like \s and the names of character classes.

But, I am also uncertain to what degree we can stabilize the Regex module without at least answering the philosophical question of if we are defining our own regexp syntax or if we are providing an easy-to-use interface for an existing library. I don't recall if we have answered that already. But anyway all of this is arguably off-topic here and belongs in #17220.

bradcray commented 3 years ago

We can convert regular expression strings to handle minor differences. As I understand it, RE2 is relatively restricted in terms of what it supports, so I don't think we'll run into features we cannot express in another library's syntax.

That argument suggests the :+1: approach to me then, rather than the :rocket: approach: A CHPL_REGEX variable that says what underlying technology is used to implement the Regex module (e.g., re2, pcre, etc.) and then, in the event that re2 is selected, a CHPL_RE2 variable saying where to find the implementation, as we do with other high-level technologies that have multiple implementations (CHPL_MEM=jemalloc, CHPL_JEMALLOC=bundled). I guess your counterargument is:

I do not think we would try to support both at once (the way we do with CHPL_COMM say).

But I don't see any inherent reason that this would be the case...

leekillough commented 3 years ago

I definitely prefer :+1: to :heart:, because :heart: limits us to RE2 or nothing.

In decreasing order of preference:

:rocket: :tada: :+1: :heart:

bradcray commented 3 years ago

I definitely prefer πŸ‘ to ❀️, because ❀️ limits us to RE2 or nothing.

I don't think that's right, because we could do :heart: now and add the CHPL_REGEX variable later if/when we had a second package we wanted to support (which would essentially take us to :+1:). This is the reason I prefer :heart: : it's simpler than :+1: (one variable rather than two), seems most similar to what we do in other cases, and seems most future-proof since it permits us to switch to :+1: later by adding something without changing anything.

leekillough commented 3 years ago

But locking ourselves to RE2 seems to be a step backward rather than forward, even if we can do :+1: later.

bradcray commented 3 years ago

We're not locked to RE2 by taking the :heart: approach: If we later choose to also (or instead) accept CHPL_REGEX=pcre (say), the CHPL_RE2 variable would simply be ignored in that case (much as CHPL_JEMALLOC is if CHPL_MEM isn't set to jemalloc). And presumably at that time, a new CHPL_PCRE variable would be checked to see which version of PCRE to use.

leekillough commented 3 years ago

❀️ CHPL_RE2 has RE2 in the name, and hence locks us to RE2.

bradcray commented 3 years ago

It would lock our "which version of RE2 should be used by this build of Chapel?" variable (CHPL_RE2) to RE2, but it wouldn't lock our "What underlying regex technology should be used to implement the Regex module?" variable (CHPL_REGEX) to RE2. This is analogous to how CHPL_MEM says how memory should be implemented, and if jemalloc is chosen as the option, CHPL_JEMALLOC says where to find it (without locking the choice of memory implementation to jemalloc).

gbtitus commented 3 years ago

I believe that Chapel will eventually want to standardize a regular expression syntax, which won't be an exact match to that defined by RE2. I believe that means πŸ‘ is what we need for stabilization. I'm not too fond of any of the others, because they require an additional future change after users do the current CHPL_REGEXP --> CHPL_REGEX change. If we just say that when CHPL_REGEX=re2, having CHPL_RE2 unset implies "bundled", are we basically at πŸ‘ ?

bradcray commented 3 years ago

I believe that Chapel will eventually want to standardize a regular expression syntax, which won't be an exact match to that defined by RE2. I believe that means πŸ‘ is what we need for stabilization.

Hi Greg β€”Β I used to think this too until Michael pointed out that our RegEx module could translate between the user's regular expression strings and the ones used by an underlying implementation in cases where it doesn't match, would allow us to target other implementations.

gbtitus commented 3 years ago

... our RegEx module could translate between the user's regular expression strings and the ones used by an underlying implementation ...

Ah, good point. That makes me wonder whether we even need CHPL_REGEX then. Perhaps CHPL_RE2=bundled|none (maybe adding system someday) is enough. We would only need CHPL_REGEX=re2|somethingElse if we had more than one underlying implementation, in which case the translation Michael and you described would need to know what that underlying implementation was. And the only reason I can think of that we'd want multiple implementations was if the performance profiles of the implementations weren't the same, i.e. if neither outperformed the other overall but rather each had things it was better for. I definitely wouldn't expect a functionality difference.

bradcray commented 3 years ago

That makes me wonder whether we even need CHPL_REGEX then. Perhaps CHPL_RE2=bundled|none (maybe adding system someday) is enough.

That's why I voted for :heart: as my first choice.

bradcray commented 3 years ago

In the post-meeting discussion today, the argument was brought up about approaches that embed "re2" into the variable name or its setting causing problems for users down the line if/when it changes away from re2, but I don't really buy this argument because my guess is that users don't set this variable 99% of the time (because it defaults to an appropriate setting when unset).

dlongnecke-cray commented 3 years ago

Part of what keeps me from voting for (πŸ‘ CHPL_REGEX = re2 | none && CHPL_RE2 = bundled | none) is that I really don't like the idea of us having a variable that when set to none basically implies that we don't support regex anymore.

It's true that this is the case today, but I don't think we should advertise that because it's something that we recognize as a shortcoming and intend to fix in the future (by adopting our own regex language, by perhaps implementing a regex engine in Chapel, etc).

Basically, I don't really think we should consider REGEX as something that the user can just "turn off".

I'd be willing to vote for πŸ‘ if "none" was changed to something like "native" or "chapel".

gbtitus commented 3 years ago

I think I'm back to πŸ‘ , because:

bradcray commented 3 years ago

I don't think we should advertise that because it's something that we recognize as a shortcoming and intend to fix in the future

I don't view it that way, personally. Today, it's there primarily so that (a) if you don't want to spend your resources building re2, you can opt out of it (e.g., quickstart mode does this) and (b) if it causes any portability problems, you're not stuck with a broken version of Chapel by not being able to opt out of it. Those both seem like reasonable reasons to maintain a way of opting out of it to me.

e-kayrakli commented 3 years ago

I voted :tada: (CHPL_REGEX = re2 | none)

dlongnecke-cray commented 3 years ago

(a) if you don't want to spend your resources building re2, you can opt out of it (e.g., quickstart mode does this) and (b) if it causes any portability problems, you're not stuck with a broken version of Chapel by not being able to opt out of it

I agree that you shouldn't have to build re2 if you don't plan on using regex, but I'd be much happier if the other option was something like chapel, and then if you tried to use Regex; it would give you the error about "Chapel-based regex is not supported, please use CHPL_REGEX=re2 instead".

bradcray commented 3 years ago

but I'd be much happier if the other option was something like chapel, and then if you tried to use Regex; it would give you the error about "Chapel-based regex is not supported, please use CHPL_REGEX=re2 instead".

I'm not aware of any plan (near-term or long-term) to build a native Chapel regex package, so to me, it seems odd to make the "opt-out of regex" option require opting into something that hasn't been discussed and seems unlikely to be built soon. We don't opt out of multi-locale runs by setting CHPL_COMM=chapel, so why would we opt out of regex support differently? Even if a native-Chapel Regex library were introduced at some point in the future, we could still add a CHPL_REGEX=chapel option at that point to opt into it without breaking anything.

I don't think we should advertise that because it's something that we recognize as a shortcoming

I don't view it as a shortcoming to be able to say "I don't want/need this feature, so please don't spend my time/space on it."

dlongnecke-cray commented 3 years ago

I'm not aware of any plan (near-term or long-term) to build a native Chapel regex package, so to me, it seems odd to make the "opt-out of regex" option require opting into something that hasn't been discussed and seems unlikely to be built soon.

I guess I thought differently, given that some people suggested interest in adopting a Chapel regex syntax. If we had a Chapel regex syntax, it doesn't seem so farfetched to say that the Chapel regex engine just isn't implemented yet.

We don't opt out of multi-locale runs by setting CHPL_COMM=chapel, so why would we opt out of regex support differently?

I guess CHPL_COMM=none makes more sense to me because the language doesn't suddenly break if you decide to use an on statement. (There might be other cases where the compiler just breaks if you try to do something distributed, feel free to correct me!)

I don't view it as a shortcoming to be able to say "I don't want/need this feature, so please don't spend my time/space on it."

I'd like to view CHPL_REGEX as "select your regex backend", not as "turn regex on or off". In that light, selecting chapel or native makes sense even if it isn't implemented right at this moment...since it's native Chapel, you aren't linking in an extra library that you won't use...

I can't seem to resist reading CHPL_REGEX=none as "turn regex off". I suppose you could interpret it as "don't use a Regex backend" (which could imply using a Chapel-based implementation at some point), but my brain just doesn't seem to want to do that.

bradcray commented 3 years ago

I can't seem to resist reading CHPL_REGEX=none as "turn regex off". I suppose you could interpret it as "don't use a Regex backend" (which could imply using a Chapel-based implementation at some point), but my brain just doesn't seem to want to do that.

I also read it that way, and agree that we should not redefine none to mean "use a Chapel-native version" in the future, but should use chapel or native for that. Even using your "select regex backend" interpretation (where I might say "select the technology that implements Regex" instead), none seems accurate: "I don't want a regex backend, but thanks for offering."

CHPL_COMM=none makes more sense to me because the language doesn't suddenly break if you decide to use an on statement.

Regex isn't part of the language, though, it's a standard library that is optional to use. Getting a compiler error saying "You've asked to use Regex but don't have a regex back-end available" doesn't seem that much worse than getting an execution time error saying "You said to run on 2 locales, but didn't enable communication, so can't do that." (in fact, it seems better since it's compile-time rather than execution-time).

Consider some other key cases of using none in our CHPL_ settings:

Now, granted, most of these are the lower-level packages rather than the higher-level feature, so if you don't have problems with them, that might suggest you might be open to a scheme in which:

Which is effectively the :heart: simplification of the :+1: proposal.

dlongnecke-cray commented 3 years ago

Which is effectively the ❀️ simplification of the πŸ‘ proposal.

Yeah, I voted for CHPL_RE2=bundled|none (the ❀️ proposal). I like it because it doesn't talk about REGEX at all, and in my mind sidesteps this issue of CHPL_REGEX=none.

Part of my reason for this exchange was to see if I could convince myself to vote for πŸ‘ in the interest of moving things forward, but it seems like changing my vote leaves us no better off than before...

gbtitus commented 3 years ago
  • CHPL_JEMALLOC = none: "Don't rely on jemalloc being available because it's not."

@bradcray Something of a nit, but I think your examples are good ones for the settings such as CHPL_GMP that specify whether or not an optional third-party package is included (which was your point), but not for the ones such as CHPL_JEMALLOC that specify where to find a package that definitely must be present. CHPL_JEMALLOC is totally ignored if CHPL_MEM!=jemalloc, but if it's set to "none" with CHPL_MEM==jemalloc you'll get an error when you try to build the runtime. CHPL_LIBFABRIC is similar. "none" isn't a meaningful setting for these.

I think part of what we're struggling here is that the REGEXP/REGEX/RE2 setting is currently of the CHPL_GMP sort, and we're wondering whether to turn it into the CHPL_JEMALLOC sort. Either way we're answering two questions:

  1. Is there regular expression support?
  2. If so, where does it come from?

Much of the tension between πŸ‘ and ❀️ seems to be about whether we want to be able to answer these two separately, or if it's sufficient to only be able to answer them simultaneously.

bradcray commented 3 years ago

"none" isn't a meaningful setting for these.

That's a reasonable point. But...

of the CHPL_GMP sort, and we're wondering whether to turn it into the CHPL_JEMALLOC sort

I don't see CHPL_GMP and CHPL_JEMALLOC as being particularly different from one another (nor from CHPL_COMM, for that matter). Here's what I think the unifying principle is for our current settings, where cases that don't exist (but could if/when we wanted or needed to add them without breaking anything) are in square brackets:

concept high-level control low-level control(s)
communication CHPL_COMM CHPL_LIBFABRIC
[CHPL_GASNET]
tasking CHPL_TASKS [CHPL_QTHREADS]
memory CHPL_MEM CHPL_JEMALLOC
big integers [CHPL_BIGINT] CHPL_GMP
hw topology [CHPL_TOPO] CHPL_HWLOC

I view things in the second column as typically naming a backing implementation: "how should this feature area be implemented?", where the things in the third column typically say where that implementation comes from (is it bundled or part of the system or ...).

So to me, :+1: represents a row like:

concept high-level control low-level control(s)
regex CHPL_REGEX CHPL_RE2

where :heart: represents a row like:

concept high-level control low-level control(s)
regex [CHPL_REGEX] CHPL_RE2

and :tada: represents a row like:

concept high-level control low-level control(s)
regex CHPL_REGEX [CHPL_RE2]

And in case it's not clear, I'm on-board with proceeding with any of these three options (I've eventually voted all three of them) because I feel as though all of them are compatible with what we have today and could be evolved into :+1: if/when additional options are needed. That said, :heart: was my first choice (as indicated by this issue's name) since it seems most similar to the GMP case to me (we only have one implementation today; its presence or absence controls the feature) and seems simpler to implement than :+1: (which, again, I consider the end-all/be-all). :tada: is also OK with me for similar reasons, though it's not where my head went initially.

Where I'm stymied is that :rocket: has as many votes as any of these three options, yet to me doesn't fit the pattern we've established. I view it as creating a 2nd-column entry whose values are like the third column, and which doesn't seem particularly amenable to converting into :+1: over time. So while I feel like we've put a lot of effort into standardizing things recently and making them more uniform, (e.g., David Iten's bundled push), :rocket: seems like a case that thwarts that trend without a rationale that "sticks" for me.

gbtitus commented 3 years ago

~Maybe πŸš€ has so many votes because in your description of it you said "πŸš€ [...] can be considered as combining the best of ❀️ and πŸŽ‰ "!~

Edit: Removing this; because it's completely incorrect due to the fact that I failed to read the edit history of what I was commenting upon.

bradcray commented 3 years ago

It wasn't be that wrote that, and the votes were there before that edit was made.

[edit: though it's dropped in votes since my previous comment]

gbtitus commented 3 years ago

It wasn't be that wrote that, and the votes were there before that edit was made.

Yes -- many apologies for that, Brad.

bradcray commented 3 years ago

OK, so at this point, :rocket: has fallen behind, :+1: has not garnered much support, and :tada: and :heart: are close enough to suggest using a coin flip or an executive decision. Before resorting to either of those extreme measures, would any of:

leekillough commented 3 years ago

I am for πŸŽ‰ . It's much easier to implement, and is less disruptive of a change.

mppf commented 3 years ago

The reason that I like :heart: is that if we move to some other system to support the regular expressions (besides RE2) then that variable simply becomes irrelevant. You could keep setting CHPL_RE2=bundled or CHPL_RE2=none forever and we'd just ignore it and it'd have no effect.

Vs with :tada: if users put CHPL_REGEX=re2 into their profile/bashrc or whatever then it is something we would have to deprecate if we want to remove the RE2 implementation and replace it with something else. Furthermore they might put CHPL_REGEX=none today (if they can't build RE2, say) but they should presumably revisit that setting if we switch from RE2. If we chose to deprecate it (say, hypothetically speaking, if we had a Chapel-native regexp library) then that's extra noise. If we chose not to deprecate it, then these users will have made a choice about this setting that was probably connected to RE2 but it's still applying.

ben-albrecht commented 3 years ago

I would be fine with both πŸŽ‰ or ❀️ , but I think I now have a slight preference towards ❀️ . In addition to prior arguments, I like that ❀️ reserves CHPL_REGEX for the future when we have stabilized our regex library and choice of backend. I also like that it uses the bundled value like all the other third-party chapel env vars.

gbtitus commented 3 years ago

Based on an assumption that we'll eventually want to be at πŸ‘ , I view both πŸŽ‰ and ❀️ as reasonable ways to take us partway down that path, to a point where completing the transition won't require an additional deprecation step. However, recent conversations have actually brought me around to favoring ❀️ (surprise!) because it's similar to what we do in a couple of other cases already (CHPL_GMP, e.g.).

~I would adjust my reactions on the top comment, but in a cursory web search didn't find a way to do that.~ I have adjusted my vote in the top comment. And thank you, @ben-albrecht, for pointing out the (embarrassingly obvious) way to do that!

e-kayrakli commented 3 years ago

I am OK with :heart:.

When I first looked at the alternatives I was against it on the grounds that it was the biggest change for something we are not certain about. And :tada: is the opposite, in the sense that it is the "smallest" change and the most obvious alternative.

But it is a change, nonetheless. Reading this discussion, I can see that :heart: is more future-proof change and less likely to lead to another deprecation down the road.

I am casting another vote for :heart: while keeping the one for :tada: :)