Closed bradcray closed 3 years ago
I think @vasslitvinov asked a question which either proposed this idea or suggested it to me (not sure which). Tagging @leekillough on this since he's been running with these topics and had already started looking into making CHPL_REGEXP into CHPL_REGEX.
I'd prefer having it be CHPL_REGEX
rather than CHPL_RE2
, and I think doing so makes it more obvious to the user that they need it when using the module. If we're going to rename the environment variable, I'd rather it be kept in line with the module name unless we plan on having the Regex module call out to different implementations under the covers (and in that case I'd prefer it to affect the value of the environment variable anyways).
For the sake of argument:
Variables that take multiple settings, some of which correspond to third-party packages:
Variables that take bundled vs. none settings, some of which correspond to third-party packages:
To me, this choice seems more like the second category than the first.
Note that most users never need to set CHPL_RE*
since it defaults to using the bundled version as long as it built properly.
I would argue that the difference is that regular expressions are a more commonly known language feature than any of the other options in your second section
I would argue that the difference is that regular expressions are a more commonly known language feature than any of the other options in your second section
What does that matter, though? (given that most users will never have a need to set the variable).
To me:
@vasslitvinov: I agree with you (but am not sure whether you're agreeing with me or not). If we expected to support (or be able to support) multiple Regex implementations for the standard Regex feature/module, then I agree it should be CHPL_REGEX. But I believe our implementation is fairly hard-coded to RE2 and that the nature of regex libraries each having their own syntax would make it hard to have multiple options available as we do for CHPL_COMM, CHPL_TASKS, CHPL_MEMβat least without changes to user code (which we don't require for those other options). This is what makes me think this choice is more like GMP, LLVM, etc.
If we indeed tie the Regex module to RE2, then I can see that CHPL_RE2 makes more sense. CHPL_REGEX would be a binary flag (none or re2) vs. CHPL_RE2 could have more possible values.
This is what makes me think this choice is more like GMP, LLVM, etc.
Of these, only GMP is also a module. And GMP matches its module name. We seem to think that Regex is a better name for the module than RE2 despite the fact that it is fairly tied to the RE2 implementation (which I agree with). My position is based on the argument that Regex is a better name for the module and the argument that consistency is beneficial. It seems we disagree on what it should be consistent with, not whether consistency is important.
I think arguing that "the user won't have to set it very often" undercuts the value of having it be consistent with other environment variables - if setting the environment variable is unimportant to the user, then why should we prioritize consistency with other environment variables in similar situations over consistency with what the user will interact with more? In the situations where the user does want to set the value explicitly, it will be because they found they couldn't use the module they care about, so making it easier to go from "I can't use the Regex module" to "oh, my environment variable is set incorrectly" should be the priority, in my opinion.
Are there scenarios that I am not thinking of where having the value not be bundled matters? If so, maybe CHPL_REGEX_RE2
is a better name for the environment variable - this would handle my concerns about the user not recognizing the connection to the module name while still being sufficiently tied to RE2 itself that bundled
is appropriate.
It seems we disagree on what it should be consistent with, not whether consistency is important.
I agree with that.
Of these, only GMP is also a module. And GMP matches its module name.
That doesn't carry a lot of weight with me, in part because GMP isn't a module that we'd encourage most users to use in practice. At this point it's more like the underlying (non-Chapeltastic) implementation strategy for the "BigInt" module, in much the same way that RE2 is the underlying implementation of Regex. The difference is that we chose to expose the low-level GMP interface to the user in the "BigInt" case, but not to expose the RE2 interface to the user in the Regex case. I'd say that the environment variable name describes the third-party library name, which happens to match the user-facing module name rather than the other way around.
if setting the environment variable is unimportant to the user, then why should we prioritize consistency with other environment variables in similar situations over consistency with what the user will interact with more?
Because it offers consistency for developers who seem to me to likely be the main people who have to interact with this variable (through scripting, documentation, their testing environments).
Are there scenarios that I am not thinking of where having the value not be bundled matters?
I think it's plausible that we could support a system
installation of re2 (potentially with restricted functionality since it wouldn't have Michael's patches unless we successfully get those merged upstream, which would be attractive for other reasons). But it's not something we're likely to prioritize anytime soon. My dissatisfaction with the current situation relates to the fact that there aren't siblings of re2
that I can imagine specifying as alternatives for CHPL_REGEX, so it's essentially a boolean variable whose values seem misleading and inconsistent with other environment variables. If we did have multiple implementations, I'd expect using CHPL_REGEX to switch between these, and then variables like CHPL_RE2, CHPL_PCRE to say where the implementations should come from, similar to how CHPL_MEM can be set to jemalloc
and then CHPL_JEMALLOC can be set to none
vs. system
vs. bundled
.
Now, we could imagine supporting both CHPL_REGEX = none | re2
and CHPL_RE2 = none | bundled
as a compromise solution. But to me, this adds complexity without value since it uses two variables to express what is currently a binary decision, one of which isn't likely to support additional values over time.
To further illustrate your position, would you outline a scenario that would motivate a user to set the environment variable, explain how they would go about learning that they should set it, and explain why having it match the module name would simplify that experience for them?
To further illustrate your position, would you outline a scenario that would motivate a user to set the environment variable, explain how they would go about learning that they should set it, and explain why having it match the module name would simplify that experience for them?
A user starts with the quickstart settings (which explicitly sets CHPL_REGEXP=none
today) as a way of getting up and running with Chapel. They're experimenting, they're playing around - they haven't gotten to the phase where they feel like turning everything on yet and using the normal setchplenv script. Maybe they're looking at libraries to see what they can do, maybe they just have a toy regular expression program that they like to play with. They try to use the Regex library and get the error message about having the environment variable set to none. In the case where the environment variable matches the module name they think "of course" and just set the environment variable. In the case where the environment variable is CHPL_RE2, now they're thinking about underlying implementation details, are maybe confused about why this setting matters, but probably still set it. If they'd found printchplenv
output, maybe they would have thought to explicitly set CHPL_REGEX before experimenting with the library and never encountered the error in the first place (whereas they are unlikely to think of that for CHPL_RE2).
At the end of the day, it's not a major problem and not super important to me, I'm mostly just offering my two cents because I had an opinion and you asked for opinions. At any rate, thanks for caring about where the opinion came from, even if at the end of the day we don't end up following it.
I prefer CHPL_REGEX
instead of CHPL_RE2
.
But whatever we change CHPL_REGEXP
to, it looks like there will be a lot of difficulty changing it if we want to preserve backward-compatibility, which is why CHPL_REGEXP
was not changed in #17276.
In the case where the environment variable is CHPL_RE2
So to make sure I'm not missing anything: it's not as though they'll have any troubles setting the variable or determining that they need to set it, it's just that the name doesn't match the module which might cause them to puzzle over it.
now they're thinking about underlying implementation details
I'm not sure I completely buy that the variable name is going to be the thing that causes them to think about this. The first sentence of the Regex[p] module's documentation states the strong tie to RE2, and points to the RE2 documentation as the official reference for the syntax: https://chapel-lang.org/docs/master/modules/standard/Regexp.html
it looks like there will be a lot of difficulty changing it
I agree that renaming an environment variable isn't the most fun change to make, which is why I wanted to discuss what we should be changing it to before too much effort was exerted there (to avoid having to change it twice). But when you say "difficulty", is it just a lot of detail to slog through, or are there deep challenges? I think @daviditen had some recent experience with changing and deprecating environment variables with the bundled
effort (if not others?), so may have some best practices to share.
I think CHPL_REGEX
sounds better, even if we change its allowed values to be either none
or bundled
, if we want to de-emphasize the RE2 implementation, and allow the bundled
option to change to mean something else later.
CHPL_REGEX=bundled
means "I want the bundled Regex implementation, whatever that's based on."
CHPL_RE2
emphasizes that it's based on RE2, and we have the choice of either the bundled
RE2 or none
.
The name CHPL_RE2
strays away from the Regex
module name, and it might not be clear to the user what CHPL_RE2
means, since it's just choosing an under-the-covers Regex implementation.
I voted for CHPL_RE2 = bundled | none
. That seems consistent with what we do for other third-party packages (GMP
, HWLOC
, LLVM
all come to mind) where each is a specific software entity and not a general feature.
We could imagine a world where setting CHPL_RE2=none
causes us to default to an implementation of regex written entirely in Chapel code. Viewed from that lens, that we depend on re2
to use Regex
can be seen as a limitation of the current implementation.
Another option that makes some sense to me is to mimic the structure of CHPL_MEM
and say something like CHPL_REGEX = re2 | chapel
. Where we can use re2
as an accelerator like we do with jemalloc
, or we can fall back on the Chapel based implementation (where we'd just emit an error saying it's not currently supported, and to use re2
).
This only makes sense if it's easy to swap out our regex engine (I have no clue how difficult this is, it's probably not as easy as swapping our memory allocator). Our talks earlier made it sound like it was not trivial.
I agree with @leekillough's comment above. In particular I like CHPL_REGEX=bundled
because while we use RE2, it's not possible for somebody to provide their own RE2 (since we have custom patches to it). E.g. CHPL_RE2=system
does not make sense. Additionally I definitely do not think of RE2 as fundamental to regular expressions in Chapel. It is just how we implemented it. So I would prefer to hide that detail, if we can.
If in the future, we change to some other regex implementation (let's say it's called bogoregex
for the discussion), then I would expect it is used from the Regex
module instead of RE2. I do not think we would try to support both at once (the way we do with CHPL_COMM
say). I would further expect that we bundle it, so that CHPL_REGEX=bundled
continues to work. At that time, if it's possible to use a system-wide install of bogoregex
, I would advocate for making the adjustments to allow people to write CHPL_REGEX=bogoregex
and CHPL_BOGOREGEX=system
.
However this is not something I care deeply about.
The argument that carries the most weight (for me) in favor of CHPL_RE2=
is that maybe it makes the question of "what am I going to build when I type make
" more obviously answerable. However I'm not so sure that is a goal of printchplenv
.
Additionally I definitely do not think of RE2 as fundamental to regular expressions in Chapel. It is just how we implemented it. So I would prefer to hide that detail, if we can.
To me, this seems contrary to the notion of stabilizing our standard libraries. If we use something other than re2 to implement them, the regular expression syntax is likely to change, isn't it? Unless you're suggesting that in the future we have a different implementation than re2 that happens to implement the identical regex syntax as re2. (Is that likely?)
Imagining that it is for a moment, that actually suggests going the other way to me. That is CHPL_REGEX=re2
if we want to support regex with re2
vs. CHPL_REGEX=my_package_that_implements_re2
when we have an alternative? Where for someone selecting the re2 option, we'd then use CHPL_RE2
to indicate where it should come from (e.g., bundled). To me, that's the advantage of the :+1: approach, where I view the :heart: approach as a simpler stepping stone for the time being given that there's no real need for two environment variables currently.
it's not possible for somebody to provide their own RE2 (since we have custom patches to it).
Not currently, but it's conceivable that in the future (when we've taken over the world), we'd get the patches incorporated and be able to use a system version (I definitely wouldn't want to preclude our ability to do so since it'd be attractive for other reasonsβlike simplifying the process of upgrading re2 versions). Or that we'd stop relying on the patches for other reasons (stop needing the feature; re2 provides equivalent features that we can use).
If we use something other than re2 to implement them, the regular expression syntax is likely to change, isn't it?
Not necessarily. We can convert regular expression strings to handle minor differences. As I understand it, RE2 is relatively restricted in terms of what it supports, so I don't think we'll run into features we cannot express in another library's syntax. I would expect the minor differences to mostly come up with things like \s
and the names of character classes.
But, I am also uncertain to what degree we can stabilize the Regex module without at least answering the philosophical question of if we are defining our own regexp syntax or if we are providing an easy-to-use interface for an existing library. I don't recall if we have answered that already. But anyway all of this is arguably off-topic here and belongs in #17220.
We can convert regular expression strings to handle minor differences. As I understand it, RE2 is relatively restricted in terms of what it supports, so I don't think we'll run into features we cannot express in another library's syntax.
That argument suggests the :+1: approach to me then, rather than the :rocket: approach: A CHPL_REGEX variable that says what underlying technology is used to implement the Regex module (e.g., re2, pcre, etc.) and then, in the event that re2 is selected, a CHPL_RE2 variable saying where to find the implementation, as we do with other high-level technologies that have multiple implementations (CHPL_MEM=jemalloc, CHPL_JEMALLOC=bundled). I guess your counterargument is:
I do not think we would try to support both at once (the way we do with CHPL_COMM say).
But I don't see any inherent reason that this would be the case...
I definitely prefer :+1: to :heart:, because :heart: limits us to RE2 or nothing.
In decreasing order of preference:
:rocket: :tada: :+1: :heart:
I definitely prefer π to β€οΈ, because β€οΈ limits us to RE2 or nothing.
I don't think that's right, because we could do :heart: now and add the CHPL_REGEX variable later if/when we had a second package we wanted to support (which would essentially take us to :+1:). This is the reason I prefer :heart: : it's simpler than :+1: (one variable rather than two), seems most similar to what we do in other cases, and seems most future-proof since it permits us to switch to :+1: later by adding something without changing anything.
But locking ourselves to RE2 seems to be a step backward rather than forward, even if we can do :+1: later.
We're not locked to RE2 by taking the :heart: approach: If we later choose to also (or instead) accept CHPL_REGEX=pcre
(say), the CHPL_RE2
variable would simply be ignored in that case (much as CHPL_JEMALLOC
is if CHPL_MEM
isn't set to jemalloc
). And presumably at that time, a new CHPL_PCRE
variable would be checked to see which version of PCRE to use.
β€οΈ CHPL_RE2 has RE2 in the name, and hence locks us to RE2.
It would lock our "which version of RE2 should be used by this build of Chapel?" variable (CHPL_RE2
) to RE2, but it wouldn't lock our "What underlying regex technology should be used to implement the Regex module?" variable (CHPL_REGEX
) to RE2. This is analogous to how CHPL_MEM
says how memory should be implemented, and if jemalloc
is chosen as the option, CHPL_JEMALLOC
says where to find it (without locking the choice of memory implementation to jemalloc).
I believe that Chapel will eventually want to standardize a regular expression syntax, which won't be an exact match to that defined by RE2. I believe that means π is what we need for stabilization. I'm not too fond of any of the others, because they require an additional future change after users do the current CHPL_REGEXP
--> CHPL_REGEX
change. If we just say that when CHPL_REGEX=re2
, having CHPL_RE2
unset implies "bundled", are we basically at π ?
I believe that Chapel will eventually want to standardize a regular expression syntax, which won't be an exact match to that defined by RE2. I believe that means π is what we need for stabilization.
Hi Greg βΒ I used to think this too until Michael pointed out that our RegEx module could translate between the user's regular expression strings and the ones used by an underlying implementation in cases where it doesn't match, would allow us to target other implementations.
... our RegEx module could translate between the user's regular expression strings and the ones used by an underlying implementation ...
Ah, good point. That makes me wonder whether we even need CHPL_REGEX
then. Perhaps CHPL_RE2=bundled|none
(maybe adding system
someday) is enough. We would only need CHPL_REGEX=re2|somethingElse
if we had more than one underlying implementation, in which case the translation Michael and you described would need to know what that underlying implementation was. And the only reason I can think of that we'd want multiple implementations was if the performance profiles of the implementations weren't the same, i.e. if neither outperformed the other overall but rather each had things it was better for. I definitely wouldn't expect a functionality difference.
That makes me wonder whether we even need CHPL_REGEX then. Perhaps CHPL_RE2=bundled|none (maybe adding system someday) is enough.
That's why I voted for :heart: as my first choice.
In the post-meeting discussion today, the argument was brought up about approaches that embed "re2" into the variable name or its setting causing problems for users down the line if/when it changes away from re2, but I don't really buy this argument because my guess is that users don't set this variable 99% of the time (because it defaults to an appropriate setting when unset).
Part of what keeps me from voting for (π CHPL_REGEX = re2 | none && CHPL_RE2 = bundled | none) is that I really don't like the idea of us having a variable that when set to none
basically implies that we don't support regex anymore.
It's true that this is the case today, but I don't think we should advertise that because it's something that we recognize as a shortcoming and intend to fix in the future (by adopting our own regex language, by perhaps implementing a regex engine in Chapel, etc).
Basically, I don't really think we should consider REGEX
as something that the user can just "turn off".
I'd be willing to vote for π if "none" was changed to something like "native" or "chapel".
I think I'm back to π , because:
CHPL_REGEX
with for example CHPL_COMM
is nice,CHPL_RE2
entirely in the near term, andCHPL_RE2
it shouldn't be a breaking change.I don't think we should advertise that because it's something that we recognize as a shortcoming and intend to fix in the future
I don't view it that way, personally. Today, it's there primarily so that (a) if you don't want to spend your resources building re2, you can opt out of it (e.g., quickstart mode does this) and (b) if it causes any portability problems, you're not stuck with a broken version of Chapel by not being able to opt out of it. Those both seem like reasonable reasons to maintain a way of opting out of it to me.
I voted :tada: (CHPL_REGEX = re2 | none)
CHPL_RE2
, however, the fact that we have our custom implementation of it makes me feel a bit less certain about it. What if we decide to stick with this implementation and keep modifying it for our purposes? At some point it'll be "RE2 with bunch of custom Chapel-tailored patches", in which case this environment variable will look obsolete/outdated.CHPL_RE2
feels better to me if it is the secondary variable after CHPL_REGEX
. We can have CHPL_RE2=none|bundled|system
, where the system
alternative can only provide a smaller subset of functionality of our Regex module, for example. Though, Michael says system
option does not make sense. I am wondering whether we can refactor the module or the modifications to the re2 implementation, to change that. There's obvious value in using system's packages for our third party dependencies.CHPL_RE2
down the road (:+1:) if we decide to go that route, I think. I also do not see it as changing it twice if we go with :tada: and switch to anything else later on, because we should have changed to CHPL_REGEX
already if it wasn't for implementation challenges(?). And CHPL_REGEXP
is just something outdated that we are living with today.(a) if you don't want to spend your resources building re2, you can opt out of it (e.g., quickstart mode does this) and (b) if it causes any portability problems, you're not stuck with a broken version of Chapel by not being able to opt out of it
I agree that you shouldn't have to build re2
if you don't plan on using regex, but I'd be much happier if the other option was something like chapel
, and then if you tried to use Regex;
it would give you the error about "Chapel-based regex is not supported, please use CHPL_REGEX=re2 instead".
but I'd be much happier if the other option was something like chapel, and then if you tried to use Regex; it would give you the error about "Chapel-based regex is not supported, please use CHPL_REGEX=re2 instead".
I'm not aware of any plan (near-term or long-term) to build a native Chapel regex package, so to me, it seems odd to make the "opt-out of regex" option require opting into something that hasn't been discussed and seems unlikely to be built soon. We don't opt out of multi-locale runs by setting CHPL_COMM=chapel
, so why would we opt out of regex support differently? Even if a native-Chapel Regex library were introduced at some point in the future, we could still add a CHPL_REGEX=chapel
option at that point to opt into it without breaking anything.
I don't think we should advertise that because it's something that we recognize as a shortcoming
I don't view it as a shortcoming to be able to say "I don't want/need this feature, so please don't spend my time/space on it."
I'm not aware of any plan (near-term or long-term) to build a native Chapel regex package, so to me, it seems odd to make the "opt-out of regex" option require opting into something that hasn't been discussed and seems unlikely to be built soon.
I guess I thought differently, given that some people suggested interest in adopting a Chapel regex syntax. If we had a Chapel regex syntax, it doesn't seem so farfetched to say that the Chapel regex engine just isn't implemented yet.
We don't opt out of multi-locale runs by setting CHPL_COMM=chapel, so why would we opt out of regex support differently?
I guess CHPL_COMM=none
makes more sense to me because the language doesn't suddenly break if you decide to use an on
statement. (There might be other cases where the compiler just breaks if you try to do something distributed, feel free to correct me!)
I don't view it as a shortcoming to be able to say "I don't want/need this feature, so please don't spend my time/space on it."
I'd like to view CHPL_REGEX
as "select your regex backend", not as "turn regex on or off". In that light, selecting chapel
or native
makes sense even if it isn't implemented right at this moment...since it's native Chapel, you aren't linking in an extra library that you won't use...
I can't seem to resist reading CHPL_REGEX=none
as "turn regex off". I suppose you could interpret it as "don't use a Regex backend" (which could imply using a Chapel-based implementation at some point), but my brain just doesn't seem to want to do that.
I can't seem to resist reading CHPL_REGEX=none as "turn regex off". I suppose you could interpret it as "don't use a Regex backend" (which could imply using a Chapel-based implementation at some point), but my brain just doesn't seem to want to do that.
I also read it that way, and agree that we should not redefine none
to mean "use a Chapel-native version" in the future, but should use chapel
or native
for that. Even using your "select regex backend" interpretation (where I might say "select the technology that implements Regex" instead), none
seems accurate: "I don't want a regex backend, but thanks for offering."
CHPL_COMM=none makes more sense to me because the language doesn't suddenly break if you decide to use an on statement.
Regex
isn't part of the language, though, it's a standard library that is optional to use. Getting a compiler error saying "You've asked to use Regex
but don't have a regex back-end available" doesn't seem that much worse than getting an execution time error saying "You said to run on 2 locales, but didn't enable communication, so can't do that." (in fact, it seems better since it's compile-time rather than execution-time).
Consider some other key cases of using none
in our CHPL_ settings:
Now, granted, most of these are the lower-level packages rather than the higher-level feature, so if you don't have problems with them, that might suggest you might be open to a scheme in which:
Which is effectively the :heart: simplification of the :+1: proposal.
Which is effectively the β€οΈ simplification of the π proposal.
Yeah, I voted for CHPL_RE2=bundled|none
(the β€οΈ proposal). I like it because it doesn't talk about REGEX
at all, and in my mind sidesteps this issue of CHPL_REGEX=none
.
Part of my reason for this exchange was to see if I could convince myself to vote for π in the interest of moving things forward, but it seems like changing my vote leaves us no better off than before...
- CHPL_JEMALLOC = none: "Don't rely on jemalloc being available because it's not."
@bradcray Something of a nit, but I think your examples are good ones for the settings such as CHPL_GMP
that specify whether or not an optional third-party package is included (which was your point), but not for the ones such as CHPL_JEMALLOC
that specify where to find a package that definitely must be present. CHPL_JEMALLOC
is totally ignored if CHPL_MEM!=jemalloc
, but if it's set to "none" with CHPL_MEM==jemalloc
you'll get an error when you try to build the runtime. CHPL_LIBFABRIC
is similar. "none" isn't a meaningful setting for these.
I think part of what we're struggling here is that the REGEXP/REGEX/RE2 setting is currently of the CHPL_GMP
sort, and we're wondering whether to turn it into the CHPL_JEMALLOC
sort. Either way we're answering two questions:
Much of the tension between π and β€οΈ seems to be about whether we want to be able to answer these two separately, or if it's sufficient to only be able to answer them simultaneously.
"none" isn't a meaningful setting for these.
That's a reasonable point. But...
of the CHPL_GMP sort, and we're wondering whether to turn it into the CHPL_JEMALLOC sort
I don't see CHPL_GMP and CHPL_JEMALLOC as being particularly different from one another (nor from CHPL_COMM, for that matter). Here's what I think the unifying principle is for our current settings, where cases that don't exist (but could if/when we wanted or needed to add them without breaking anything) are in square brackets:
concept | high-level control | low-level control(s) |
---|---|---|
communication | CHPL_COMM | CHPL_LIBFABRIC |
[CHPL_GASNET] | ||
tasking | CHPL_TASKS | [CHPL_QTHREADS] |
memory | CHPL_MEM | CHPL_JEMALLOC |
big integers | [CHPL_BIGINT] | CHPL_GMP |
hw topology | [CHPL_TOPO] | CHPL_HWLOC |
I view things in the second column as typically naming a backing implementation: "how should this feature area be implemented?", where the things in the third column typically say where that implementation comes from (is it bundled or part of the system or ...).
So to me, :+1: represents a row like:
concept | high-level control | low-level control(s) |
---|---|---|
regex | CHPL_REGEX | CHPL_RE2 |
where :heart: represents a row like:
concept | high-level control | low-level control(s) |
---|---|---|
regex | [CHPL_REGEX] | CHPL_RE2 |
and :tada: represents a row like:
concept | high-level control | low-level control(s) |
---|---|---|
regex | CHPL_REGEX | [CHPL_RE2] |
And in case it's not clear, I'm on-board with proceeding with any of these three options (I've eventually voted all three of them) because I feel as though all of them are compatible with what we have today and could be evolved into :+1: if/when additional options are needed. That said, :heart: was my first choice (as indicated by this issue's name) since it seems most similar to the GMP case to me (we only have one implementation today; its presence or absence controls the feature) and seems simpler to implement than :+1: (which, again, I consider the end-all/be-all). :tada: is also OK with me for similar reasons, though it's not where my head went initially.
Where I'm stymied is that :rocket: has as many votes as any of these three options, yet to me doesn't fit the pattern we've established. I view it as creating a 2nd-column entry whose values are like the third column, and which doesn't seem particularly amenable to converting into :+1: over time. So while I feel like we've put a lot of effort into standardizing things recently and making them more uniform, (e.g., David Iten's bundled
push), :rocket: seems like a case that thwarts that trend without a rationale that "sticks" for me.
~Maybe π has so many votes because in your description of it you said "π [...] can be considered as combining the best of β€οΈ and π "!~
Edit: Removing this; because it's completely incorrect due to the fact that I failed to read the edit history of what I was commenting upon.
It wasn't be that wrote that, and the votes were there before that edit was made.
[edit: though it's dropped in votes since my previous comment]
It wasn't be that wrote that, and the votes were there before that edit was made.
Yes -- many apologies for that, Brad.
OK, so at this point, :rocket: has fallen behind, :+1: has not garnered much support, and :tada: and :heart: are close enough to suggest using a coin flip or an executive decision. Before resorting to either of those extreme measures, would any of:
I am for π . It's much easier to implement, and is less disruptive of a change.
The reason that I like :heart: is that if we move to some other system to support the regular expressions (besides RE2) then that variable simply becomes irrelevant. You could keep setting CHPL_RE2=bundled
or CHPL_RE2=none
forever and we'd just ignore it and it'd have no effect.
Vs with :tada: if users put CHPL_REGEX=re2
into their profile/bashrc or whatever then it is something we would have to deprecate if we want to remove the RE2 implementation and replace it with something else. Furthermore they might put CHPL_REGEX=none
today (if they can't build RE2, say) but they should presumably revisit that setting if we switch from RE2. If we chose to deprecate it (say, hypothetically speaking, if we had a Chapel-native regexp library) then that's extra noise. If we chose not to deprecate it, then these users will have made a choice about this setting that was probably connected to RE2 but it's still applying.
I would be fine with both π or β€οΈ , but I think I now have a slight preference towards β€οΈ . In addition to prior arguments, I like that β€οΈ reserves CHPL_REGEX
for the future when we have stabilized our regex library and choice of backend. I also like that it uses the bundled
value like all the other third-party chapel env vars.
Based on an assumption that we'll eventually want to be at π , I view both π and β€οΈ as reasonable ways to take us partway down that path, to a point where completing the transition won't require an additional deprecation step. However, recent conversations have actually brought me around to favoring β€οΈ (surprise!) because it's similar to what we do in a couple of other cases already (CHPL_GMP
, e.g.).
~I would adjust my reactions on the top comment, but in a cursory web search didn't find a way to do that.~ I have adjusted my vote in the top comment. And thank you, @ben-albrecht, for pointing out the (embarrassingly obvious) way to do that!
I am OK with :heart:.
When I first looked at the alternatives I was against it on the grounds that it was the biggest change for something we are not certain about. And :tada: is the opposite, in the sense that it is the "smallest" change and the most obvious alternative.
But it is a change, nonetheless. Reading this discussion, I can see that :heart: is more future-proof change and less likely to lead to another deprecation down the road.
I am casting another vote for :heart: while keeping the one for :tada: :)
This is somewhat related to issue #17220 and also to #17222: Assuming that our Regex module continues to be a standard (rather than package) module and based on RE2, should we should consider changing the
CHPL_REGEXP
environment variable toCHPL_RE2
and the values to bebundled
vs.none
, to be more similar to other third-party packages we rely upon? The currentCHPL_REGEXP
variable with valuesre2
andnone
suggests to me that one could plug in other packages rather thanre2
, yet I don't know that any of us expect that to be supported or supportable (?).Options discussed below: :heart: CHPL_RE2 = bundled | none π CHPL_REGEX = re2 | none π CHPL_REGEX = bundled | none :+1: CHPL_REGEX = re2 | none && CHPL_RE2 = bundled | none
Rationale for the options:
:+1: βΒ Brad argues that this is the most flexible approach in that:
CHPL_REGEX
says "what technology should be used to implement Regex?" where today's answers arere2
ornone
, but other technologies likepcre
, etc. could be added in the futureCHPL_RE2
says "if re2 was selected, where will we find it?", where today's answers arebundled
ornone
but ultimately we could supportsystem
if/when our patches are merged upstream or we stop relying on them β Future-proof β Similar to what we do for for other cases (e.g., CHPL_COMM/CHPL_GASNET) β Requires implementing two variables w/ 4 options to describe the 2 options we support today, so it's needless work (especially for scripts and makefiles):heart: β This could be considered a stepping stone toward :+1: that can be viewed as assuming/hardcoding
CHPL_REGEX=re2
β Only requires implementing one variable β Requires changing
CHPL_REGEXP
toCHPL_RE2
, hard-codingRE2
in the name, at least for some time(Brad's rebuttal: that choice of name is intentional / by design, so I don't see why this is a negative) β May be harder for users to transition from
CHPL_REGEXP
toCHPL_RE2
than toCHPL_REGEX
(Brad's rebuttal: users almost never set this variable in practice because it's correctly inferred for them).π β This could be considered a stepping stone toward :+1: that can be viewed as assuming/hardcoding
CHPL_RE2=bundled
β Only requires implementing one variable β Only requires renaming the environment variable
CHPL_REGEXP
toCHPL_REGEX
, with the rest of the code remaining mostly the same:rocket: β This can be considered as combining the best of :heart: and :tada: by renaming the environment variable name from
CHPL_REGEXP
toCHPL_REGEX
, but also allowing the valuesbundled
ornone
to be specified, but not disallowing specific packages likere2
orpcre
to be specified in the future, in which case which version to use (bundled
,system
) can be specified by a package-specific variable likeCHPL_RE2
in the future, which is similar to :+1:CHPL_RE2
or similar variables in the future, to specify which version ofRE2
to use β Allows for the first time a general Chapel feature macro likeCHPL_REGEX
to take on values likebundled
, when values likebundled
have historically only been allowed for specific software packagesRight now the
Regex
module is tied to a patched version ofRE2
, and so unless the interface ofRegex
can be flexible enough to handle unmodified versions ofRE2
,PCRE
, etc., it is likely that the only choices available will bebundled
(re2
) andnone
.