Closed daviditen closed 3 years ago
Python has three strip() routines, including lstrip()
and rstrip()
, which makes programming a bit more racey.
What this issue is meaning to request is actually less like strip()
and more like textwrap.dedent()
.
I'm curious about the param
version -- I can't think how one would write a chapel function that takes a param string as input and returns a param string based on it like this. Given param p = "hello"
, I can't even write param q = p[3]
.
I'm curious about the param version -- I can't think how one would write a chapel function that takes a param string as input and returns a param string based on it like this.
It would require expanding what the compiler can do with param
strings. I don't think a pure Chapel param
version could be done right now without compiler changes.
I would really like to see this feature get added to the standard library. Multi-line strings are pretty ugly to use today because they require you to drop your indentation:
module M {
record R {
proc printHelp() {
writeln("""Multi-line strings
do not look great when you have
to drop your indentation like this""");
}
}
}
For the purposes of moving forward, I propose the following answers to the open design questions:
(1) Where should this method/function live?
It should live in String.chpl
. Python creates a TextWrap
module for this functionality, but most of the functions from that module feel like things that belong in a package module / mason package in the Chapel world. I would argue trim()
/ dedent()
belongs in the standard library though, because multi-line strings are significantly less useful without them (see motivation above).
@e-kayrakli mentioned that we may create a StringUtil.chpl
module at some point down the road (I can link the issue when I find it). It may make sense to migrate this function over to StringUtil
when it exists, but I also wouldn't want the design and development of StringUtil
to hold this feature up.
(2) Should this be a procedure or method?
A method seems most natural to me. It's easier to keep track of the open/closed parentheses, and you lose less horizontal white space for your multi-line string. Example below:
(3) What should this be called?
I would say trim()
or dedent()
. I slightly lean towards dedent()
because there is precedent for it in Python. It is a pretty arbitrary decision so we could settle it via polling the dev team.
(4) How to handle param strings?
As @daviditen points out above, this depends on some new compiler features. This shouldn't hold up the introduction of this feature, though.
I would really like to see this feature get added to that standard library.
Do you want to take it on?
Do you want to take it on?
I am happy to take this on. I think the implementation is fairly straightforward (~40 lines of code in python, e.g.). The main blocker is reaching consensus on the design decisions.
Quick comments (to some extent discussed with Ben offline)
(1) Where should this method/function live?
String.chpl
is fine in the beginning, but I think this is borderline "package" functionality. If there is a StringUtil
in the future as a mason package, I'd be for moving it there.
Relevant discussion: https://github.com/chapel-lang/chapel/issues/15095#issuecomment-598712463 and to some extent my comment below
(2) Should this be a procedure or method?
Can't see a good motivation for a proc. I think it should be a method.
(3) What should this be called?
Strongly leaning towards dedent
or something else other than trim
. I used Java's trim
before and it was doing what our strip
is doing. So trim
would be confusing for someone with similar background
(4) How to handle param strings?
I don't think this is a prerequisite for this. It shouldn't be very difficult to support that in our compiler, either.
Going backwards:
(4) I agree the param case can be deferred; params will still work with a non-param version of the function, they just won't have the advantage of being processed at compile-time, so will incur execution-time overhead. That seems acceptable given the convenience-oriented nature of the function.
(3) I think dedent
is more descriptive than trim
(2) I tend to agree with method over standalone
(1) No strong preference, though a downside of putting it into String.chpl is that it makes it always-available / documented in a weird place. To me it feels more standard library-ish, possibly package-ish. I'd probably use it as an excuse to kick off StringUtil, though that'll raise the question of what else in String.chpl should be moved there (which is a good question to ask).
The main question I have about this is whether:
writeln("""hi
there
ben""".dedent());
works as expected or not, or whether the initial line has to have the same number of spaces as the subsequent ones to work properly. The Python example I'm looking at is unclear on that point. Also, what is the behavior of a case where each line has a differing number of spaces:
writeln("""hi
there
ben
albrecht""".dedent());
(1) Where should this method/function live?
String.chpl
is fine in the beginning, but I think this is borderline "package" functionality. If there is aStringUtil
in the future as a mason package, I'd be for moving it there.Relevant discussion: https://github.com/chapel-lang/chapel/issues/15095#issuecomment-598712463 and to some extent my comment below
Ah, I thought StringUtil
would be a standard library module.
I would prefer dedent
be made available in the standard library (whether that be through String.chpl
or something else), since multi-line strings is far less ergonomic without it.
(3) What should this be called?
Strongly leaning towards
dedent
or something else other thantrim
. I used Java'strim
before and it was doing what ourstrip
is doing. Sotrim
would be confusing for someone with similar background
Good point. I did not realize Java had a trim
method. I am in favor of dedent
as well then.
The main question I have about this is whether:
writeln("""hi there ben""".dedent());
works as expected or not, or whether the initial line has to have the same number of spaces as the subsequent ones to work properly. The Python example I'm looking at is unclear on that point.
In Python, dedent
doesn't do anything special for the first line, so the minimum indentation would be 0 in that example (from the first line), and you'd get this as the output:
hi
there
ben
That is why they suggest doing the following in the python documentation:
print(dedent("""\
hi
there
ben"""))
We could special-case the first line to give your example the expected output (all indentation removed). However, this would break the case where a user does want indentation only on the first line:
writeln("""
hi
there
brad!""".dedent());
User wants:
hi
there
brad!
Special-casing first line would make it:
hi
there
brad!
Maybe this could be controlled via an argument in dedent()
?
My intuition would be to only special-case the first line in the event that it did not start with a space.
(I'd also be open to an optional argument saying whether or not the first line should be special-cased or not, defaulting to "yes"; along those lines, I liked the proposal above to have an optional "how much space to dedent by" argument, where the default (of 0 maybe?) would essentially be "autodetect").
My intuition would be to only special-case the first line in the event that it did not start with a space.
I think that could work.
(I'd also be open to an optional argument saying whether or not the first line should be special-cased or not, defaulting to "yes"; along those lines, I liked the proposal above to have an optional "how much space to dedent by" argument, where the default (of 0 maybe?) would essentially be "autodetect").
I am also open to that.
Any ideas for the arg name? I'll propose level
for now (level of dedentation).
Here's the current straw-function signature:
proc string.dedent(level=0) { ... }
proc string.dedent(columns=0, ignoreFirst=true);
???
Iterating a bit further before we request input from the core dev team:
/* Removes indentation from a (multi-line) string and returns the resulting string.
:arg columns: Maximum number of columns of indentation to remove. If ``0``, the number of columns will be determined automatically from the line with the minimum indentation.
:arg ignoreFirst: If `true`, ignores the first line if the first line is not indented when automatically determining the number of columns to remove (when ``columns == 0``).
*/
proc string.dedent(columns=0, ignoreFirst=true) { }
Here are a few test cases I've whipped up to demonstrate some of the expected behaviors:
A few more questions that came up as I was writing this. I'll give some proposed answers as well:
(Q1) How should ignoreFirst=false
and columns!=0
interact?
Since ignoreFirst
impacts whether the first line is counted when inferring the number of columns to remove, I think ignoreFirst
should do nothing when columns!=0
. It is admittedly a little confusing because a user might expect ignoreFirst
to mean "don't de-indent the first line". Perhaps we want to emit a warning or error if a user changes ignoreFirst
from its default value and also sets columns != 0
.
(Q2) How should tab characters be treated?
Python's dedent does the following:
Note that tabs and spaces are both treated as whitespace, but they are not equal: the lines " hello" and "\thello" are considered to have no common leading whitespace.
I toyed around with this behavior in Python's dedent, and agree with their take on this. This seems much more reasonable than trying to assign a whitespace value to tabs, e.g. 1 tab = 2 whitespaces.
I would've expected this case:
{
const s = """ 1
2
3""".dedent(columns=10);
writeln(s);
to have given
1
2
3
due to the columns=10 (which I interpret as "get rid of (up to) 10 columns of initial spaces).
For this:
// Lines with only whitespace are ignored and treated as a single newline
// character. Note there is trailing whitespace in this example:
is there precedent for this? It seems odd to me. Like, why would I have put a blank line with 10 spaces if I didn't want a blank line in there?
How should ignoreFirst=false and columns!=0 interact?
I think of ignoreFirst
as meaning "do no processing of the first line", so I'd expect ignoreFirst=true and columns = 3
for " hi"
to give " hi"
and ignoreFirst=false and columns = 3
to give "hi"
.
(Q2) How should tab characters be treated?
I don't disagree with your conclusion, but it gives me some pause as to whether "columns" is the right choice for the argument anymore...
due to the columns=10 (which I interpret as "get rid of (up to) 10 columns of initial spaces).
If we take your interpretation of ignoreFirst
below, wouldn't the current output be correct? That is, the first line should not be processed by default, and therefore no indentation should be removed. (I realize I've contradicted myself in this example, and that's a think-o on my part).
is there precedent for this? It seems odd to me. Like, why would I have put a blank line with 10 spaces if I didn't want a blank line in there?
This is how Python's dedent works. If these blank lines weren't ignored, should they interpreted as levels of indentation when inferring the level indentation to remove? I suspect Python chose to ignore these, to avoid this potential confusion for users.
I don't disagree with your conclusion, but it gives me some pause as to whether "columns" is the right choice for the argument anymore...
True. Throwing out a few alternative ideas:
margin
whitespaces
spaces
level
n
wouldn't the current output be correct? That is, the first line should not be processed by default, and therefore no indentation should be removed. (I realize I've contradicted myself in this example, and that's a think-o on my part).
I ate up one space that I shouldn't have (and have edited my comment to restore it), but it seems like your output ignored not just the first, but the second and third lines as well (i.e., they are still intended 10 columns).
Maybe a related question: I would have expected:
"""a
b
c
d
e
""".dedent(columns=5)
to give:
a
b
c
d
e
(i.e., that columns gives a maximum, not required number of whitespace characters to gobble up). But this may not agree with the "common leading whitespace" viewpoint of Python (because it might also suggest having columns=1
gobble up one space or one tab or ... Unless the "common leading whitespace" only fires when columns=0
and we're in "auto-detect" mode.
If these blank lines weren't ignored, should they interpreted as levels of indentation when inferring the level indentation to remove?
Doesn't the inference only occur based on the initial line? (in which case, only a blank initial line would affect the level of indentation which seems acceptable... or we could change ignoreFirst
to be ignoreInitial
where it would ignore the first line and any subsequent blank lines). Or does Python try to examine all the leading space of all the lines and make a "smart" choice about how much to dedent by?
Python try to examine all the leading space of all the lines and make a "smart" choice about how much to dedent by?
This is what Python's dedent does - find the minimum matching indentation (accounting for both tabs and spaces), and dedent by that much.
Here's my mental algorithm (which may or may not be appropriate / match Python's):
\n
columns=0
ignoreFirst/ignoreInitial=false
, this will be the first lineignoreFirst=true
, this will be the the second lineignoreInitial=true
, this will be the first non-blank line after the first\n
character, and use this as the whitespace-to-be-stripped pattern, but this seems more complex to me)ignoreFirst/ignoreInitial=true
, skip over the first / initial all-whitespace linesignoreFirst/ignoreInitial=false
, use all linescolumns
whitespace characters from it, if they exist (if they don't, stop early)This is what Python's dedent does - find the minimum matching indentation (accounting for both tabs and spaces), and dedent by that much.
Ah, OK. In that case, my "find the model line" step above would need to be "find the set of model lines" (defined as the model line and all subsequent lines) and the "count the number of leading whitespace characters" step would need to be a "find the minimum number of leading whitespace characters across all these lines" step.
Given that definition, I think I agree that an all-whitespace line wouldn't contribute to the minimum leading whitespace computation, but I still don't think it should be dropped from the output; I think its \n
should be preserved, at the very least.
Given that definition, I think I agree that an all-whitespace line wouldn't contribute to the minimum leading whitespace computation, but I still don't think it should be dropped from the output; I think its \n should be preserved, at the very least.
I second this notion. Removing those newlines seems to defeat the purpose that dedent
should make multiline string formatting easier.
The more I think about it, the more I think the "scan all candidate lines" approach is best to handle cases like:
a
b
c
d
e
So here's an updated mental algorithm:
\n
columns=0
ignoreFirst=false
, this will be all lines containing non-whitespace charactersignoreFirst=true
, this will be all lines other than the first containing non-whitespace characterscolumns
to the minimum number of leading whitespace characters in any of the model linescolumns
value:
ignoreFirst=true
, this is all lines but the firstignoreFirst=false
, this is all linescolumns
whitespace characters from it, stopping at the first \n
or non-whitespace characterAn implication of this is that:
"""a
b
\t\tc
\t d
\te
""".dedent()
would produce:
a
b
c
d
e
(i.e., tabs and spaces would be treated interchangeably). I don't have any insights into how to treat them differently and reasonably / portably.
Given that definition, I think I agree that an all-whitespace line wouldn't contribute to the minimum leading whitespace computation, but I still don't think it should be dropped from the output; I think its \n should be preserved, at the very least.
I second this notion. Removing those newlines seems to defeat the purpose that dedent should make multiline string formatting easier
I think we've come back to the original proposed behavior, which matches that of python: lines with only whitespace are treated as a single newline and ignored w.r.t. determining level of indentation.
(i.e., tabs and spaces would be treated interchangeably). I don't have any insights into how to treat them differently and reasonably / portably.
In the Python implementation, they don't treat tabs and spaces interchangeably. They find the longest common string of tabs and spaces and use that as the string to remove from each line. I think I prefer this behavior, because treating '\t'
and ' '
interchangeably is essentially treating tabs as a single whitespace.
If we went with that behavior, your example above would have no common string of tabs and spaces so the dedent would essentially be a no-op:
"""a
b
\t\tc
\t d
\te
""".dedent()
would produce:
a
b
\t\tc
\t d
\te
I think we've come back to the original proposed behavior
I disagree, or else your example has a bug. My concern wasn't that a whitespace-only line would be ignored when determining the dedent level so much as that your output dropped the whitespace-only line from the output (i.e., the blank line between 1 and 2 went away in the output; this seems extremely counterintuitive and undesirable to me).
or else your example has a bug
Looking back at it - my example had a bug. The blank line between 1 and 2 should still be there. Sorry for the confusion!
Let me put these examples into a Gist that I will keep updated as the discussion evolves.
Let me put these examples into a Gist that I will keep updated as the discussion evolves.
Or put them in tests/futures as a PR since that would be easy to comment on.
Created a draft PR here for discussion: https://github.com/chapel-lang/chapel/pull/15664
See if you agree with those cases (I'm sure I've made a mistake somewhere 😄 ), or if there are other cases I should include.
It should live in
String.chpl
. Python creates aTextWrap
module for this functionality, but most of the functions from that module feel like things that belong in a package module / mason package in the Chapel world. I would arguetrim()
/dedent()
belongs in the standard library though, because multi-line strings are significantly less useful without them (see motivation above).
After implementing this in a standalone file and migrating it to the String module, I noticed (1) the String module does not directly use the Regexp module anywhere else and (2) using the Regexp module in the String module introduces a bunch of errors in ByteBufferHelpers
(likely due to the module-level code of Regexp).
Before working through the Regexp use errors, I wanted to raise the question: Is it OK for a String method to utilize Regexp? If not, should we consider putting this in a different module or try to develop a non-regexp implementation?
The regular expressions used are:
// Replace all lines containing only white space (spaces or tabs) with empty string
const reWhitespaceOnly = compile('''(?m:^[ \t]+$)''');
var text = reWhitespaceOnly.sub('', this);
// Match all indentations (leading white space)
const reLeadingWhitespace = compile('''(?m:^[ \t]+)''');
var indents = reLeadingWhitespace.matches(text);
Is it OK for a String method to utilize Regexp?
I've generally been trying to fight against the tide of having "hello world" compile everything (and work towards reversing that trend), so having String.chpl (which is included by default, right?) automatically bring in Regexp seems like a step in the wrong direction to me. (Also, isn't Regexp something that isn't guaranteed to be available in all builds of Chapel? So it seems like things might break badly in a CHPL_REGEXP=none build?).
I guess my next questions would be:
I'll also be curious what @e-kayrakli thinks about this when he's back in the office, of course.
I've generally been trying to fight against the tide of having "hello world" compile everything (and work towards reversing that trend), so having String.chpl (which is included by default, right?)
Agreed.
is implementing this without using Regexp particularly difficult? (I haven't thought about it in awhile)?
Not terribly difficult AFAICT, but probably not nearly as computational efficient as Regexp without some optimization effort - which may be acceptable for now.
would it be weird to have it in a separate library module rather than being auto-available? (e.g., StringUtil.chpl, not that I like that name)
I'd be open to this idea, if we intended to add more features to this module in the future (@e-kayrakli mentioned there was plans for adding other string utils). Maybe this could even be a String submodule?
I'll also be curious what @e-kayrakli thinks about this when he's back in the office, of course.
Agreed.
but probably not nearly as computational efficient as Regexp without some optimization effort - which may be acceptable for now.
I wouldn't think that this is a routine whose performance we'd need to worry about too much. That said, it also seems plausible to me that the simple (and typically short) patterns that I think we care most about for this routine may actually be faster to do without leveraging a general-purpose regular expression evaluator (?).
Maybe this could even be a String submodule?
I don't think that would fix the dependence issue, as I don't think we have any form of conditional compilation for sub-modules. Also, since the user isn't meant to have any notion that there is a String
module (since it's part of the language), I'm not sure how we'd present such a submodule to them.
Based on this exchange, I think I'd implement it without regexp and just keep it in String.chpl for simplicity, but I think Engin's back tomorrow?
Based on this exchange, I think I'd implement it without regexp and just keep it in String.chpl for simplicity, but I think Engin's back tomorrow?
👍 Already working on it. We'll see what @e-kayrakli says when he's back. I can do some sanity check to see how comparable performance is with Regexp too.
Thanks for the input.
I've generally been trying to fight against the tide of having "hello world" compile everything (and work towards reversing that trend), so having String.chpl (which is included by default, right?) automatically bring in Regexp seems like a step in the wrong direction to me. (Also, isn't Regexp something that isn't guaranteed to be available in all builds of Chapel? So it seems like things might break badly in a CHPL_REGEXP=none build?).
Completely agree with this. I am against using standard modules in internals and against using package modules in standard modules as much as possible philosophically. But using Regexp is even worse because of its dependency.
I still think that this function should reside outside of the String module. Glancing over the past discussion here, I think we can put it in a StringUtils module or something. Having something like that in a Standard module can take some more time and thinking, so, maybe add this in a package module StringUtils, then we think more about what to add and move that to the standard modules? I'd be fine to just adding it as an unstable standard module, too.
(Very mildly related: A similar discussion about string utilities came up as part of adding a string.findAll
: https://github.com/chapel-lang/chapel/issues/15095. However, I think that particular function can be part of the string type)
I can do some sanity check to see how comparable performance is with Regexp too.
Here are some crudely collected timings (100k trials on a small test set) of some regexp-free implementations I threw together. There's definitely room for improvement, but I'd be OK proceeding with the less performant regexp-free implementations for now.
_ | regexp | no regexp |
---|---|---|
whitespace removal | 0.131741 | 0.24877 |
leading whitespace matching | 0.225953 | 0.888304 |
I still think that this function should reside outside of the String module.
@e-kayrakli - earlier you said:
String.chpl
is fine in the beginning, but I think this is borderline "package" functionality. If there is aStringUtil
in the future as a mason package, I'd be for moving it there.
which made me think we had a consensus on putting it inString.chpl
for now. Do you oppose putting it in String, even if the implementation doesn't rely on regexp?
I am OK not putting it in String.chpl
. I do think it should be available by default (via standard library or package module) because the language feature of multiline strings is significantly more useful with this feature.
which made me think we had a consensus on putting it inString.chpl for now. Do you oppose putting it in String, even if the implementation doesn't rely on regexp?
I am OK not putting it in String.chpl. I do think it should be available by default (via standard library or package module) because the language feature of multiline strings is significantly more useful with this feature.
Right. I do not think much more differently at a high level, however, putting it in String worries me more than before because it would be a breaking change if we were to remove it later. Whether we see the interface in String.chpl as part of language (relatively stable) or standard library (relatively unstable) is a factor here.
If we put it in String.chpl with this understanding at least, I'd be more comfortable. And I don't know what that means exactly... a warning with --warn-unstable
? A warning in docs? Or just a consensus that we want it to be in String.chpl for good? (If we are all for the last option, I don't feel strongly enough to argue further against it myself :) )
I feel generally inclined to make this "standard" / auto-available with strings since:
I'm open to arguments that it's somehow different from other string methods we currently support by default (e.g., startsWith(), endsWith(), strip()), I'm just not able to formulate those arguments myself.
Maybe my reluctance is coming from not using multiline strings that much. My concerns are a bit philosophical rather than concrete.
I see it different from strip, because it feels more like an advanced text processing (remove some whitespaces at some parts of the string but not others) rather than a fundamental operation that we may use for every string.
Another perspective that makes me question the need for this in String.chpl is because this is a functionality that would allow us to adjust whitespace while writing multiline strings which is a cosmetic concern that should be external to the implementation of a primitive type in a whitespace-agnostic language.
@e-kayrakli && @bradcray -- you both make good points.
Ultimately, what's important to me is that dedent
is available in standard/package modules.
Some options we could consider in order to move forward for now:
1) Add dedent()
to String.chpl
, but make it emit a warning for --warn-unstable
until we have had more time to live with it and determine if it's a good fit or not.
2) Use this as an opportunity to start a StringUtils
(or whatever we want to call it) package module that contains this secondary string method. Maybe this gets inducted into the standard library in the future.
Any opinions or other options worth considering?
Option 1 seems inherently less impactful to me in that we warn of something known to be unstable (like other such things) and don't have the overhead of creating a new one-routine module and documentation for it that might go away later.
OK, proceeding with option 1 for now. Thanks for the input.
Here are a few design elements that came up while implementing, and the answers I chose for the current implementation:
dedent(columns > 0)
consider tabs as whitespace?
dedent(columns > 0)
leave lines with only whitespace alone?
columns == 0
behaviorAlso, while writing tests, I realized an advantage of Python's decision to replace whitespace-only lines with empty strings. Putting the closing triple-quote on a new line and keeping it aligned with the opening triple quote will generate some trailing whitespace in the output if we leave whitespace-only lines untouched, e.g.
var s = """
a
b
c
""".dedent();
writeln(s);
output:
a
b
c
# <-- trailing whitespace here
Not a huge deal, but slightly annoying IMO.
Interested in thoughts here or on the PR (https://github.com/chapel-lang/chapel/pull/15664).
if we leave whitespace-only lines untouched
To clarify, are we always leaving them untouched, or are we only leaving them untouched if they don't have the same leading prefix that we're stripping from all other lines? That is, in your example:
var s = """
a
b
c
""".dedent();
writeln(s);
If that last blank line had been:
""".dedent();
I would've expected the prefix to be stripped off like any other line since it shares the same prefix. I tried scanning this issue to refresh my memory of what led us here, but didn't anything suggesting ignoring them completely (but was looking quickly).
I tried scanning this issue to refresh my memory of what led us here, but didn't anything suggesting ignoring them completely (but was looking quickly).
Looking back, I think I confused your request for not doing what Python does as leave them untouched completely. I think what you're actually suggesting makes a lot more sense. I can switch the behavior pretty easily.
Also, that answers question (2) above:
- Should dedent(columns > 0) leave lines with only whitespace alone?
No, dedent(columns > 0)
should remove up to columns
spaces from lines containing only whitespace.
OK, cool. Though note that (I think?) your most recent example would still be unchanged even under that interpretation (right?) in that we'd choose the a, b, c lines' whitespace as the amount to strip but the final line doesn't have that much (?). Or would we choose the final line's whitespace as the amount to strip which would leave some space remaining before the a, b, c lines?
If that's right then it makes me wonder "What is the normal way of indenting a """
that's on its own line?" If you're in a Python-savvy editor, does it do any auto-indentation of a closing triple quote? (emacs is either not indenting it or I've got some other problem in my sample that's throwing it off...)
OK, cool. Though note that (I think?) your most recent example would still be unchanged even under that interpretation (right?) in that we'd choose the a, b, c lines' whitespace as the amount to strip but the final line doesn't have that much (?).
Yeah that's right. 😞
Or would we choose the final line's whitespace as the amount to strip which would leave some space remaining before the a, b, c lines?
I don't think so. I think this would introduce more problems / annoyances.
If that's right then it makes me wonder "What is the normal way of indenting a """ that's on its own line?" If you're in a Python-savvy editor, does it do any auto-indentation of a closing triple quote? (emacs is either not indenting it or I've got some other problem in my sample that's throwing it off...)
I can't find a good source on the preferred style, and I don't see any fancy tricks from any editors I have tried. However, I will note that doc strings are supposed to close the triple quote on a newline:
def complex(real=0.0, imag=0.0):
"""Form a complex number.
Keyword arguments:
real -- the real part (default 0.0)
imag -- the imaginary part (default 0.0)
"""
source: PEP 257
I think it'd be nice to support both styles in Chapel.
At the cost of a little more complexity in the rules, we could find the matching common leading whitespace (margin) across all non-empty lines, and then remove that margin from all lines (still respecting ignoreFirst
argument though), including empty lines. For empty lines that do not have the full margin or don't entirely match due to tab/space combinations, we could only remove the matching portion of margin.
var s1 = """
a // <-- 2 leading spaces
// <-- 1 leading space, 1 space will get removed
""";
var s2 = """
a # <-- 2 leading spaces
# <-- 2 leading spaces, 2 spaces will get removed
""";
var s3 = """
a # <-- 2 leading spaces
# <-- 3 leading spaces, 2 spaces will get removed, 1 space will remain
""";
var s4 = """
a # <-- 2 leading spaces
\t# <-- 1 leading space followed by 1 tab, 1 space will be removed and 1 tab will remain
""";
// All leading whitespace will be removed from this example now:
var s5 = """
a
b
""";
// All leading whitespace will be removed from this example now:
var s6 = """
a
b
""";
I'm open to that proposal. Though your latest Python example also gives me less concern than your previous one since the amount of blank space in the final blank line is the same as on the previous ones (i.e., as far as I'm concerned, we could also stick to the simpler interpretation of only eating the whitespace if it's a full prefix match).
Issue #7867 discussed the possibility of automatically removing leading whitespace from multi-line strings, but we think that it would be better (at least for non-interpreted strings) to provide library functions to do this.
Would create strings:
The name
trim
is an idea for a name, not a decision that has been made. This method could also take an argument to indicate how much leading space to remove. It would be good to have a version that operates on and returns aparam
string so that thes2
case above is supported.