Closed quinnj closed 7 years ago
+1 for no infix operators at all. This subject attracts too much noise, and O(n) for a * b * c * d ... concatenation isn't good.
If there is discussion about alternatives, then +100 for moving it to the julia-infix-operator-debates mailing list.
+1 for no infix operators at all. This subject attracts too much noise, and O(n) for a * b * c * d ... concatenation isn't good.
+1 to that
If there is discussion about alternatives, then +100 for moving it to the julia-infix-operator-debates mailing list.
:laughing:
LOL. +1 on the julia-infix-operator-debates.
(I'll personally feel sad to see this use of * and ^ go...)
Stefan recently gave a nice, succinct explanation for why he wants to see them go, I'm going to quote it here:
My problem with * for string concatenation is not that people find it unexpected but that it's an inappropriate use of the * generic function, which is agreed upon to mean numerical multiplication. The argument that strings form a monoid is kind of thin since lots of things form a monoid and we're generally not using * for them. At the time I introduced * for strings, we were a lot less strict about operator punning – recall | and & for shell commands – we've gotten much stricter over time, which is a good thing. This is one of the last puns left in the standard library. The reason ++ would be better is not because it would be easier to learn (depends on where you're coming from), but because the ++ operator in Julia would unequivocally mean sequence concatenation.
Note that the operator punning is not an entirely academic concern. The Char corner case shows where the punning can cause problems: people might reasonably expect 'x' * 'y' to produce either "xy" or 241. Because of this, we just make both of these operations no method errors, but it would be perfectly reasonable to allow 'x' ++ 'y' to produce "xy". There's a lot less of a case for having 'x' * 'y' produce 241 or 'ñ', but the sequence concatenation operation does actually make sense.
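The two readings of 'x' * 'y' described above can be made concrete. This is a minimal sketch assuming a recent Julia 1.x; `concat_pair` and `codepoint_sum` are hypothetical names introduced only for illustration, not functions in Base. Note that 241 is the sum of the two code points.

```julia
# Hypothetical helpers illustrating the two readings of 'x' * 'y'.
# Neither name exists in Base; they only make the ambiguity explicit.

# Reading 1: characters as length-1 sequences, so the result is a string.
concat_pair(a::Char, b::Char) = string(a, b)

# Reading 2: characters as code points, so the result is a number.
codepoint_sum(a::Char, b::Char) = Int(a) + Int(b)

s = concat_pair('x', 'y')      # "xy"
n = codepoint_sum('x', 'y')    # 241, i.e. 120 + 121
c = Char(n)                    # 'ñ' (U+00F1)
```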
References to prior discussions: https://groups.google.com/forum/#!msg/julia-dev/4K6S7tWnuEs/RF6x-f59IaoJ https://groups.google.com/d/msg/julia-users/nQg_d_n0t1Q/9PSt5aya5TsJ https://groups.google.com/d/msg/julia-users/JnTy-XcfLF8/JeeHREk2TvwJ JuliaLang/julia#1771 JuliaLang/julia#2301
I for one agree that ++ as a general sequence concat operator is clear and explicit, and agree that the Char example brought up by Stefan is a good example of where this simplifies things by disambiguating the user's intent.
I didn't see this before unleashing a rant on the unsuspecting latest derailed mailing list thread, where I suggested julia-stringconcat in lieu of the (better) julia-infix-operator-debates. +INTMAX. Kill the infix operators.
I really think you should avoid using anything that already has a meaning (other than string concatenation) in other major languages in the world. I spent too much time seeing bugs because of developers going back and forth between multiple languages that overused simple operators, which meant different things in different languages or in different contexts.
1) You need something that does not have another meaning for vectors, because people who do string processing for a living expect to be able to use strings as vectors of characters (and vice-versa). That would rule out +, *, ^, &, and ||.
2) You need something that is not confusable to most programmers (not just the numerical computing world). That rules out [](empty array), <> (SQL and other languages). I think ++ would be a little confusable, but as it is a unary operator in C/C++/Java/etc., and this would be a binary operator, I think that it would be fine.
3) You need a simple infix operator, at least for concatenate, otherwise you'll get pasted with tons of virtual tomatoes by all of us who are doing string processing.
I'd vote for ++. It is used for concatenation in a reasonably popular language (Haskell), it evokes the idea of adding two strings together, i.e. concatenating them, and it does not have any other meaning for vectors/arrays, so it could also be used as a general vector/array concatenation operator, which is good (per point 1 above).
I don't think ++ as a general sequence concatenation operator is particularly clear. Does "abc"++[1, 2, 3] return:
"abc[1,2,3]"
"abc\x01\x02\x03"
['a', 'b', 'c', 1, 2, 3]
["abc", [1, 2, 3]]
If we're going to have a string concatenation operator, I'd rather it just be a string concatenation operator and nothing else. (Has anyone complained about the lack of an infix operator for other sequence concatenation operations?)
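The four candidate results listed above can each be produced today with existing Base functions, which shows how far apart the interpretations are. This is only a sketch assuming Julia 1.x; the variable names are illustrative, and the exact printed form of the first result (with spaces after the commas) reflects current `string` behaviour rather than the thread's era.

```julia
# Four plausible results for a hypothetical "abc" ++ [1, 2, 3],
# each spelled with functions that exist in Base.
s = "abc"
v = [1, 2, 3]

r1 = string(s, v)               # stringify the array, then concatenate
r2 = s * String(UInt8.(v))      # treat the integers as raw bytes
r3 = vcat(collect(s), v)        # flatten both operands into one vector
r4 = Any[s, v]                  # keep both operands as two elements
```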
I'm also fine with not having a string concatenation operator, but the presence of such an operator in most other languages makes me wonder if I'd miss it if I were doing more string-heavy projects like web stuff. I'm fine with not having an infix operator if we decide we don't need it because interpolation tends to be more useful than concatenation, but if it's because numerical workflows don't do too much concatenation, I'd think twice.
Whether there should be a replacement is a decision that can be deferred. For once, can we keep a string concatenation-related issue narrowly defined?
If we're going to introduce a replacement, I think it makes the most sense to deprecate * and introduce the replacement at the same time, so that people can actually use the replacement when they update their code.
@StefanKarpinski you'd also get the nice behavior of "mystring" ++ '\u2000', which very annoyingly doesn't work now with "mystring" * '\u2000'.
@simonster, it makes sense to me, as somebody who spends most of their time on string processing...
a = UInt8[1,2,3]
"abc" ++ a
[97, 98, 99, 1, 2, 3]
(if you combine a Vector with a string, (which is immutable), you'd much rather get back another mutable vector, you can always convert it to an immutable string with UTF8String later)
Then this issue will devolve into every other discussion about this ever. The community has already established that it is unable to handle the topic. It's the ultimate bikeshed and there are a lot of colors to choose from.
If I sound irritated by this, it's because I am. Here's my experience. "Hey, you can't glue strings together with +?" "Yeah, that's because we use *." "Oh, okay then." At which point I moved on with my life.
So no, I don't think we should discuss alternative infix operators in this issue, because we'll never make progress if we do.
To voice my opinion on the matter, I have used languages whose string concatenation operator was ., +, (space), and ++. When I started julia and learned that * was the concat operator, my first thought was: cool, that makes sense, because I never really liked +. The one argument in favor of not using * I like is the one given by @StefanKarpinski about the ambiguity between Char as an integer and Char as a 1 character string. As such, it seems ++ as a concat operator is reasonable, though in that case we should give it clear semantics. The three options for generic ++ (what it should do if the type is equal seems clear) that seem reasonable to me are:
++(x,y) = ++(string(x),string(y))
++(x,y) = #MethodError
++(x,y) = ++(promote(x,y)...)
Where promote promotes to an appropriate container type. The last option would imply
x = Uint8[1,2,3]
"abc"++x == Uint8['a','b','c',1,2,3]
@keno, that's not correct, because 'a' is a Char, a 32-bit type. So, the answer would need to be either UInt8[97, 98, 99, 1, 2, 3] or Char['a','b','c','\x01','\x02','\x03']
I vote for ++
Actually, if you have an ASCIIString, it could promote to just UInt8[], but a UTF8String (as well as UTF16String and UTF32String) would need to promote to Char[].
(and that sort of promotion would be very useful for my string processing...)
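The promotion rule discussed above (ASCII text joins a byte vector as bytes, anything else is widened to characters) might look roughly like the following. This is a sketch under stated assumptions, not a proposal for Base: `promote_concat` is a hypothetical name standing in for a generic ++, and because ASCIIString and UTF8String no longer exist as separate types in Julia 1.x, the sketch uses `isascii` on a `String` as a stand-in for that distinction.

```julia
# Hypothetical stand-in for a promoting `++`; not part of Base.
# ASCII text fits in bytes, so it can join a byte vector directly.
# Anything else is widened to characters first.
function promote_concat(s::String, v::Vector{UInt8})
    if isascii(s)
        return vcat(collect(codeunits(s)), v)    # Vector{UInt8}
    else
        return vcat(collect(s), Char.(v))        # Vector{Char}
    end
end

bytes = promote_concat("abc", UInt8[1, 2, 3])
chars = promote_concat("añc", UInt8[1, 2, 3])
```

The return type depends on the content of the string, which is one of the costs of this design: callers cannot know the result type from the argument types alone.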
This issue could be titled "Taking string concatenation seriously".
the ambiguity between Char as an integer and Char as a 1 character string.
I'll just note that:
julia-0.4> Char <: Integer
false
julia-0.4> 'a' * 'b'
ERROR: MethodError: `*` has no method matching *(::Char, ::Char)
Closest candidates are:
*(::Any, ::Any, ::Any)
*(::Any, ::Any, ::Any, ::Any...)
so no, Char is not an integer, and hasn't been for a while in the 0.4 series, and therefore there's no ambiguity whatsoever. String * Char could perfectly well return the concatenated string, etc. That argument is just obsolete.
Please let's not subject ourselves to 200+ comments before we feel like it's been taken seriously enough.
Can someone just make a PR? I think everyone is in favor of deprecating *, ^ (if only to remove the mailing list bug). The ++ operator seems to be getting decent traction, but it's tricky and not obvious how to make it general. There are tricky semantics (similar to push! vs. append!), poor algorithmic complexity, and there's not a clear need for other iterables. So let's just make it work well for strings (and maybe chars) and call it a day.
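The push! vs. append! distinction mentioned above is the kind of choice a general ++ would have to make. A short illustration using only Base functions (the variable names are arbitrary):

```julia
# push! adds its argument as one element; append! adds each element.
# A general `++` would have to pick one of these behaviours.
a = Any[1, 2, 3]
b = Any[1, 2, 3]

push!(a, [4, 5])      # a now has 4 elements, the last being the array [4, 5]
append!(b, [4, 5])    # b now has 5 elements: 1, 2, 3, 4, 5
```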
@ScottPJones Sure, I was writing it that way for illustrative purposes, since Chars can convert to Uint8s if they are in range. Agreed on the UTF8String promotion problem.
@jiahao: This issue could be titled "Taking string concatenation seriously".
LOL.
Anyone in for a batch order ?
I think I'd want one, but can I get it with ++ instead of *?
Okay, sorry. Continuing the injokes is fun, but let's stay focused. Let's try to come up with a bare minimum set of features that a PR could reasonably implement:
* and ^ for strings
++ for strings on strings
Anything that generalizes to other containers I think we can hash out inside the PR.
I want one with ++! :grinning:
@staticfloat :100: :+1:
If we want to have a real "taking strings seriously" discussion, for example, like performance issues related to trying to make strings be \0 terminated, where can we do that? (think about the very common substring or slice operation on a string... with Julia you have to create a new string every time)
if we're incurring string breakage anyways, it seems like as good a time as any to eliminate $ too.
my next-best-favorite alternative to not causing breakage is probably the operator-free version (https://github.com/JuliaLang/julia/tree/jb/strjuxtapose)
+1 to deprecating * and ^ for strings.
I sense a lot of obscurity around the ++ operator. Right now it's nice, for example, that "$a$b" and string(a,b) do exactly the same thing. It would be easy to confuse this with a++b. How often do you need to concatenate a string with an array? That's a strange operation, since it's not clear what the array elements refer to: they could be code points, or raw data.
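The equivalence between interpolation and string mentioned above is easy to check. A minimal example with arbitrary values:

```julia
a = 1
b = "foo"

interpolated = "$a$b"         # "1foo"
called       = string(a, b)   # "1foo"
```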
I'm reluctant to even engage in this discussion, but I feel compelled to mention one possibility that has come up in the past (there was even a PR implementing it at one point): using juxtaposition for string concatenation. You would write the following:
"foo" "bar" # "foobar"
"foo" bar # "foo$bar"
foo "bar" # "$(foo)bar"
foo "" bar # "$foo$bar"
Before, this had the drawback that there was no operator form of it, e.g. that you could pass to reduce, but that's not true anymore since you can use call overloading to make ""(args...) do string concatenation. Thus, you could write reduce("", objs) and get a concatenation of the stringifications of a collection of objects. This could be generalized by this:
julia> call{S<:String}(str::S, args...) = join(args, str)
call (generic function with 934 methods)
julia> reduce("", [1,"foo",1.23])
"1foo1.23"
julia> reduce(",", [1,"foo",1.23])
"1,foo,1.23"
If you're about to comment on what @StefanKarpinski just wrote, please read JuliaLang/julia#2301 first.
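The call{S<:String} definition in the REPL session above uses 0.4-era call overloading. A rough equivalent in Julia 1.x syntax is sketched below. It wraps the separator in a callable struct rather than making every string callable, which is a deliberate substitution to avoid adding methods to a Base type; `Joiner` is a hypothetical name.

```julia
# Hypothetical callable wrapper: calling it joins its arguments
# with the stored separator, mirroring the `call` overload above.
struct Joiner
    sep::String
end

(j::Joiner)(args...) = join(args, j.sep)

plain  = reduce(Joiner(""), [1, "foo", 1.23])    # "1foo1.23"
commas = reduce(Joiner(","), [1, "foo", 1.23])   # "1,foo,1.23"
```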
@stefankarpinski Ugh!!! Had no end of errors in code from Multivalue/Pick applications, because they used juxtaposition... hard to tell just what the code was really doing.
Also, what happens with macro arguments... whitespace is significant in Julia, so
@foo "Scott" "Paul" "Jones"
to a macro expecting 3 arguments just starts breaking, right?
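The macro concern above rests on the fact that whitespace separates macro arguments in the parenthesis-free call form. A small check, assuming Julia 1.x; `@count_args` is a hypothetical macro written only for this illustration:

```julia
# A hypothetical macro that reports how many arguments it received.
macro count_args(args...)
    return length(args)
end

# Whitespace separates macro arguments, so this call passes three strings.
n = @count_args "Scott" "Paul" "Jones"
```

If juxtaposed string literals meant concatenation, the same call would be ambiguous between three arguments and one concatenated argument.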
@JeffBezanson If I have to use a Vector{UInt8} or Vector{Char} for mutable strings, to do my string processing, then I really would like to be able to concatenate an immutable string to one of them... just like people complain about not being able to concatenate strings and Chars now, those are both operations that are frequently done.
But what does concatenating a string with a Vector{UInt8} do? What if the vector contains UTF-8?
@JeffBezanson Concatenating a Vector{UInt8} with a UTF8String should probably be an error. Concatenation with an ASCIIString would be fine (returning a Vector{UInt8}). Concatenation of a Vector{Char} with a UTF8String should return a Vector{Char} (i.e. do the UTF8->UTF32 conversion first)... for performance, I'd check the UTF8String for how many logical characters first, create the output buffer big enough for both, then copy the Vector{Char} in, and convert the UTF8String right into the buffer...
Actually, it probably would be better to punt on any concatenations with Vectors, except maybe Vector{Char}, and have a mutable string package, and add methods for ++ there... A lot cleaner, IMO.
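The buffer strategy described above for Vector{Char} plus a UTF-8 string (count the logical characters, allocate once, copy the vector, decode the string into the remainder) could be sketched as follows. This assumes Julia 1.x, where `String` is UTF-8 encoded; `append_string` is a hypothetical name, not an existing function.

```julia
# Hypothetical helper following the steps described above:
# count the characters, allocate once, then fill the buffer in place.
function append_string(v::Vector{Char}, s::String)
    n = length(s)                              # number of characters, not bytes
    out = Vector{Char}(undef, length(v) + n)   # one allocation for the result
    copyto!(out, 1, v, 1, length(v))           # copy the existing characters
    i = length(v)
    for c in s                                 # decode UTF-8 into the buffer
        i += 1
        out[i] = c
    end
    return out
end

result = append_string(['a', 'b'], "ñé")
```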
Yes, I agree, it gets a bit complicated otherwise.
I think it would be a terrible decision not to have any infix operators at all for string concatenation. It should be a clue that nearly every modern general-purpose language has opted to define some infix operator for this operation. And the fact that other languages make many different choices for the operator indicates that there is no ironclad convention that we stray from to our peril.
I agree with @pao that the bikeshed over this is counterproductive, and I find it hard to understand why people care so much about the spelling of this. * is easy to get used to, is not that weird, and Char*Char does not come up often enough to be worth worrying about.
The sequence a * b is an alias for string(a, b) except in the special case where a and b are numerical, oh yeah, or numerical arrays, then it means multiply.
It would be better to give string catenation its own operator so that de-sugaring is always true. And if it's not used in any other language then it is fair to all by making everybody equally unhappy :)
That would also make it easier to make a op b op c op d mean string(a,b,c,d), with the obvious performance implications. So only string() then needs performance optimisations (since at the moment it's a very general function).
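The idea of a op b op c op d lowering to a single n-ary call already has a precedent: Julia parses chains of * as one call with all operands. The sketch below, assuming Julia 1.x, shows that parse and a hypothetical varargs function `concat` (not in Base) that defers to string.

```julia
# Chains of `*` are parsed as a single n-ary call, so a varargs method
# sees all operands at once.
ex = Meta.parse("a * b * c * d")
operands = ex.args            # the operator followed by the four operands

# Hypothetical n-ary concatenation that defers to string(),
# so only string() needs to be optimised.
concat(xs...) = string(xs...)

joined = concat("a", 1, 'c', 2.5)
```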
++ is good. What it does for non-strings can be worked out later.
@stevengj 1) Why do you assume that Char ++ Char does not come up often enough to worry about? This is something that bugs me about the discussions here... I see a lot of “this just isn’t important”... but that is just an opinion, and you have people with experience in string processing telling you that it is important. 2) * is rather confusable for lots of people, as I’d say that most people doing string processing would first think of repetition, never concatenation. I’ve seen many people bring that up. 3) Maybe the amount of negative comments about * as concatenation operator, going back years from what I’ve seen, should have been a clue that it wasn’t the best decision, and it should have been reconsidered back in version 0.1 or 0.2, not when people want to get 0.4 released...
@simonster Regarding "abc"++[1, 2, 3]. This is a nice example of how the "operator with dot" notation inherited from MATLAB bites us from time to time. For comparison, the concatenation operator in J/APL is , and it comes with a "family of dot operators" distinguished by the slices the operator should work on.
'abc' , '123'
abc123
'abc' ,"0 '123'
a1
b2
c3
or even
'abc' ,"1 0 '123'
abc1
abc2
abc3
This doesn't address the question of type promotion you raised.
Edit: Argh, I wasted the chance to say nothing
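The three J expressions above have rough counterparts in Julia's dot-broadcasting, where a string is treated as a scalar and a vector of characters is iterated. This is a sketch assuming Julia 1.x, offered as an analogy rather than an exact translation of J's rank semantics:

```julia
# Rough Julia counterparts of the three J expressions above.
whole    = string("abc", "123")                      # whole strings
pairwise = string.(collect("abc"), collect("123"))   # character with character
repeated = string.("abc", collect("123"))            # whole string with each character
```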
@ScottPJones, plenty of other languages seem to have string-concatenation infix operators but not char-concatenation operators. I don't see a clamor of complaints. You can still concatenate chars by doing string(char1, char2) (or use length-1 strings as in Python), so there is no missing functionality. If you look at existing code in any widespread language, the uses of string concatenation vastly outnumber the instances of concatenation of two chars.
Claims that char concatenation is anywhere near as important or useful as string concatenation are simply not plausible.
There will always be negative comments about spelling choices. (People coming from Python will always complain that we need end rather than using indentation.) Tastes differ, and a few people with strong feelings can make a lot of noise. If we choose ++, I guarantee you that newcomers will still complain: "Why didn't you use +? + is so much more discoverable and intuitive because I am used to it from language X."
It's not so much that I particularly like *; I simply don't care that much. My feeling is that continual code churn over pointless spelling changes is more detrimental to Julia than any benefit we will get from substituting one character for another.
Aside from all of that, ++ will be extremely painful from an upgrade standpoint. Because ++ does not currently parse as an infix operator, there will be no clean way to maintain backward compatibility with Compat; it will be a flag day upgrade, requiring every package using string concatenation to fork into 0.3 and 0.4 versions (or use string(a,b), giving up on infix concatenation entirely).
The fact is that continual code churn over pointless spelling changes is more detrimental to Julia than any benefit we will get from substituting one character for another.
Yes, it should only ever be changed once, from what it is now to the final state (or no change if that's the decision). Deprecating now and adding an operator later when everyone has changed their code to string(a,b) or "$a$b" is just being mean to the users.
and O(n) for a * b * c * d ... concatenation isn't good.
Can you do better than O(n) for string concatenation?
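One reading of the complaint quoted above is that it concerns pairwise evaluation: if a * b * c * d is built two operands at a time, each step copies the whole accumulated prefix again, whereas a single pass writes every byte once. The following is a simple counting model under that assumption, not a benchmark and not a description of how any Julia version evaluates the expression; both function names are hypothetical.

```julia
# Count bytes copied by two strategies (a model, not a benchmark).

# Pairwise: each `*` builds a new string holding everything so far.
function pairwise_copies(parts::Vector{String})
    isempty(parts) && return 0
    acc = sizeof(parts[1])    # the first operand is not copied by itself
    copied = 0
    for p in parts[2:end]
        acc += sizeof(p)      # size of the string built at this step
        copied += acc
    end
    return copied
end

# Single pass: every byte is written to the output once.
single_pass_copies(parts::Vector{String}) = sum(sizeof, parts)

parts = fill("ab", 4)
pairwise_total = pairwise_copies(parts)     # 4 + 6 + 8 = 18
single_total   = single_pass_copies(parts)  # 8
```

In this model a single pass is linear in the total length, which cannot be improved on, while the pairwise total grows quadratically with the number of operands.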
The frequency and vehemency of discussions around this subject beg for a change.
* and ^ were introduced for strings back when the language wasn't as strict on operator punning and overall meaning.
As a first step, I propose we deprecate these two methods for string operations.
As a next discussion, we can talk about the possibility of using a different operator(s) for concatenation/repetition. Just using repeat, with no operator, has been suggested, as well as the following for string concatenation:
++, as a general sequence concatenation operator
.., similar to Lua
Things to consider:
vcat/hcat?
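For reference, the two operator forms under discussion and their operator-free spellings give the same results. This small check assumes a Julia version in which both spellings are defined, as is the case in Julia 1.x; the variable names are arbitrary.

```julia
# Operator forms and their named equivalents, all from Base.
concat_op   = "foo" * "bar"
concat_call = string("foo", "bar")

repeat_op   = "ab"^3
repeat_call = repeat("ab", 3)
```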