Closed donatj closed 5 years ago
@MOZGIII I don't like it personally, because allowing any even number of backticks introduces ambiguities:
var str1 = ``+`` // Go 1 says this is "", but under Go 2 this might be "+"
var str2 = ````+```` // Increase to 4 backticks, but still could be "" or "+"
So Go would have two options:
- str1 would be "+"
- ``+`` is "", which would make even numbers of backticks unusable because they would just be parsed as empty strings
Example of what I mean by the second option:
var str3 = ````
this is my raw string
There would be a syntax error on line 2. This is because `str3` would be set equal to an empty string, and then the compiler would see `this is my raw string`, which is not a valid line of Go code, and fail compilation.
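For reference, here is a minimal sketch of what Go 1 does today with the first example. The function name `concatEmpty` is made up for illustration; the expression inside it is the one from `str1` above.

```go
package main

import "fmt"

// concatEmpty uses the same expression as str1 above. Go 1 lexes it as two
// empty raw string literals joined by the + operator.
func concatEmpty() string {
	return `` + ``
}

func main() {
	s := concatEmpty()
	fmt.Printf("%q has length %d\n", s, len(s)) // prints "" has length 0
}
```

Any rule that reads two backticks as an opening delimiter would change the meaning of this already-valid program, which is the compatibility concern.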
@deanveloper you used two backticks in the example, but I proposed more than two. Two backticks clearly do not work, for the reasons you gave. However, with 3, 4 or any greater number of backticks there are no ambiguities: in Go 1 all of them are syntax errors.
The edge case here would probably be the ability to encode an empty string with this notation. I'd just prohibit that altogether. It'd probably be easier for the lexer that way, and it doesn't seem like a big issue to me. At least I could live with that.
What I don't like is the collision with the Markdown notation for code decoration. It may complicate using Go code in Markdown code sections. Maybe I can live with that too - but for me personally it kind of matters more than the ability to represent an empty string...
What do you think? I'm OK even with the must-be-odd number of backticks proposal. It's better than nothing, and it has its own advantages.
Sorry for all of the edits to my comment which probably looks confusing post-edits.
I'm personally still a fan of my initial idea of requiring an identifier immediately before the opening backtick and immediately after the closing one. I don't know how parsing/lexing/compiling/etc. works, however, so I'm not sure how much it would complicate the compiler. But it definitely doesn't collide with Markdown code fences :wink:
I'm not upset with the odd number of backticks rule however, it just seems "odd" to only allow odd numbers.
Actually, there is another pretty serious downside with the backticks - and that's collision with the backticks inside the string itself:
"`test`" != ````test````
This is a serious problem for me, because it suffers from much the same issue that regular ` strings have: poor support for representing backticks inside the string itself.
This brings me back to how well thought out this is in Ruby. It has enough string literal forms to cover every case I can think of.
@MOZGIII Yeah, that was brought up by @donatj a few posts ago.
Also speaking of which, @ianlancetaylor, backticks and quotes are not synonymous in SQL. Backticks are used for quoting identifiers in order to make sure that you can select tables/columns named after keywords (or contain strange characters such as spaces and commas), while quotes are used for string literals.
For instance:
-- valid (MariaDB)
SELECT * FROM `database`.`table`
-- invalid (MariaDB)
SELECT * FROM "database"."table"
It would be great to have "is identical to" (or another Unicode marker) as a way to ultra-backtick ASCII text.
sql := ≡SELECT foo
FROM bar
WHERE baz
= "qux"≡
...feels categorically simpler than a gang of backticks or any other in-band signaling. That’s the core issue, that using an ASCII delimiter for arbitrary ASCII text always has exceptions by definition, and a surprisingly high incidence of them in cases like this SQL example (and Markup and ...) where you’re quoting something that is likely to already be quoting things. Recursive quoting tempts fate because “smart people just like us” had the same ideas for quoting their thing, so when Go wants to quote it, the probability of collisions is very high.
OTOH, if Go used “mango” at each end, and SQL used “rose”, then the space would be huge and collisions rare. It would look dumb, of course, but would not have the collisions of everyone using the same four quotation delimiters.
This is why I propose adding U+2261 IDENTICAL TO as an alternative to back tick in marking raw text.
I don't like rarely-used single rune unicode sequences mainly for the following reasons:
A further possibility which I don't think has been mentioned so far is to introduce a new single-character escape \` which would only be valid within raw string literals.
This would be analogous to the existing escapes \' (only valid in rune literals) and \" (only valid in 'ordinary' string literals).
The new escape wouldn't be ideal from a 'cut and paste' perspective as you'd need to go through and prepend each back-tick with a slash. However, this would be easier than having to split each back-tick out into an ordinary string literal and (at least to my eye) would stand out more than simply doubling each back-tick as well as being a rarer combination of symbols.
Compared to solutions which involve using an odd number of back-ticks as a delimiter, it also has the advantage that leading and trailing back-ticks are easier to read.
@alanfo Then you would have to escape \ as well, which is not backwards compatible.
Perhaps it would be best to continue the discussion elsewhere, as it's drifting further apart from the original heredoc proposal. It's always simple to file another, separate proposal once another idea has fully formed.
@mibk I don't follow why you would have to escape \ as well. Unless it was followed by a back-tick, a slash would be treated literally as it is now.
Also it would be backwards compatible as, at present, a raw string literal can't include a back-tick at all.
@alanfo Consider this example:
`\`
Is it an unterminated raw string with an escaped backslash, or a raw string containing a single backslash?
It's a raw string containing a single backslash as it is now.
I'll admit it's an awkward case for the parser to deal with but `\`` would be fine (a raw string containing a single back-tick) whereas ```` might be problematic.
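To make the intended semantics concrete, here is a rough sketch, not a lexer change. `unescapeBackticks` is a hypothetical helper that models the proposed rule on the text between the delimiters, and the query text is made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// unescapeBackticks is a hypothetical helper modelling the proposed rule:
// only the two-character sequence \` is rewritten to a back-tick; every
// other backslash is kept literally, as in raw strings today.
func unescapeBackticks(body string) string {
	return strings.ReplaceAll(body, "\\`", "`")
}

func main() {
	// Go 1 today: a raw string literal holding exactly one backslash.
	fmt.Println(len(`\`)) // prints 1

	// Body text as it would appear between the delimiters under the proposal.
	body := "SELECT \\`id\\` FROM \\`users\\` WHERE path = 'C:\\tmp'"
	fmt.Println(unescapeBackticks(body))
}
```

The sketch deliberately sidesteps the awkward case discussed above: it operates on an already-delimited body, so it says nothing about how a lexer should decide where a literal such as the single-backslash one ends.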
@mvdan Given that @ianlancetaylor said we're not going to adopt "here documents" but didn't close the issue and indeed came up with a suggestion on how else to deal with the same problem, I don't see why we shouldn't continue the discussion here. Otherwise the same points will have to be made all over again and there doesn't seem to be a consensus on an alternative proposal in any case.
@mvdan what @alanfo said, and also note that @ianlancetaylor retitled the issue to be more generic.
Of the ideas listed here, @deanveloper's original idea (now unfortunately hidden in the fold) seems by far the best to me.
All of @MichaelTJones's Unicode suggestions don't really work for quoting Go code itself, and are awkward for many of us to type.
The "more backticks" ideas discussed by @ianlancetaylor and others have the problem that they do not work for text that begins or ends with a backtick.
@deanveloper's idea doesn't have these issues. Really, the only one I see is the one pointed out by @jimmyfrasche: it adds a certain complexity to lexing that's different from anything in the language today. But I think that might be fundamental to any syntax which allows quoting arbitrary text.
I personally think that these syntaxes are quite a bit different from the original proposal which was asking for a feature that other languages implement, while the current discussion is simply about improving raw strings rather than implementing HEREDOC. I'll start a new proposal, which will include a lot of the discussion from this post.
@deanveloper It seems to me that most of these suggestions are things that other languages implement, or similar to them. Most current languages have some form of raw string literal these days.
My only concern with delimiter`raw string with ` characters`delimiter is that it doesn't lead with the fact that it is a string. C++ (R"delim( string )delim") and Rust (r#" string "#) and Swift (#" string "#) are more clear as to when a string is starting.
"My only concern with delimiter`raw string with ` characters`delimiter is that it doesn't lead with the fact that it is a string."
That's a valid concern, it's a bit hard to see where the string starts and ends with long delimiters. However, with short delimiters, it seems to be much less of a problem:
// keep it with short delimiters
var x = raw`this is a string with ` characters`raw
// or all-capital letters? not previously seen as convention anywhere else in Go?
// this would make it much easier to see that it is representing a raw string.
var y = RAW`this is a string with ` characters`RAW
Perhaps establishing some sort of convention to use brief delimiters, maybe all-capital as well (such as SQL and RAW), is a good idea. Maybe golint should enforce something like this? I'm not 100% sure it's a good idea to enforce it with golint, but I do think that having a convention to use short and possibly capital delimiters would help with that aspect significantly.
I brought up the idea of this convention in this comment: https://github.com/golang/go/issues/32190#issuecomment-497315188 although it was for a different reason.
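As a rough illustration of the lexing rule this syntax implies, here is a sketch written as ordinary Go. `parseDelimited` is a hypothetical helper, not existing tooling, and restricting delimiters to ASCII letters is an assumption made only to keep it short.

```go
package main

import (
	"fmt"
	"strings"
)

// parseDelimited is a hypothetical helper, not part of any Go tooling. It
// reads one literal of the form delim`body`delim and returns body. For
// simplicity it only accepts delimiters made of ASCII letters.
func parseDelimited(src string) (string, bool) {
	open := strings.IndexByte(src, '`')
	if open <= 0 {
		return "", false // no delimiter before the opening backtick
	}
	delim := src[:open]
	for _, r := range delim {
		if !(r >= 'a' && r <= 'z' || r >= 'A' && r <= 'Z') {
			return "", false
		}
	}
	body := src[open+1:]
	// The literal ends at the first backtick that is followed by the delimiter.
	end := strings.Index(body, "`"+delim)
	if end < 0 {
		return "", false // unterminated literal
	}
	return body[:end], true
}

func main() {
	s, ok := parseDelimited("RAW`this is a string with ` characters`RAW")
	fmt.Printf("%q %v\n", s, ok)
}
```

A lone backtick inside the body is harmless here. Only a backtick immediately followed by the chosen delimiter ends the literal, so the author picks a delimiter that does not occur that way in the text.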
By the way I think I can partially revive my broken earlier suggestion by saying that writing N backquotes (N >= 2) followed by a double quote is a raw string literal that is terminated by a double quote followed by N backquotes.
s := ``"this is a `raw` "string" literal "``
fmt.Print(s)
prints
this is a `raw` "string" literal
It doesn't collapse nicely to the current raw string literals, but it does have the advantage of sticking to existing string quotation characters. Unless I've missed something again.
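A small sketch of how such a literal could be recognised, under the rule as stated above. `parseQuoted` is a hypothetical name and the code is only a model of the idea, not a proposed lexer implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// parseQuoted is a hypothetical helper modelling the rule above: N backquotes
// (N >= 2) followed by a double quote open the literal, and a double quote
// followed by N backquotes closes it.
func parseQuoted(src string) (string, bool) {
	n := 0
	for n < len(src) && src[n] == '`' {
		n++
	}
	if n < 2 || n >= len(src) || src[n] != '"' {
		return "", false // not an opening delimiter under this rule
	}
	closing := `"` + strings.Repeat("`", n)
	body := src[n+1:]
	end := strings.Index(body, closing)
	if end < 0 {
		return "", false // unterminated literal
	}
	return body[:end], true
}

func main() {
	// The example literal from above, written as an interpreted string.
	src := "``\"this is a `raw` \"string\" literal \"``"
	s, ok := parseQuoted(src)
	fmt.Printf("%q %v\n", s, ok)
}
```

Text containing a double quote followed by N backquotes would need a larger N, in the same way Rust raw strings add more # characters.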
I actually like that idea. My only real issue with the original N backticks idea was that the "N is odd-only" restriction made it seem very inconsistent. It also fixes the issues with how badly the other syntax played with Markdown. I'll make sure to bring up this one in the proposal that I am working on (along with others that were in this thread).
I think the only real concern is that it would make current raw strings that start or end with quotes (i.e. `"my string"`) confusing to look at for future learners of Go who do not know the history of raw strings.
I just bumped on a non-SQL use case regarding this, which I wanted to add as a datapoint.
I have an html template which I am storing as a backtick-quoted string. Now in that template, I have Githubissues.
I would like to propose Go add support for a HEREDOC syntax to make adding literals of particular precarious strings easier.
A common syntax in many programming languages is <<< (boundary) to open and a line containing just said boundary to close.
I would propose something along the lines of:
My personal reasoning is for MySQL queries.
My company and I work with MySQL a great deal. Backticks are used to quote tables and fields in MySQL. Our queries will often contain both numerous quotes and backticks, particularly queries generated by tooling.
There is no way to escape a backtick in a backtick string in Go, so we end up either using a double-quoted string and escaping all the quotes within, or using backticks and breaking out of the string on each backtick (a la `x` + "`" + `y`).
Currently we end up with something like
or in cases with massively more quotes than backticks I'll do something like
These examples are toys obviously, but this becomes much more of an issue on large 30+ line report queries, and more importantly it makes copying queries out of code and into a MySQL client a real pain.
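A minimal sketch of the two workarounds described above, using a made-up query. The table and column names are hypothetical and only serve to show the shape of the problem.

```go
package main

import "fmt"

// Workaround 1: an interpreted string, escaping every double quote.
// Table and column names are hypothetical.
const q1 = "SELECT `id`, `name` FROM `users` WHERE `status` = \"active\""

// Workaround 2: raw strings, breaking out into an interpreted string for
// every backtick-quoted identifier.
const q2 = `SELECT ` + "`id`" + `, ` + "`name`" + ` FROM ` + "`users`" +
	` WHERE ` + "`status`" + ` = "active"`

func main() {
	fmt.Println(q1)
	fmt.Println(q1 == q2) // prints true: both spell the same query
}
```

Neither form can be pasted into a MySQL client without first undoing the escaping or the concatenation by hand.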