JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

:" " syntax for non-standard quoted symbols? #9945

Open alyst opened 9 years ago

alyst commented 9 years ago

At the moment, to generate non-standard symbols in Julia (e.g. with whitespace), one has to write symbol("a b"). R uses backtick quotes for similar purposes, and that is much more concise. The typical use case is referring to a dataframe column with non-standard characters in its name (e.g. imported from a CSV/XLS file). It would be nice to use backticks to quote symbols in Julia as well, but backticks are currently reserved for commands. What if, instead of backticks, commands were specified as sh"<cmd>" or cmd"<cmd>", and backticks were given to symbols (since that is a more generic and frequent use case)? The use of backticks for commands was inspired by the Unix shell, but it is actually a little misleading: backquoting a command in Julia doesn't execute it, as it does in the shell. The sh"<cmd>" alternative could be naturally extended to other external scripts, e.g. perl"...", lua"..." etc.
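For reference, a minimal sketch of the current workaround, assuming a recent Julia where the constructor is spelled Symbol (the lowercase symbol above is the older spelling). The Dict is only a hypothetical stand-in for a table with a non-standard column name; it is not the DataFrames.jl API.

```julia
# Build a symbol that is not a valid identifier from a string.
name = Symbol("top 5%")

# It is an ordinary Symbol; only the literal syntax :name is unavailable for it.
is_symbol = name isa Symbol
round_trip = string(name)

# Hypothetical stand-in for a table keyed by column name (not DataFrames.jl).
columns = Dict(name => [0.91, 0.87], :id => [1, 2])
top = columns[Symbol("top 5%")]
```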

johnmyleswhite commented 9 years ago

The reason we made the choice to use symbols instead of strings in DataFrames was to discourage non-standard column names. It would be much better to reverse that decision than to redefine backticks in Base Julia.

alyst commented 9 years ago

I also try to avoid non-standard column names whenever possible, but sometimes it's just much more straightforward to use the original data format. Do you propose to use something like df."top 5%" in the revised syntax? In dplyr one could also write mutate(diff = `top 5%` - `bottom 5%`); would that also be possible in DataFrames.jl without additional syntactic burden?

JeffBezanson commented 9 years ago

There is a lot to be said for using cmd"..." for commands, but in that case I think there are probably several potential uses for backticks that would have higher priority. We could also add sym"a b" for symbols.
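A sym"a b" literal of this kind could be sketched as a non-standard string literal. The macro below is a hypothetical illustration, not something Base defines; the name sym_str is simply what the sym"..." syntax would look up.

```julia
# Hypothetical sym"..." literal: sym"a b" expands to the symbol Symbol("a b").
macro sym_str(s)
    # QuoteNode makes the expansion evaluate to the symbol itself
    # rather than to a variable lookup.
    QuoteNode(Symbol(s))
end

spaced = sym"a b"
plain = sym"abc"
```

Because the conversion happens at macro-expansion time, the literal costs nothing at run time compared with calling Symbol("a b").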

nalimilan commented 9 years ago

@alyst I think DataFrames should support column labels for more expressive descriptions which could be used automatically e.g. for plotting. This has been discussed somewhere in DataFrames.jl. Column names would better remain very simple, like top5. Every time I've used names like "top 5%" in R I've come to regret it the second time I had to type the name with backticks.

alyst commented 9 years ago

@nalimilan I am also a strong proponent of the "column names must be valid identifiers" policy, but in certain cases (data is imported and the input format is fixed; column names encode metadata; porting existing R scripts to Julia etc.) it would just be convenient to use non-standard ones instead of fixing the data and the scripts to comply with the policy. Abstracting from the data frames, the generic question here is -- should Julia support [an easy way of expressing] non-standard identifiers/symbols? Someone might overuse the feature, but ATM Julia gives so much coding freedom that it would be just one minor thing.

johnmyleswhite commented 9 years ago

My personal sense is that Julia does support an easy way already: symbol("a b") is exactly how hard it should be to create a non-standard identifier. I think of that syntax as a tax on creating the externalities generated by non-standard identifiers.

My sense about this issue is that we should have part of the discussion in DataFrames.

tonyhffong commented 9 years ago

I'm very much in agreement with what @johnmyleswhite said. It's a minor cognitive load commensurate to the reminder that one cannot use that symbol as if it's a field or a variable.

I like sym"a b" (to @JeffBezanson's point), as it makes show( s::Symbol ) more compact/pleasing for non-standard identifiers.

So, +1 for sym"a b" -1 for "backtick a b backtick"

prcastro commented 9 years ago

If we changed commands to something like cmd"", we would free the backtick for something like infix operators (as Haskell does). But I don't know what the implications of that would be.

StefanKarpinski commented 9 years ago

I would propose making :"a b" the syntax for making a symbol from a string literal instead. This is a pretty safe syntax to use since currently it means the same thing as without the colon, so why would you write it?

Using cmd"..." for commands is not very appealing because then you need to screw around with escaping " inside of shell commands. Currently, you can just cut and paste any commands from the shell and single and double quotes work the exact same way. Using a triple-quote form could help with that too, but it's kind of ugly.
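The escaping concern can be made concrete with a rough sketch. The cmd_str macro below is hypothetical (Base does not define it); it assumes that forwarding the literal's text to Base.@cmd, the macro behind backticks, is an acceptable stand-in. Nothing is executed here; only Cmd objects are built.

```julia
# Hypothetical cmd"..." literal that reuses the parser behind backticks.
macro cmd_str(s)
    esc(:(Base.@cmd $s))
end

# Backticks: shell-style double quotes can be pasted as-is.
with_backticks = `echo "a b"`

# Single-quoted form: every inner double quote needs a backslash.
with_escapes = cmd"echo \"a b\""

# Triple-quoted form: inner double quotes are fine again.
with_triple = cmd"""echo "a b" done"""
```

All three build a Cmd whose second argument is the single word a b; the difference is only in how much escaping the source text needs.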

If we were going to use backticks for anything else, I would consider expression quoting, since that is how we quote expressions in Markdown and emails anyway.

alyst commented 9 years ago

@StefanKarpinski +1 for :"a b"! I just wonder if, for the sake of generality, backquotes, instead of being bound to the Cmd class, could become an alias for triple double quotes, with any custom interpolation logic handled by Cmdstartme` literals or inside run(cmd::AbstractString).

tonyhffong commented 9 years ago

Hang on. what does :( "a b" ) parse to?

alyst commented 9 years ago

@tonyhffong Recent 0.4-devel gives ASCIIString for typeof(:( "a b" )).

JeffBezanson commented 9 years ago

I think we'd have to keep parsing :(" ") as a string, and give a quoted symbol only for :" ".

tonyhffong commented 9 years ago

Conceptually, we have now

:( a ) == :a  # A quote of a variable equals to that variable's symbol.
:"a b" == :( "a b" ) == "a b" # in both 0.3 and 0.4-dev. A quote of a string literal is just that string literal.

which kind of makes sense to me. Maybe it's a hobgoblin of little minds, but I do think it's nice that way.

StefanKarpinski commented 9 years ago

It's consistent, but the former is useful while the latter is useless.

toivoh commented 9 years ago

Late to the party here, but I really wouldn't feel comfortable with punning the quoting : operator like this. I feel that quoting is enough to wrap your head around already, and I don't see how avoiding the minor inconvenience of typing two more characters with sym"a b" justifies this.

ScottPJones commented 9 years ago

+1 also for sym"a b" +1 for cmd"a b" (I'd already thought of that independently... freeing up the backtick would be somewhat breaking, but very nice to have free for better uses)... wouldn't cmd"""I have "quotes" inside me""" work also?

c42f commented 5 years ago

While implementing #32408 I noticed the notation :"foo$x" already means something useful and isn't redundant with "foo$x" (in contrast to :"foo" vs "foo").

If we made :"foo" notation for Symbol("foo"), we'd presumably also want :"foo$x" to be Symbol("foo$x"). It's a useful pun on the quote operator, but a pun nonetheless.
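A small sketch of the distinction, assuming a recent Julia 1.x; the variable x and its value are made up for illustration.

```julia
x = 1

# Quoting an interpolating string yields an unevaluated Expr, not a String.
ex = :( "foo$x" )
head = ex.head
args = ex.args

# Evaluating the quoted expression performs the interpolation.
evaluated = eval(ex)

# Quoting a plain string literal is redundant: the result is just the string.
plain = :"foo"

# What the proposed :"foo$x" would mean, written with today's syntax.
proposed = Symbol("foo$x")
```

So giving :"foo$x" the meaning Symbol("foo$x") would replace an existing, useful behavior, whereas :"foo" has nothing to lose.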