JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Suggestion - literal syntax for `Dict` via `[; ...]` #39909

Open andyferris opened 3 years ago

andyferris commented 3 years ago

Many programming languages provide convenient literal syntax for their dictionaries/maps, which is very helpful, especially for laying out data literally as textual code a la JSON. I have a proposal for Julia using something along the lines of [; key = value, ...] or [; key => value, ...]. I understand the very, very old syntax [key => value, ...] was deprecated because it was ambiguous with an array of pairs, and the semicolon can alleviate that issue. I haven't seen dictionary notation discussed since around when that old syntax was deprecated, which is a pity because I think literal syntax could be worthwhile.
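For reference, here is a minimal sketch of what has to be written today. It uses only Base constructors; the variable names are illustrative. Square brackets around pairs build a vector of pairs, which is the ambiguity that led to the old syntax being deprecated, so a dictionary goes through the `Dict` constructor.

```julia
# Square brackets around pairs build a Vector of Pair objects...
pairs_vec = ["a" => 1, "b" => 2]

# ...so a dictionary has to be built with the Dict constructor.
d = Dict("a" => 1, "b" => 2)

println(typeof(pairs_vec))  # a Vector whose elements are Pair{String, Int}
println(typeof(d))          # a Dict from String to Int
```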

To me, looking at the following table suggests this syntax:

| Element type | Keys | Type | Literal Syntax | Empty Literal Syntax |
| --- | --- | --- | --- | --- |
| Heterogenous, immutable | Automatic (1, 2, 3, ...) | Tuple | (1, 2, 3) | () |
| Heterogenous, immutable | Specified (Symbol only) | NamedTuple | (a = 1, b = 2, c = 3) or (; a = 1, b = 2, c = 3) | (;) |
| Homogenous, mutable | Automatic (1, 2, 3, ...) | Array | [1, 2, 3] | [] |
| Homogenous, mutable | Specified (Any value) | Dict | Perhaps ["a" = 1, "b" = 2, "c" = 3] or [; "a" = 1, "b" = 2, "c" = 3]? | [;] ? |

It may also be natural to use the Pair operator => instead of the assignment operator =.

Similar to how a Vector{T} can be created with T[], perhaps an empty dictionary could be created with Pair{K, V}[;] or just {K, V}[;]? There may be other syntax possibilities like K[;]V.
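To make the analogy concrete, this sketch shows the typed-empty forms that already exist. Nothing here is proposed syntax, and the variable names are only for illustration. An empty vector of `Pair{K, V}` already carries enough type information for `Dict` to pick `K` and `V`.

```julia
ints = Int[]                        # existing syntax: empty Vector{Int}
typed_pairs = Pair{String, Int}[]   # existing syntax: empty Vector of pairs

# Dict takes its key and value types from the element type of its input.
d_from_pairs = Dict(typed_pairs)

# The explicit constructor form, for comparison.
d_explicit = Dict{String, Int}()

println(typeof(d_from_pairs) == typeof(d_explicit))
```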

At first blush it appears this syntax is currently available:

   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.7.0-DEV.653 (2021-03-03)
 _/ |\__'_|_|_|\__'_|  |  Commit 502c03974b (0 days old master)
|__/                   |

julia> [;]
ERROR: syntax: unexpected ";"
Stacktrace:
 [1] top-level scope
   @ none:1

julia> [; 1]
ERROR: syntax: unexpected ";"
Stacktrace:
 [1] top-level scope

julia> [a = 1]
ERROR: syntax: misplaced assignment statement in "[a = 1]" around REPL[2]:1
Stacktrace:
 [1] top-level scope

KristofferC commented 3 years ago

Tangentially, how long has the {} syntax been discontinued? If we still haven't found a use for it, couldn't we finally decide that it can be used for Dicts? It has a precedent from other languages and is concise.

rfourquet commented 3 years ago

At first blush it appears this syntax is currently available:

It's interesting that [;] and [;1] don't parse, but [1, 2; 1] does:

julia> :([1, 2; 3])
:([$(Expr(:parameters, 3)), 1, 2])

A possible concern with using [; dict_entries...] is that ; in []-notation already has a meaning for making matrices, but if the semicolon is leading, that might not be that confusing.
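As a reminder of the existing meaning, a short sketch of how ; behaves inside [] today (plain Base behaviour; variable names are illustrative):

```julia
# Inside [], a semicolon concatenates vertically.
m = [1 2; 3 4]          # two rows separated by ;  -> a 2×2 Matrix
v = [1; 2; 3]           # scalars stacked vertically -> a 3-element Vector
w = [[1, 2]; 3]         # a vector and a scalar stacked vertically

println(size(m))
println(v == w)
```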

For anyone who would like to play with this syntax, you can use https://github.com/rfourquet/SafeREPL.jl:

julia> function tersedict(ex)
    if all(x -> Meta.isexpr(x, :(=), 2), ex.args) # [1=2, 3=4] : Dict
        Expr(:call, :Dict, [Expr(:call, :(=>), x.args[1], x.args[2]) for x in ex.args]...)
    else
        ex
    end
end;

julia> using SafeREPL # should come after `tersedict` definition, as by default `2` is otherwise interpreted as `BigInt`, which `Meta.isexpr` doesn't like

julia> swapliterals!(:vect => tersedict)

julia> [1=2, 3=4]
Dict{Int64, Int64} with 2 entries:
  3 => 4
  1 => 2

julia> [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

Tangentially, how long has the {} syntax been discontinued? If we still haven't found a use for it, couldn't we finally decide that it can be used for Dicts?

Sounds like a good idea, with the caveat that it already provides a nice syntax for macros to interpret. BTW, again with SafeREPL, I have {1, 2} defined to be a Set, and {1=2, 3=4} or {1=>2, 3=>4} defined to construct a Dict via:

function makesetdict(ex)
    if all(x -> Meta.isexpr(x, :call, 3) && x.args[1] == :(=>), ex.args) # {1=>2, 3=>4} : Dict
        Expr(:call, :Dict, [Expr(:call, :(=>), x.args[2], x.args[3]) for x in ex.args]...)
    elseif all(x -> Meta.isexpr(x, :(=), 2), ex.args) # {1=2, 3=4} : Dict
        Expr(:call, :Dict, [Expr(:call, :(=>), x.args[1], x.args[2]) for x in ex.args]...)
    else # {1, 2} : Set
        Expr(:call, :Set, Expr(:vect, ex.args...))
    end
end

swapliterals!(:braces => makesetdict)
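
The same rewriting can be tried without SafeREPL by wrapping it in a macro, since {...} already parses to a :braces expression. This is only a sketch: the macro name `@setdict` is hypothetical, and treating an empty {} as a Set is a choice made here, not something the function above does. The = branch assumes the parser accepts = inside braces, as the SafeREPL session above suggests; only the => and Set forms are exercised below.

```julia
# Hypothetical macro: turn a {...} expression into a Set or Dict constructor call.
macro setdict(ex)
    Meta.isexpr(ex, :braces) || error("@setdict expects a {...} expression")
    items = ex.args
    ispair(x) = Meta.isexpr(x, :call, 3) && x.args[1] == :(=>)
    isassign(x) = Meta.isexpr(x, :(=), 2)
    if !isempty(items) && all(ispair, items)         # {k => v, ...} : Dict
        entries = [:($(esc(x.args[2])) => $(esc(x.args[3]))) for x in items]
        return :(Dict($(entries...)))
    elseif !isempty(items) && all(isassign, items)   # {k = v, ...} : Dict
        entries = [:($(esc(x.args[1])) => $(esc(x.args[2]))) for x in items]
        return :(Dict($(entries...)))
    else                                             # {a, b, ...} or {} : Set
        return :(Set([$(map(esc, items)...)]))
    end
end

s = @setdict {1, 2, 3}
d = @setdict {"a" => 1, "b" => 2}

println(s)
println(d)
```

Keys and values are escaped, so they are evaluated in the caller's scope like any other expression.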

andyferris commented 3 years ago

I agree - if we make literal Dict I think it is worth creating literal Set at the same time.

To me that could be an argument in favor of = over => because you'd want to be able to create a Set{Pair{A, B}} as well as a Dict{A, B}.

That works with either syntax [; 1, 2, 3] or {1, 2, 3} for Set (and [; "a" = 1, "b" = 2, "c" = 3] or {"a" = 1, "b" = 2, "c" = 3} for Dict). Otherwise we can use the semicolon to differentiate between the keyed dictionary and plain set, via {1, 2, 3} (set) and {; "a" = 1, "b" = 2, "c" = 3} or {; "a" => 1, "b" => 2, "c" => 3} for Dict.

(With dictionaries iterating pairs, the latter allows for {; other_dict...} naturally. Though to be honest I do prefer the named-tuple value iteration style, which works fine for (; nt...) so I don't see that as much of an issue.)
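A small sketch of the two iteration styles mentioned here, using only existing Base behaviour (variable names are illustrative):

```julia
d = Dict("a" => 1, "b" => 2)
nt = (a = 1, b = 2)

# Splatting a Dict yields its key => value pairs, so this already works:
d2 = Dict(d..., "c" => 3)

# Named tuples splat after the semicolon, keeping their names:
nt2 = (; nt..., c = 3)

# Iterating a Dict gives pairs; iterating a named tuple gives values only.
println(first(d) isa Pair)
println(first(nt))
```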

andyferris commented 3 years ago

@rfourquet SafeREPL.jl sounds really cool!


Thinking about splatting/slurping and allowing for Set, I can't think of a way to use [ ] to slurp up Vector, Dict and Set.

Using { } would be consistent with many other languages. For example

| Syntax | Value |
| --- | --- |
| {} | Set() |
| {;} | Dict() |
| {1, 2, 3} | Set([1, 2, 3]) |
| {"a" = 1, "b" = 2, "c" = 3} | Dict(["a" => 1, "b" => 2, "c" => 3]) |
| {; "a" = 1, "b" = 2, "c" = 3} | Dict(["a" => 1, "b" => 2, "c" => 3]) |
| {itr...} | Set(itr) |
| {; itr...} | Dict(pairs(itr)) |
| {1, 2, 3, itr1..., itr2...} | Set([1, 2, 3, itr1..., itr2...]) |
| {"a" = 1, "b" = 2, "c" = 3, itr1..., itr2...} | Dict(["a" => 1, "b" => 2, "c" => 3, pairs(itr1)..., pairs(itr2)...]) |
| {; "a" = 1, "b" = 2, "c" = 3, itr1..., itr2...} | Dict(["a" => 1, "b" => 2, "c" => 3, pairs(itr1)..., pairs(itr2)...]) |
| {f(x) for x in itr} | Set(f(x) for x in itr) |
| {; f(x) for x in itr} | Is this Dict(f(x) for x in itr)? |
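
The right-hand column can be run today. The sketch below spells out a few rows with an illustrative `itr` and `f` (both hypothetical). Note that `pairs` on a vector gives index => value, and that the `Dict` constructor expects a generator to produce pairs, which is what the question in the last row turns on.

```julia
itr = [10, 20, 30]
f(x) = x + 1

set_from_itr = Set(itr)                 # what {itr...} would build
dict_from_itr = Dict(pairs(itr))        # what {; itr...} would build: index => value
set_from_gen = Set(f(x) for x in itr)   # what {f(x) for x in itr} would build

# For a Dict, the generator has to yield key => value pairs.
dict_from_gen = Dict(x => f(x) for x in itr)

println(dict_from_itr)
println(dict_from_gen)
```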

One major difficulty though is specifying the eltype, keytype or valtype of the containers. How do I create a Set{String}() or a Dict{String, Int}()?

andyferris commented 3 years ago

How do I create a Set{String}() or a Dict{String, Int}()?

I have thought of this so far.

| Syntax | Value |
| --- | --- |
| {}T | Set{T}() |
| V{;}K | Dict{K, V}() |
| {;}K | Dict{K, Any}() |
| V{;} | Dict{Any, V}() |

I'm not 100% sure on the order of K vs V. I put it this way to keep the "value type" on the left just like for Array, e.g. T[]. The keytype is on the right. The keys of an AbstractDict form an AbstractSet so, if you think about it, there is some level of consistency with the eltype of the Set being on the right... (Similarly, Array doesn't let you specify the key type, so there is no possibility of putting a type on the right of [].)
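
For comparison, the existing constructor calls that these forms would stand for, with String and Int picked as illustrative types. The last line checks the point about keys being a set.

```julia
s  = Set{String}()          # proposed {}T with T = String
d  = Dict{String, Int}()    # proposed V{;}K with K = String, V = Int
dk = Dict{String, Any}()    # proposed {;}K
dv = Dict{Any, Int}()       # proposed V{;}

println((eltype(s), keytype(d), valtype(d)))
println(keys(d) isa AbstractSet)
```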

mbauman commented 3 years ago

I really like the symmetry with named tuples — we just need to follow that to its logical conclusion:

julia> a = :sym; b = :bol;

julia> (a = 1, b = 2)
(a = 1, b = 2)

julia> (; a => 1, b => 2)
(sym = 1, bol = 2)

julia> [a = 1, b = 2] # this syntax would necessarily be limited to symbol keys
Dict{Symbol,Int64} with 2 entries:
  :a => 1
  :b => 2

julia> [; a => 1, b => 2] # this syntax would of course allow arbitrary key types; the key is evaluated
Dict{Symbol,Int64} with 2 entries:
  :bol => 2
  :sym => 1

I don't think we need a Set syntax — especially if it doesn't just obviously "fall out".
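
Spelled out with today's constructors, the two bracket forms above would correspond to the following. This is a sketch of the intended meaning under this proposal only; the bracket syntax itself does not exist, and the variable names are illustrative.

```julia
a = :sym; b = :bol

# [a = 1, b = 2] would mean: the names themselves become Symbol keys.
symbol_keys = Dict(:a => 1, :b => 2)

# [; a => 1, b => 2] would mean: the keys are evaluated.
evaluated_keys = Dict(a => 1, b => 2)

# The named-tuple form with => after the semicolon already exists.
nt = (; a => 1, b => 2)

println(symbol_keys)
println(evaluated_keys)
println(nt)
```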

mbauman commented 3 years ago

That said, I don't very much like how this interacts with semicolons for vcat... with #33697, I could imagine, e.g., [;;] creating a 0x1 matrix, [;;;] a 0x0x1, etc. In fact, I'm not sure, but the [;] syntax might no longer be available after that gets merged.

bramtayl commented 3 years ago

I think there is room for more parallelism here for sure. I'm not sure I like ["a" = 1] though... the keyword expression seems to mean something like "encode the keyword symbol as part of the type" (which is different from a pair)...

What would be super cool to me is if we revised the NamedTuple internals to be something like this:

(; a = 1, b = 2) goes to (name"a" => 1, name"b" => 2), where the name string macro would pull the symbol into the type domain. I can see the following advantages

I can see the following disadvantages:

I sometimes wonder too how useful having a pair type really is...it's really not that much different from a two length tuple. So maybe

"a" => 1 going to ("a", 1) could be a good idea?

KristofferC commented 3 years ago

Please don't derail the issue. This is not about "revisiting NamedTuple internals" or removing the Pair type.

BioTurboNick commented 3 years ago

Relevant to the discussion on why [1, 2; x = 4] parses: https://github.com/JuliaLang/julia/pull/33697#issuecomment-767004965

andyferris commented 3 years ago

I don't very much like how this interacts with semicolons

Yes, having multiple meanings for semicolons in [] could be confusing even if it were technically possible to have it together with #33697.

@mbauman I'm not sure if it was clear, but I was suggesting either {a = 1} or [; a = 1] to be equivalent to Dict(a => 1) not Dict(:a => 1) — so rather than thinking of the left side of the = as a Symbol it is a variable referring to some other value, or else a literal value.

Out of interest, I noticed that Rust's RON does a similar thing, making flexible usage of assignment in maps (like JSON, they use :, but that's not available in Julia). It is actually very similar to the proposed Julia syntax, with (a: 1, b: 2) creating an anonymous struct (like our (a = 1, b = 2)) and {"a": 1, "b": 2} creating a generic map from strings to integers (above I was suggesting {"a" = 1, "b" = 2}).

In fact I came upon this idea thinking of the strengths/weaknesses of JSON and was reading about RON, wondering what a Julia-inspired "JLON"/"JON" would possibly look like. With JSON it's very powerful to write out your nested data in a textual form that expresses the data structure generically without worrying about its "nominal" types. It seems to me that tuples, named tuples, arrays, dictionaries and sets make up a pretty reasonable set of generic data structures one can use to express just about any value (ignoring references - which complicate data formats enormously). I have been frustrated in the last year not being able to have a JSON set, in fact (as it's really nice when the data model encodes the assumptions for you, making invalid states impossible).

There are a few other languages with set literals, such as Clojure and Dart. In these languages the syntaxes of dictionaries and sets are similar and share usage of braces, partly because sets are just dictionaries without values, and partly because curly braces denote sets in mathematics, which I imagine should appeal to Julia's technical audience.

mbauman commented 3 years ago

I'm not sure if it was clear, but I was suggesting either {a = 1} or [; a = 1] to be equivalent to Dict(a => 1) not Dict(:a => 1) — so rather than thinking of the left side of the = as a Symbol it is a variable referring to some other value, or else a literal value.

That was clear, but I find it very confusing as it'd make (; a=1,) behave differently from [; a=1,]. I was intentionally proposing different semantics to avoid that confusion. A slightly different meaning of evaluation semantics in = is far more confusing to me than a drastically different use of ;. I'm not thrilled about = in curlies, either, for the same reason.
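
The difference in evaluation semantics can be seen with existing syntax. In this sketch the variable `a` and its value are illustrative, and the two Dict calls stand for the two possible readings of the hypothetical [; a = 1].

```julia
a = "key"

# In a named tuple, the left side of = is a literal name and is not evaluated.
nt = (; a = 1)

# Reading 1: the left side is evaluated, giving the key "key".
evaluated = Dict(a => 1)

# Reading 2: the left side is a literal name, matching the named tuple.
literal = Dict(:a => 1)

println(keys(nt))
println(evaluated)
println(literal)
```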

Do any other XONs use their assignment operator for the notation? JSON and RON use :. We use => to define key-value pairs; that's what we should use for key-value pairs.

andyferris commented 3 years ago

I am aware of Lua which uses = in tables (like a JS object) in this way.

andyferris commented 3 years ago

The commonality I noticed amongst them all is that in all cases the “key value separator” was always a fixed language “syntax” not an operator corresponding to a (possibly overloadable, possibly first-class) function.

Julia has very few of these syntaxes besides =. I can’t think of any other characters (, or ; or ?) that are suitable.

I think this is important because, well, frankly anyone can do anything with => because it is a generic function (and so long as it’s not type piracy we should be ok with that!), and that could make the interpretation of the dict literal ambiguous IMO. In the other places we need key value separators (keyword arguments and named tuples) we use = which has a fixed language meaning.
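
A quick way to see the distinction in ordinary expressions, as a sketch using `Meta.parse` (the names `k` and `v` are placeholders): => parses to a regular function call, while = has its own expression head.

```julia
pair_expr = Meta.parse("k => v")
assign_expr = Meta.parse("k = v")

println(pair_expr.head)     # a :call whose first argument is the function name
println(pair_expr.args[1])  # the name being called, =>
println(assign_expr.head)   # assignment has a dedicated head, not a call

# In Base, => is an ordinary binding that refers to the Pair type.
println((=>) === Pair)
```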

mbauman commented 3 years ago

In named tuples, => is syntax and doesn't call whatever => you have in your namespace or even Base's version:

julia> a=>b = "a=>b"
=> (generic function with 1 method)

julia> (; :a=>1, :b=>2)
(a = 1, b = 2)

julia> Base.Pair(a::Symbol, b::UInt8) = Base.Pair(a, Int(b)+1)

julia> (; :a=>0x1, :b=>0x2)
(a = 0x01, b = 0x02)

julia> Base.Pair(:a, 0x1)
:a => 2