JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Reintroduce concise syntax for Dict construction? #12930

Closed: malmaud closed this issue 9 years ago

malmaud commented 9 years ago

As discussed in https://groups.google.com/forum/#!topic/julia-users/1bwx3fjSO5A, many are unhappy with how verbose Dict literal construction has become in 0.4. I'm aware there were real problems with the old syntax, but maybe we can still think of a way to allow a more parsimonious syntax going forward.

stevengj commented 9 years ago

See #6739 for discussion on the original change.

stevengj commented 9 years ago

What's so verbose about Dict(3=>4, 5=>6)? It is only four more characters than [3=>4, 5=>6]. (I first wrote "three"; I can't count.)
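The count is easy to check mechanically. The snippet below only compares the two spellings as plain strings, so it does not depend on which Julia version would parse them:

```julia
# Count the characters in each spelling of the same two-entry literal.
dict_form    = "Dict(3=>4, 5=>6)"
bracket_form = "[3=>4, 5=>6]"

extra = length(dict_form) - length(bracket_form)
println("Dict(...) needs $extra more characters than [...]")
```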

kmsquire commented 9 years ago

To add a little more context, this is especially true when dealing with Dicts of Dicts (e.g., when printing the Julia representation of JSON objects):

julia> using JSON

julia> a="{\"menu\": {
                \"id\": \"file\",
                \"value\": \"File\",
                \"popup\": {
                  \"menuitem\": [
                    {\"value\": \"New\", \"onclick\": \"CreateNewDoc()\"},
                    {\"value\": \"Open\", \"onclick\": \"OpenDoc()\"},
                    {\"value\": \"Close\", \"onclick\": \"CloseDoc()\"}
                  ]
                }
              }}
              "
"{\"menu\": {\n         \"id\": \"file\",\n         \"value\": \"File\",\n         \"popup\": {\n           \"menuitem\": [\n             {\"value\": \"New\", \"onclick\": \"CreateNewDoc()\"},\n             {\"value\": \"Open\", \"onclick\": \"OpenDoc()\"},\n             {\"value\": \"Close\", \"onclick\": \"CloseDoc()\"}\n           ]\n         }\n       }}\n       "

julia> println(JSON.parse(a))
Dict{AbstractString,Any}("menu"=>Dict{AbstractString,Any}("id"=>"file","value"=>"File","popup"=>Dict{AbstractString,Any}("menuitem"=>Any[Dict{AbstractString,Any}("onclick"=>"CreateNewDoc()","value"=>"New"),Dict{AbstractString,Any}("onclick"=>"OpenDoc()","value"=>"Open"),Dict{AbstractString,Any}("onclick"=>"CloseDoc()","value"=>"Close")])))
stevengj commented 9 years ago

@kmsquire, even if you need to specify the type, the old syntax was still nearly as verbose (three fewer characters): (AbstractString=>Any)[ ... ] vs. Dict{AbstractString,Any}( ... ).
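The same kind of count for the typed forms, followed by the typed constructor as it is spelled in current Julia (1.x). The keys and values in the constructor call are arbitrary examples:

```julia
# Old (0.3) typed literal vs. new typed constructor, compared as plain text.
old_typed = "(AbstractString=>Any)[ ... ]"
new_typed = "Dict{AbstractString,Any}( ... )"
saving = length(new_typed) - length(old_typed)   # characters the old form saved

# The typed constructor with explicit key/value parameters.
d = Dict{AbstractString,Any}("a" => 1, "b" => [2, 3])
println(saving, " ", keytype(d), " ", valtype(d))
```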

malmaud commented 9 years ago

Just to concisely summarize the original motivation for the change: it seems like the decision to make => first-class (which I totally agree with) is what disallows [a=>b, c=>d] as Dict syntax (since it would be ambiguous with a Vector of Pairs). What was the problem with curly-brace syntax?
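In current Julia (1.x), where this change has long since been completed, the two spellings really are different things. A small illustration with arbitrary integer keys:

```julia
v = [3 => 4, 5 => 6]       # a Vector of Pairs, not a Dict
d = Dict(3 => 4, 5 => 6)   # an explicit Dict

v[2] == (5 => 6)   # indexing a vector returns its second element
d[5] == 6          # indexing a Dict looks up the key 5
Dict(v) == d       # a vector of pairs can still be turned into a Dict on demand
```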

stevengj commented 9 years ago

@malmaud, curly braces used to be Any[...] (à la Matlab cell arrays), and this needs to be deprecated for at least one major release before it can be repurposed.

Also, punctuation is precious. Even when curly braces are available to be repurposed, is it really worth using them to save typing 3-4 characters?

malmaud commented 9 years ago

Ah right. So maybe this is as simple as repurposing curlies in 0.5.

mbauman commented 9 years ago

I think the leading contender is tuple types: https://github.com/JuliaLang/julia/issues/8470

With https://github.com/JuliaLang/julia/commit/85f45974a581ab9af955bac600b90d9ab00f093b, curly braces could maybe be used for both Tuples (with types) and something else (with values, or more specifically just Pairs). Sure, it's two meanings for the same syntax, but they're used in very different contexts with very different content between the braces.

ScottPJones commented 9 years ago

My current problem is more the inconsistency between the type inference with [ ] and Dict( ) (see https://groups.google.com/forum/#!topic/julia-users/1bwx3fjSO5A). Some more issues: Dict("a"=>1,"b"=>2) gives Dict{ASCIIString,Int64}, but Dict("á"=>1,"b"=>2) gives Dict{Any,Int64}. That could have come back as Dict{UTF8String,Int64}, or at least Dict{AbstractString,Int64}.
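For what it's worth, this particular inconsistency went away once ASCIIString and UTF8String were merged into a single String type (in 0.5). A quick check in current Julia (1.x), with explicit parameters still available when a wider key type is wanted:

```julia
ascii_keys = Dict("a" => 1, "b" => 2)
other_keys = Dict("á" => 1, "b" => 2)

# Both literals are String, so both Dicts get the same concrete key type.
typeof(ascii_keys) == typeof(other_keys) == Dict{String,Int}

# Explicit parameters still work when a wider key type is wanted.
wide = Dict{AbstractString,Int}("á" => 1, "b" => 2)
keytype(wide) == AbstractString
```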

JeffBezanson commented 9 years ago

I'm against going back on this.

IainNZ commented 9 years ago

The comparison in the OP of the julia-users thread is not even remotely fair, because they are specifying the type in one case and not the other.

lobingera commented 9 years ago

@stevengj, yes, dicts are a mighty tool (see Python); spending syntax on them might be a good investment.

mdcfrancis commented 9 years ago

@IainNZ, slightly unfair, except for the following:

julia> { :a => 1 }

WARNING: deprecated syntax "{a=>b, ...}".
Use "Dict{Any,Any}(a=>b, ...)" instead.
Dict{Any,Any} with 1 entry:
      :a => 1

If you follow the deprecation, it suggests that one should use types when replacing {}.

IainNZ commented 9 years ago

Subtle point, I guess, because { } means Dict{Any,Any}, but it's not clear {Any,Any} was wanted. In fact, the example you used was really {Symbol,Any}, which is more like what [] gave in 0.3.

mdcfrancis commented 9 years ago

If we were being consistent, we would be removing [1,2,3] as well and making people type Vector(1,2,3), etc. I see no reason why Vectors are more special than associative collections.

jakebolewski commented 9 years ago

I don't see why we are debating this now, 0.4 is close to being finally branched. Discussion about this change is almost a year old at this point.

mdcfrancis commented 9 years ago

@jakebolewski because right now we are spending a large amount of time updating packages and code to use 0.4 - this is the first time for many where they are seeing the impact of this change.

IainNZ commented 9 years ago

@mdcfrancis I don't think that necessarily follows re [] and Vector, but if you'd like to submit a PR implementing a special syntax for Dict I'm sure it'll be assessed on its merits for Julia 0.5.

JeffBezanson commented 9 years ago

There isn't enough syntax for every data structure, and I would argue that distinguishing data structures by bracket type is not terribly clear anyway. I also think vectors really are more fundamental than dictionaries. How are dictionaries implemented after all?

I might add that { } is well-established notation for sets, so maybe { } should only construct sets. But I don't want to debate whether sets or dicts are more important.

jrevels commented 9 years ago

Just my 2 cents, but I greatly prefer the new syntax. Dict{K,V}(...) reads clearer to me than (K=>V)[...]. It is also more explicit; it's obvious that you're constructing a Dict rather than an array of Pairs.

> If you follow the deprecation, it suggests that one should use types when replacing {}.

To back up @IainNZ's point, if you use the old syntax for a type-inferred Dict, the deprecation warning actually shows you the correct new syntax for making the same Dict:

julia> ["a"=>2, "b"=>3]

WARNING: deprecated syntax "[a=>b, ...]".
Use "Dict(a=>b, ...)" instead.
Dict{ASCIIString,Int64} with 2 entries:
  "b" => 3
  "a" => 2

The fact that this was a source of confusion in the first place is an argument in favor of the new syntax, IMO.

> I would argue that distinguishing data structures by bracket type is not terribly clear anyway.

I very much agree with this.

> I might add that { } is well-established notation for sets, so maybe { } should only construct sets. But I don't want to debate whether sets or dicts are more important.

My vote is strongly in favor of using {} for #8470, instead of using them to construct a new value (I suppose a type is itself a value of type DataType, but you know what I mean).

mdcfrancis commented 9 years ago

To clarify on a few points

The feeling around associative collections is simply that they are very prevalent in coding, especially when you are interfacing with other systems. If you look at Escher (for example) you'll see them all over the place.

For my common use cases, [ :a => 1, :b => [ :x => 2.3] ] is more than sufficient, i.e., a list of pairs which may be coerced into an associative when required. Perhaps the challenge here is more that this syntax is deprecated when it should really be encouraged?

nolta commented 9 years ago

> What's so verbose about Dict(3=>4, 5=>6)? It is only four more characters

My worry is that + -> .+ was only one more character, and look how well that turned out (#7226).

jrevels commented 9 years ago

> The feeling around associative collections is simply that they are very prevalent in coding, especially when you are interfacing with other systems.

No dispute there!

> For my common use cases, [ :a => 1, :b => [ :x => 2.3] ] is more than sufficient, i.e., a list of pairs which may be coerced into an associative when required. Perhaps the challenge here is more that this syntax is deprecated when it should really be encouraged?

The reason why running [:a => 1, :b => [:x => 2.3]] currently throws a deprecation warning, but still actually follows through with the old behavior, is to give folks time to adjust before removing the old behavior entirely. I'm not sure when the changeover will actually occur (maybe once v0.4 actually releases), but once it does, this will indeed be the right syntax for constructing an array of Pairs.

If the new behavior is sufficient for your case and you want to make the switch now, you can explicitly "opt in" by using the Pair constructor instead of the => operator:

julia> [Pair(:a, 1), Pair(:b, [Pair(:x, 2.3)])]
2-element Array{Pair{Symbol,B},1}:
 :a=>1
 :b=>[:x=>2.3]

Definitely not as pretty as using the => operator, but it will work until the new behavior fully comes into play.
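In current Julia (1.x), the plain => spelling builds the same kind of vector, so the Pair workaround is no longer needed. A sketch comparing the two spellings:

```julia
explicit = [Pair(:a, 1), Pair(:b, [Pair(:x, 2.3)])]   # the 0.4 opt-in spelling
arrows   = [:a => 1, :b => [:x => 2.3]]               # the => spelling

explicit == arrows                   # same contents
arrows[2].second[1] == (:x => 2.3)   # the nested value is itself a vector of Pairs
```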

> I'm not convinced that it is obvious that {T1,T2} is the type of a tuple, though I guess I can get my head around it.

The idea does require some getting used to if you're used to the current Julia syntax. It's more intuitive when you think about it in relation to the role value tuples play in function application:

f applied to arguments (1,2,3)    is written  f(1,2,3)
T applied to parameters {A,B,C}   is written  T{A,B,C}
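The analogy can be seen in current Julia (1.x), where the tuple type ended up being written Tuple{...} rather than with bare braces. A small illustration with an arbitrary three-argument function f:

```julia
# f applied to the arguments (1, 2, 3) ...
f(a, b, c) = a + b + c
args = (1, 2, 3)
f(args...) == f(1, 2, 3)   # splatting the value tuple applies f to its elements

# ... and Tuple applied to the parameters {Int, Int, Int}.
typeof(args) == Tuple{Int,Int,Int}
```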

It could also really clean up syntax that currently uses Val types (or some similar wrapper type), which can come into play when writing generated functions for type-stable transformations over heterogeneous tuples (but I digress; discussion regarding the tuple type change should probably stay in #8470).

ScottPJones commented 9 years ago

> I also think vectors really are more fundamental than dictionaries. How are dictionaries implemented after all?

I think that is the wrong question. We are talking about syntax and concepts here, not implementation details. From a conceptual viewpoint, associative arrays (aka dictionaries) are more fundamental than integer-subscripted vectors or arrays (Lua is very nice that way, as are Caché ObjectScript and M/MUMPS). What is a vector, but an associative array with the keys restricted to integers?
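A small, purely illustrative sketch of this view in current Julia (1.x): the same data stored as a Vector and as a Dict with Int keys answers the same lookups. The vector contents are arbitrary:

```julia
v = ["zero", "one", "two"]

# A vector's keys are just its integer indices ...
collect(keys(v)) == [1, 2, 3]

# ... so the same data fits in an associative array with Int keys.
d = Dict(i => x for (i, x) in enumerate(v))

d[2] == v[2]                                # both lookups give "one"
!haskey(d, 4) && !checkbounds(Bool, v, 4)   # neither has a key/index 4
```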

Also, why do dictionaries even have to be implemented with vectors? (Unless you really want to get down to the nitty-gritty, where the entire memory of the computer is a vector of bytes.) In COS, globals (persistent, distributed, atomic associative arrays) were implemented with B+ trees, and local associative arrays with a variety of structures (p-tries; vectors that stored the base index and span and allowed for missing values, for arrays that so far had only integer subscripts; hash tables), whichever was most efficient, but all invisible to the programmer.

malmaud commented 9 years ago

If the blessing that is generic programming allows lists of pairs to realistically work in most contexts where dictionaries currently work, that seems like it would be a pretty good solution.

mdcfrancis commented 9 years ago

@malmaud, and from the experiment I'm doing at the moment, you can trivially implement the associative methods for Vector{Pair{K,V}}, so perhaps the real issue here is that this should not have been a deprecation at all, but a switch to the Pair vector syntax with a thin shim that supports associative-like behavior, which is often cheaper/smaller for small collections.
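A minimal sketch of what implementing the associative methods on a vector of pairs could look like in current Julia (1.x). PairList is a hypothetical wrapper type, not something in Base. Wrapping the vector, rather than adding methods to Vector itself, sidesteps the integer-key clash raised in this thread, because d[2] on a PairList can only mean key lookup:

```julia
# Hypothetical wrapper (not part of Base): a vector of pairs with
# linear-search key lookup, exposed through the AbstractDict interface.
struct PairList{K,V} <: AbstractDict{K,V}
    items::Vector{Pair{K,V}}
end

Base.length(d::PairList) = length(d.items)

# Iterating yields the stored Pairs, as AbstractDict iteration expects.
Base.iterate(d::PairList, state...) = iterate(d.items, state...)

# Linear scan for the first pair whose key matches (isequal, like Dict).
function Base.get(d::PairList, key, default)
    for (k, v) in d.items
        isequal(k, key) && return v
    end
    return default
end

Base.haskey(d::PairList, key) = any(p -> isequal(p.first, key), d.items)

function Base.getindex(d::PairList, key)
    for (k, v) in d.items
        isequal(k, key) && return v
    end
    throw(KeyError(key))
end

d = PairList([2 => 3, 3 => 4])
d[2]   # key lookup, never positional indexing
```

Every lookup is a linear scan, so this only pays off for the small collections described here.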

mdcfrancis commented 9 years ago

@JeffBezanson, how would you feel about a PR for that, i.e., going directly to the pair syntax?

mbauman commented 9 years ago

> If the blessing that is generic programming allows lists of pairs to realistically work in most contexts where dictionaries currently work, that seems like it would be a pretty good solution.

It would be cool for it to work as a linear-search "dictionary", but unfortunately there's a clash of meanings with numeric keys.

d = [Pair(2, 3), Pair(3, 4)]
d[2] == (3=>4) # Is it the second element?
d[2] == 3      # Or is it the key lookup?

mdcfrancis commented 9 years ago

@mbauman it would be the key lookup, but I agree that is an odd case

mbauman commented 9 years ago

Then it's no longer a Vector of Pairs. Here's the trouble:

d = Any[Pair(2, 3), Pair(3, 4)]
d[2] == (3=>4)
d = [Pair(2, 3), Pair(3, 4)]
d[2] == 3

This would be bizarre. If inference at some point fails to concretely type an array comprehension, your data structure now behaves extremely differently.

ScottPJones commented 9 years ago

In 0.5, will x = [ :a => 1, :c => [ 2, 3] ] give me a Vector{Pair{Symbol,Any}}? And will typeof(x[2]) give me Pair{Symbol,Vector{Int}}? (Really, it displays Array{Int64,1} instead of Vector{Int}, but they are ===.)

That is what I'd expect.
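Whatever element type the untyped literal ends up with, the typeof(x[2]) part has a definite answer once the element type is the concrete Pair{Symbol,Any}: each element is converted to that type when stored, so typeof(x[2]) reports Pair{Symbol,Any}, and the Vector{Int} lives in its second field. A check in current Julia (1.x), with the element type written out explicitly:

```julia
# Element type written out explicitly, so every entry is stored as Pair{Symbol,Any}.
x = Pair{Symbol,Any}[:a => 1, :c => [2, 3]]

typeof(x) == Vector{Pair{Symbol,Any}}
typeof(x[2]) == Pair{Symbol,Any}     # the element was converted on storage
typeof(x[2].second) == Vector{Int}   # the payload keeps its own type
Vector{Int} === Array{Int,1}         # the two spellings name the same type
```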

@mdcfrancis I don't agree with it doing a key lookup instead of returning the Pair. Instead, I'd have a Dict constructor that converts a (possibly nested) vector of pairs into a Dict of the right type, and returns it.

I.e. x = Dict([:a => 1, :b => [2,3]]) returns something of type Dict{Symbol, Any}, where x[:a] returns 1, and x[:b] returns [2,3].

How about that? The syntax is easy, doesn't change any proposed 0.5 syntax, and gives easy-to-read associative array literals.
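In current Julia (1.x), Dict([:a => 1, :b => [2,3]]) already works as described for the flat case. The nested case needs a little recursion. todict below is a hypothetical helper, not a Base function, and it only converts values that are vectors whose element type is a Pair type:

```julia
# Hypothetical helper (not part of Base): convert a possibly nested
# vector of Pairs into nested Dicts, leaving all other values as-is.
todict(x) = x
todict(p::Pair) = p.first => todict(p.second)
todict(v::AbstractVector{<:Pair}) = Dict(todict(p) for p in v)

flat   = todict([:a => 1, :b => [2, 3]])     # [2, 3] is not a vector of Pairs, so it stays a Vector
nested = todict([:a => 1, :b => [:x => 2.3]])
```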

malmaud commented 9 years ago

@mbauman Yeah, all I was really thinking of is that functions that expect a dict, f(d)=something(d), would instead look like f(d)=something(asdict(d)). Define asdict(d::Associative)=d and asdict{T<:Pair}(x::Vector{T})=Dict(x) (or some lightweight dict alternative with key-value semantics).
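The same helper in current Julia (1.x) syntax, where Associative has been renamed AbstractDict and method type parameters are written with where. asdict and lookup are hypothetical names from this proposal, not Base functions:

```julia
# Hypothetical helpers from the proposal, in Julia 1.x syntax
# (Associative is now AbstractDict; {T<:Pair} moved into `where`).
asdict(d::AbstractDict) = d
asdict(x::Vector{T}) where {T<:Pair} = Dict(x)

# A function written against dictionaries now accepts either input.
lookup(d, key) = asdict(d)[key]

lookup(Dict(:a => 1), :a)        # already a Dict: passed through unchanged
lookup([:a => 1, :b => 2], :b)   # vector of pairs: converted first
```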

ScottPJones commented 9 years ago

@kmsquire, why did you bother with all of those \" to quote the JSON string? Just use """ instead, and you don't have to change them (except watch out for $'s in the text!)
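A small illustration with one line of the JSON example: a triple-quoted string needs no backslashes before the inner quotes, while a literal $ still has to be escaped because interpolation stays active. Checked against current Julia (1.x):

```julia
escaped = "{\"value\": \"New\", \"onclick\": \"CreateNewDoc()\"}"
triple  = """{"value": "New", "onclick": "CreateNewDoc()"}"""

escaped == triple   # same string, far fewer backslashes

# `$` still interpolates inside """...""", so a literal dollar sign needs escaping.
price = """{"price": "\$5"}"""
```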

ScottPJones commented 9 years ago

@malmaud Sounds like we are thinking on exactly the same lines.

stevengj commented 9 years ago

@nolta, I don't think the .+ vs. + transition is comparable. Requiring .+ for array + scalar was problematic because + is extremely well established syntax for this operation in scientific computing. Also, using + required no special support in the Julia parser, only ordinary method overloading. Whereas Julia's old Dict syntax is neither universal nor implementable without special parser support.

mdcfrancis commented 9 years ago

The proposal is no worse than what exists in 0.3 today, which a lot of people were happy with (pairs convert to a dictionary). As @mbauman points out, the ambiguity is for integer keys; for the rest of the universe of types, the behavior would be consistent (key-based lookups with linear performance). We could (if required) special-case integers so that they do not perform the key lookup (probably a good idea).

This would not solve the case where a function is expecting an associative, which seems like the main reason not to do this. We would have to go through the code and change API points to accept Vector{Pair} or Associative. I suspect this is less work than changing all the usages of { pair } and [ pair ], though, and it would be in line with the future direction.

jrevels commented 9 years ago

I don't think changing the indexing behavior of a Vector based simply on its eltype is a good idea... something with type Vector should behave like a vector, not a dictionary. If you want it to behave like a dictionary, well, that's what Dict is for.

mdcfrancis commented 9 years ago

You are probably correct, though this does not change the indexing, it just extends it. A Vector{Pair{String,Any}} would still behave like any other vector: you can push elements onto it, reference elements by integer index, etc. It's just that when you indexed it by a String, the lookup would be on the contents of the elements.

kmsquire commented 9 years ago

> @kmsquire, why did you bother with all of those \" to quote the JSON string? Just use """ instead, and you don't have to change them (except watch out for $'s in the text!)

True. That example was copy-pasted from the JSON.jl tests, and whoever wrote that originally probably wasn't aware of """ at the time.

mdcfrancis commented 9 years ago

@one-more-minute suggested the following macro for supporting direct JSON syntax in Julia.

https://groups.google.com/d/msg/julia-users/1bwx3fjSO5A/V_inIa7eCAAJ

I'm also looking forward to having vectors of pairs as soon as is practical in the next version.

These two items would remove my objections to the removal of the terse syntax (as it would still exist for my purposes :) ). At what point do we think we will be able to remove the backward compatibility from the [] syntax?

ScottPJones commented 9 years ago

I do love how this discussion led to a reasonable solution for @mdcfrancis (and certainly others, including myself), within less than 24 hours. It might have seemed like pointless complaining at first to some people, but look at the results.

mauro3 commented 9 years ago

Backwards compatibility will be removed in 0.5, after one release cycle of deprecation warnings.

mdcfrancis commented 9 years ago

@ScottPJones, agreed. @mauro3, should we close this issue and open a concise description with a 0.5 tag so that it is remembered? We can place a link back to here.

JeffBezanson commented 9 years ago

That really is a nice macro. Good example of a situation where a macro is a good solution.