Closed my-little-repository closed 9 years ago
What type of 0 would you like back? Int
, Uint8
, Float32
, or something else?
There is no zero(Any)
, and Any
is the element type of []
. I should point out that
julia> sum(Float32[])
0.0f0
So all you have to do it make sure your arrays are typed.
Nitpick: I think the element type of []
is actually None
.
You are indeed correct!
Argh I dislike *
on strings more all the time. *()
should not be defined, or at the very least shouldn't return a string. There used to be a *() = 1
, and removing it exposed this behavior.
+() = 0
was nice in some cases, but doesn't really work in general since it depends on what group you're working in.
So, here is a fun thought:
*() = true
+() = false
*(x::String,b::Bool) = b?x:error()
@timholy Is it really a problem? 0 and 1 play well with all the number types. I think that Float64 was the default type for numerics on 64bit machines. A lot of stuff was silently promoted to Float64 in the previous version. I felt that by default, julia was targeted at numerics using the basic underlying type of the machine. Is it now taking a more symbolic approach?
Also, thanks for the typing advice, unfortunately it does not work for mapreduce.
julia> mapreduce(x->whatever,+,Float64[])
ERROR: Reducing over an empty array is not allowed.
If you want to return a number-type-agnostic result, maybe it is possible to define a mathconst 0 and a mathconst 1, like pi or e. 0 is certainly as constant, and as remarkable, as pi. Or create a new type for these remarkable numbers.
It would be nice for mapreduce
to do whatever reduce
is doing:
julia> sum(Int[])
0
julia> reduce(+,Int[])
0
julia> reduce(*,Int[])
1
julia> mapreduce(identity,+,Int[])
ERROR: Reducing over an empty array is not allowed.
in _mapreduce at reduce.jl:154
in mapreduce at reduce.jl:176
Can't be too sure offhand, but it seems like that should work.
Is it really a problem? 0 and 1 play well with all the number types.
Not really:
julia> using SIUnits
julia> x = 2Meter
2 m
julia> x + 0
ERROR: Unit mismatch. Got (m ) + ()
in error at error.jl:21
in + at /home/tim/.julia/v0.3/SIUnits/src/SIUnits.jl:182
in + at promotion.jl:158
julia> isa(x, Number)
true
Your MathConst idea is indeed clever and would be effective, but it has a major downside: it would make all of the reduction functions type-unstable.
Bummer about mapreduce though. One would want that to work. Of course, one subtlety is that if f(x::T1)::T2
, but you don't have any elements to test out f
on, then you have to rely on the results of type inference. Which I think we don't do right now.
And of course type-instability also applies if you return 0::Int
or 0.0::Float64
.
@timholy But this is the desired behaviour with SIUnit, no? You want to return an error if meters are added to some number which has not been explicitely typed as meter. The goal here is to prevent conversion errors and that works.
When it comes to the mathconst idea, it may not be so clever, I just remembered that pi is not handled everywhere e.g. [https://github.com/JuliaLang/julia/issues/7994].
I guess what is needed here for +() is a 0 of type number. And then adding it to something would simply return that something (fast! Actually can't be faster and is type-safe). Note that zero and one are the only candidates to be instances of the abstract type Number since they are the only one having representatives in all the related concrete subtypes. But we can't instantiate an abstract type...
Or a zero with methods
julia> methods(0)
0::Int64 at mysterious.jl:2718
0::Uint8 at mysterious.jl:314
0::Float64 at mysterious.jl:1.0
etc
I still think at the moment that the most practical approach is to return the underlying machine type.
There is a zero
function, but it can't be defined for all types. For example, what is zero(Array{Int,2})
----how big should the matrix be?
But I think the bottom line is that if mapreduce
can be made to mimic the behavior of the others, you'd be happy, right? Care to look into how the others work and see if you can port the behavior over to mapreduce
?
@timholy You're right about the reason for the mapreduce
behavior.
How about an optional argument of mapreduce that gives the fallback value for empty iterables / a starting value for the reduction? That should let the user handle most cases where this is an issue.
Ah! Toivoh's suggestion provides an easy fix to the problem.
julia> mapreduce(x->2x,+,[])
ERROR: Reducing over an empty array is not allowed.
in _mapreduce at reduce.jl:154
in mapreduce at reduce.jl:176
julia> mapreduce(x->2x,+,0,[])
0
So I can specify the (neutral) value for the empty list just by providing a starting argument and recover the previous behavior. This is a minimal change. Thanks. Maybe this could be explained in the v0.3 News file?
@timholy zero(Array{Int,2}) returns an error and this is fine. The point is that 1 should be returned for the * operator. op() was a nice way to specify the neutral element for the operator. Anyway, I can live with the current situation. I still think that this is an unnecessary breakage in v0.3 that could have waited v0.4.
@my-little-repository 0.3 has multiple breaking changes, there was no reason to wait.
I also notice you keep using []
- you should be using {}
or Type[]
, []
never comes up in practice.
@IainNZ nitpicking but you can grep your julia files and find many occurrences of []
(45 for me; you want it in your test files for example, but you can find it also in lapack.jl or array.jl).
Anyway I hope it did not obscur the fact that my complaint was about the new behavior of mapreduce breaking production code (that uses typed arrays) in the v0.3 stable release, not about [] specifically.
By the way, it seems that mapreduce (and probably reduce) has become slow on some benchmarks. I get
julia> @time mapreduce(big,*,1:20000);
elapsed time: 0.308556927 seconds (305406800 bytes allocated, 64.66% gc time)
Here is what I had a few months ago
julia> @time mapreduce(BigInt, *, 1:20000);
elapsed time: 0.040044266 seconds (3167912 bytes allocated)
Unfortunately I don't remember what version I was using and I uninstalled v0.2, so I can't be sure. I will try to reproduce and submit a report if it is confirmed.
@timholy It would indeed be nice if mapreduce was made to mimick the behavior of reduce when evaluated on an empty array.
@my-little-repository, are those uses of []
typed or untyped? Iain was just pointing out that untyped []
is equivalent to None[]
in Julia, which is pretty useless (it can hold all of the None
values you want, and nothing else).
If you see any uses without type annotations in front on base, can you open up an issue or PR? Thanks!
That's a pretty major performance regression. It's mostly here. Has something about BigInts changed? I know nothing about them personally.
@kmsquire Untyped, I think. Here is the list I get from running M-x rgrep RET " \[\]" RET *.jl RET ~/path/to/source/directory RET
in emacs (you can also exec the following form in the scratch buffer
(let ((sources "~/path/to/julia/sources/"))
(rgrep " \\[\\]" "*.jl" sources))
assuming you're using emacs and grep is installed. Actually this could be done with a simple julia programme but I don't see a facility in base to easily recurse through a directory.)
This is with an install of julia yesterday from source. Version 0.3.1-pre+4 (2014-08-22 11:13 UTC)
./test/perf/kernel/json.jl:164: val = []
./test/linalg3.jl:211:@test diag(A,-4) == []
./test/linalg3.jl:218:@test diag(A, 3) == []
./test/linalg3.jl:221:@test diag(zeros(0,0)) == []
./test/linalg3.jl:225:@test diag(zeros(1,0)) == []
./test/linalg3.jl:226:@test diag(zeros(1,0),-1) == []
./test/linalg3.jl:230:@test diag(zeros(0,1)) == []
./test/linalg3.jl:231:@test diag(zeros(0,1),1) == []
./test/functional.jl:16:# map over [] should return []
./test/functional.jl:18:@test isequal(typeof(map(x -> x, [])), Array{None,1})
./test/functional.jl:32:@test isequal(filter(x->(x>10), [0 1 2 3 2 1 0]), [])
./test/random.jl:3:@test find(x .== rand()) == []
./test/core.jl:1010: p = []
./test/arrayops.jl:665: c1 = mapslices(x-> maximum(-x), a, [])
./test/arrayops.jl:876:@test []' == Array(None,1,0)
./test/ranges.jl:211:@test [0.0:-0.1:1.0] == []
./test/ranges.jl:212:@test [0.0:0.1:-1.0] == []
./test/ranges.jl:224:@test [0.0:-1.0:0.5] == []
./test/collections.jl:392: ("hello", 23, 2.7, (), [], (1,8)))
./test/statistics.jl:249:@test hist([])[2] == []
./test/strings.jl:509:@test isequal(split("", ',', false), [])
./test/strings.jl:510:@test isequal(split(",", ',', false), [])
./test/strings.jl:511:@test isequal(split(",,", ',', false), [])
./test/strings.jl:529:@test isequal(rsplit("", ',', false), [])
./test/strings.jl:530:@test isequal(rsplit(",", ',', false), [])
./test/strings.jl:531:@test isequal(rsplit(",,", ',', false), [])
./usr/share/julia/helpdb.jl:795: julia> a = []
./base/pkg.jl:62:generate(pkg::String, license::String; force::Bool=false, authors::Union(String,Array) = [], config::Dict=Dict()) =
./base/show.jl:562: show_block(io, "catch", is(args[2], false) ? [] : args[2], args[3], indent)
./base/show.jl:565: show_block(io, "finally", [], args[4], indent)
./base/linalg/lapack.jl:3360: &jobvs, &'N', [], &n,
./base/linalg/lapack.jl:3363: &lwork, [], info)
./base/linalg/lapack.jl:3404: &jobvsl, &jobvsr, &'N', [],
./base/linalg/lapack.jl:3408: &ldvsr, work, &lwork, [],
./base/linalg/lapack.jl:3450: &jobvs, &sort, [], &n,
./base/linalg/lapack.jl:3453: rwork, [], info)
./base/linalg/lapack.jl:3495: &jobvsl, &jobvsr, &'N', [],
./base/linalg/lapack.jl:3499: work, &lwork, rwork, [],
./base/abstractarray.jl:794:hvcat(rows::(Int...)) = []
./base/dSFMT.jl:589:for (lhs, rhs) in (([], []),
./base/LineEdit.jl:85:complete_line(c::EmptyCompletionProvider, s) = []
./base/LineEdit.jl:639:default_completion_cb(::IOBuffer) = []
./base/replutil.jl:65: print(io, "Use square brackets [] for indexing.")
./base/array.jl:592:const _default_splice = []
./doc/helpdb.jl:795: julia> a = []
This looks fine to me, I am not sure why []
is frowned upon. Maybe in array.jl the const _default_slice
should be typed? hvcat returning []
may be suspicious also. But then what should it return? {}
?
julia> hvcat(1)
0-element Array{None,1}
julia> [[],[1,3,3]]' # nice
1x3 Array{Int64,2}:
1 3 3
julia> [{},[1,3,3]]' # not so nice
1x3 Array{Any,2}:
1 3 3
Should I submit an issue?
You can't add any elements to []
which means it's a bit limited and unintuitive. We are, however, probably going to make []
create an empty Vector{Any}
and also make comprehensions (and map, etc.) produce arrays whose element type is the typejoin of their values. That will make []
growable and thus much more useful.
@my-little-repository You'll notice that the vast majority of the occurrences are for equality checks in tests, which is fine since all empty sets are equal, disregarding the type of their elements. There are some occurrences which would need further examination, though.
@timholy Concerning mapreduce(big,*,1:20000)
, if it is using the sequential algorithm, then it is necessarily slower than what it used to be. In previous versions (at least until may: 0.3.0-prerelease+3303 (2014-05-28 19:23 UTC)) it was using the pairwise algorithm which performs far better on that test.
In the sequential algorithm ((((1*2)*3)*4)...)
, the switch to BigInt values occurs at the twentiest iteration. So most products are products with a BigInt (slow). In the pairwise algorithm ((1*2)*(3*4))*((5*6)*(7*8)))
the large majority of products are products of Int64 (fast). The recursive pairwise algorithm beats loops as soon as the cost of op(a,b) increases during the computation.
I will open separate issues for the mapreduce performance regression and the untyped [] check.
I just upgraded to julia version 0.3.1-pre+4. Now
sum([])
,prod([])
,reduce(+,[])
,mapreduce(x->0,+,[])
all throw an error whereas a sound value was returned in julia v0.2 (namely0
,1
,0
and0
).As far as I understand, this comes from the fact that these functions are based on mapreduce and that mapreduce now throws an error on the empty array. Previously, it was returning
op()
, which was very nice since+()
returned 0 and*()
returned 1, which gave exactly what was needed/expected.Now
+()
returns an error whereas*()
returns""
. I don't understand why strings were privilegied over numbers here. Julia is targeted at numerics after all. I really would like the previous behavior back. I must now check manually that the array argument is non-empty every time I use sum,prod, reduce... It is both tedious,ugly and error prone (everybody will forget to check their sum). And we can always return a meaningful error ifop()
is not defined.