Closed lmshk closed 8 years ago
fwiw, I would expect Hmm, on the other hand, with the current behaviour, you know how many elements you'll get out, and it's not really too much trouble to domap(f, Set{T}([a,b, .., z]))
to return Set{T}([f(a), f(b), ...f(z)])
, but there may be special considerations I am not aware of.
S = Set([1,2,3])
mapS = map(x-> x^2, S)
Z = Set{Int64}(mapS) # or whatever you want
# or more compactly
Z = Set{Int64}(map(x-> x^2, S))
I agree that we should use the same logic for the element type of the output container that we do for Arrays (ie, use the tightest type that can fit all the elements).
For the other point, I think I agree that you want map to send sets to sets, even though there is no formal reason for that. That's more up for discussion I guess.
Insofar as map
transforms strings to strings and tuples to tuples, it seems pretty clear that it should map sets to sets.
Well actually, you would end up losing the information of how many elements in the original set were mapped to a given element in the transformed set.
Some references:
https://github.com/JuliaLang/julia/pull/6844
Proposed new framework for map
:
https://github.com/JuliaLang/julia/pull/15146
Need to decide how to generalize it to non-arrays.
And a similar issue regarding dicts: https://github.com/JuliaLang/julia/issues/5794
I'll echo others: I think it's desirable for map to take sets to sets. Certainly this was the behavior I intuitively expected. But more than that, in math notation like {f(x) | x \in S} is very common and represents a set; the most natural way to express this in Julia code would seem to be map(f, S)
.
Edit: below Stefan points out the new generator syntax, which is perhaps a more natural way of expressing the set comprehensions and thus weakens my argument that map should obviously take sets to sets.
It would also be nice to get detailed typed sets out, not Set{Any}. Naively I would think that if map applied to an array doesn't lose type information, neither should map applied to a set. See this recent post (mine) on julia-users for an example:
https://groups.google.com/forum/#!topic/julia-users/qyDWkutI9AY
Note that with the recently introduced generator syntax, you can express set comprehensions like so:
Set(f(x) for x in S)
That is very close to the mathematical set comprehension notation. (Historical note: we originally used |
instead of for
in set comprehension syntax, but since |
is an operator, that was a bit of a parsing nightmare.) This isn't an argument against making map(f, S)
produce a set when S
is a set, just an observation.
There seems to be an agreement that map
on Set{T}
should result in a Set{T}
. But what about my other points? My understanding is that whatever is being worked on right now (i.e. the new framework for map
) will not be integrated into 0.4; since that is the stable version for some time to come, should its documentation be updated to clarify the current expected behavior?
Feel free to make a pull request with suggestions to improve the docs. But it's not that easy to do since the behavior is currently inconsistent: sometimes the type is preserved (like for tuples, strings, bit arrays), but the fallback is to return an Array
, and several core types do that. I guess you could simply mention that fallback, without listing the types for which it applies.
This should definitely be documented for 0.4. For the future, having map on set produce a set seems like reasonable behavior, but I'm much more concerned about having a coherent overall policy. So the question is what is the general behavior of map on all kinds of collections and how can we implement it generically. The issues @JeffBezanson referenced are working towards such a framework.
Ok, I will make a pull request for the documentation change when I get back from vacation next weekend.
Regarding the general behavior of map
: I think it is difficult to have it work on any iterable. For instance, what should be the result of map(x -> x * x, 1:10)
where the result cannot possibly be of the same (outer) type? As long as map
does not always return a generic iterable, I would argue that it is better to raise an error in this case, because silently changing the type can lead to non-obvious problems.
As long as map does not always return a generic iterable, I would argue that it is better to raise an error in this case, because silently changing the type can lead to non-obvious problems.
That would be terribly annoying. Better continue returning an array in that case. I don't see it as a problem since this behavior is quite predictable: it only depends on the type of the input, so it's unlikely to go unnoticed during development/testing.
Disallowing map(f, 1:10)
is not great, since to get around that you'd have to expand the range to a dense vector, which is unnecessarily inefficient.
collect(f(x) for x in 1:10)
is another way around that
It does make some kind of sense to map sets to sets, but I too wonder which kind of collection types would get this special treatment and which would continue to produce array results. It would be nice to say that we currently have a simple rule that map
always produces an Array
, but there's actually already some exceptions, e.g.:
map
on tuples returns tuplesmap(f, s)
on strings returns strings and errors out if f
does not return chars - not so sure if this is desirable thoughmap(f, as...)
on BitArray
s returns a BitArray
if f
returns boolean valuesBut sets are even more different in that they are not ordered or indexable. Maybe there should be another function for mapping an iterable into a set.
It would have been nice to have a version of map
where one could specify the output container type. If it would be filled with push!
then it should work for sets as well.
It could be useful to allow specifying the return container type to map
, but by default I think it makes sense to preserve the type, will Array
as a fallback when it's not possible (e.g. ranges). We could also make the behavior for strings more consistent with that for BitArray
: when the function does not return a Char
, map
should just return an array.
It's important to have a map
function (or equivalent) that preserves the container type. There are containers that hold additional information, not just the elements they contains, and returning an array loses this information. For example, a tree has a particular structure, or there could be a wrapper for an array that keeps access statistics, or remembers the minimum and maximum of its elements. Regenerating this additional information might be expensive or impossible.
Finally, there's the obvious case to be made for map!
that doesn't change the container type. Introducing a difference between the return types of map
and map!
seems strange. If there's a need for a map
-like function that always returns an array, then it should be called mapcollect
.
I'm just wary of all the trouble we've had with trying to preserve array types for array operations using similar
, which has turned out to be a not very well defined operation. But since map always maps elements to elements I suppose it's a bit simpler.
I'm surprised to hear that similar
is difficult to implement. I'm curious. Can you elaborate, or point me to a discussion or PR?
@eschnett You missed that debate? See https://github.com/JuliaLang/julia/issues/11574 and linked issues. Have a good read! :-)
I see. I've encountered the same problem at other occasions. My solution
was to define, for each collection, the inverse of eltype
. C++ containers
also have this functionality. I sometimes call this function tycon
, which
is haskellese for "type constructor": It takes the element type as input
and returns the collection type.
So tycon(Array, Nullable{Int})
would return NullableArray{Int}
etc.
One could wrap this into the definition of similar
, which would mean that
each collection type might need to overload it.
-erik
On Saturday, March 5, 2016, Milan Bouchet-Valat notifications@github.com wrote:
@eschnett You missed that debate? See #11574 and linked issues. Have a good read! :-)
— Reply to this email directly or view it on GitHub.< https://ci5.googleusercontent.com/proxy/7y6i1vM2wVgpqHvnw7ZAq7LX403acda7R-7Pxu0CW_vHssrjbGmnCeb4SgVcMZZvxtU6uN8wmGGaOmJcM5pHsxyQwMV7HnkZ3j9Szzy-H-KzTZp7I7gz6a8ksoHCNGiMiWIETa7yY53lab8toH7S9G7FYmuCcA=s0-d-e1-ft#https://github.com/notifications/beacon/AANCCqd7YmIoMPHPIvAFhGkiUtA27dsVks5pqbadgaJpZM4HoeJv.gif
Erik Schnetter schnetter@gmail.com http://www.perimeterinstitute.ca/personal/eschnetter/
Another thing to consider is how to handle map(f,a,b)
where a
and b
are of different container types. It seems to me that this should be disallowed completely for sets, since they are unordered. Otherwise, I guess we could promote the container types together, if we first remove the element type parameters from them. Hmm, feels like this is getting complicated pretty fast...
I'm not sure it will be easier with map
's dynamic approach to determining the output element type either.
The simplest case of a map
with multiple arguments is zip
. It should be possible to implement zip(a,b)
as map((x,y) -> (x,y), a, b)
, or map(f, a,b)
via zip
(as map(f,a,b) = map(xy->f(xy[1],xy[2]), zip(a,b))
.
Personally, I favour the restriction that all collections passed to map
must have the same type and same structure (i.e. same array shapes etc.). However, zip
is already more generous than that.
Yes, passing multiple sets to map
does not make sense.
This issue seems "fixed"? Although the new behavior have caused other issues. Ref #15789.
I believe this is fixed. And the issues with mapping over Dict
should be fixed by #15789 and #17968, right?
I was surprised to find that
map(identity, Set([1, 2]))
returns anArray
. Further, even if I writemap(identity, Set{Int}([1, 2]))
, the result is anArray{Any, 1}
.The documentation for
map
states:I expected "transform collection" to mean that the result is a collection of the same outer type as
c
. Of course, since "transform collection" is not defined exactly, this is not promised here. But from the documentation, it is a bit difficult to understand what to expect ofmap
s return value.To find out what exactly is going to happen when I call
map
I have to do something like@which map(f, c)
to find out which method will be called, and then I actually need to read the source. In my case (map(identity, Set())
) that ismap(f, iters...) at abstractarray.jl:1092
, which creates aresult = []
and returns it later on. The comment here says "generic map on any iterator" (shouldn't that say "iterable with defined length"?), and this seems to be the fallback case formap
.Proposals:
map
documentation should be changed to something along the lines of "Returns an iterable over the values obtained by applyingf
to each element in collectionc
." (If anyone relies on the result being anArray
in the general case, they should wrap the call withcollect
.)isa(c, Array{T, N})
), these special cases should be listed in the documentation.Set
s.I would be willing to do this myself, but wanted to ask for opinions first.