JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.69k stars 5.48k forks source link

implement immutable types #13

Closed StefanKarpinski closed 11 years ago

StefanKarpinski commented 13 years ago

The conclusion of our last discussion on immutability versus mutability was that it is largely a psychological distinction:

To declare that a new concrete type is immutable, prefix its declaration with immutable:

immutable type Complex{T<:Real} <: Number
  re::T
  im::T
end

Alternately, only the immutable keyword could be used:

immutable Complex{T<:Real} <: Number
  re::T
  im::T
end
ViralBShah commented 13 years ago

I was unaware of the importance of this until Stefan pointed this out to me:

julia> a = Complex(1,1)
1 + 1im

julia> b = a
1 + 1im

julia> a.re = 2
2 + 1im

julia> a
2 + 1im

julia> b
2 + 1im
StefanKarpinski commented 13 years ago

Note: I proposed that Immutable could be an abstract type in this email thread. Jeff seems hesitant about this idea and no conclusion was reached.

JeffBezanson commented 13 years ago

I have a plan. Complex numbers are important for 1.0; we especially need them for FFTs and the LAPACK interface. And we need to at least match the capabilities of matlab, octave, fortran, etc. Therefore I'm going to try to define Complex64 and Complex128 as bits types, which will give us the packed arrays, immutability, and performance we need. Complex{Foo} can wait for 2.0.

StefanKarpinski commented 13 years ago

Ok, that's definitely a plan. I think the packed arrays are much more important than the immutability. For v1.0 we can just say "don't mutate complex numbers" and leave it at that. Any chance we can do it that way?

JeffBezanson commented 13 years ago

Yeah, this way complex will work like other numeric types, in julia as well as other languages. You won't be able to mutate it or access fields with dot syntax. And we can already stack allocate bits types, so we could get some great performance. Only downside is you can't have complex with arbitrary component types, but that doesn't have many use cases. But, it's easy to define new complex types in julia.

StefanKarpinski commented 13 years ago

Can we can keep Complex{T} and have Complex64 and Complex128 as special types that implement the same basic functionality but more efficiently? Also, should those be called Complex32 and Complex64 instead? I know they have twice that many bits, but you could argue that the naming is after the size of each part, not the total.

JeffBezanson commented 13 years ago

Yes, we can keep it (maybe at the cost of some confusion), and the complex math routines can be made generic enough. I think the names should refer to the total size of the type, like complex*16 in fortran.

BinarySplit commented 12 years ago

I feel there is a need for this, if only to restrict hashing to immutable values in the same way that Python does.

Take this example:

julia> a = [1]; b = [2]; c = {a=>"one", b=>"two"}; a[1] = 2; b[1] = 1; c
{[1]=>"two",[2]=>"one",}

julia> c[[1]]
key not found: [1]
in ref, /home/lachlan/julia/j/table.j:19

julia> c[a]
key not found: [2]
in ref, /home/lachlan/julia/j/table.j:19

julia> c[[1]] = "ONE"; c[[2]] = "TWO"; c
{[1]=>"two",[2]=>"TWO",[2]=>"one",[1]=>"ONE",}

If the hash function is only defined for immutable types by default, then potentially confusing/error-prone situations like this could be avoided. I would be forced to use tuples instead of arrays for my hashtable keys.

Apologies in advance if I am joining an already decided conversation. I can't see the linked discussion threads in the julia-math Google Group as I am not a member.

JeffBezanson commented 12 years ago

Yes, technically you are right on this. I'm familiar with http://home.pipeline.com/~hbaker1/ObjectIdentity.html.

Computing hash values of arrays, and comparing arrays both make sense, but not using them as table keys. In fact the only use that comes to mind is memoizing certain kinds of functions. But if a memoizer copied arrays before using them as keys, it would be safe in practice. Disallowing that seems too strict.

StefanKarpinski commented 12 years ago

There's also issue #359, which proposes using const fields to express fine-grained immutability. I fully immutable object would be one in which all fields are const. I think that copying non-const objects upon insertion into a hash is a pretty reasonable way to go. Of course, you can still iterate the keys and then modify the copied key, which might be even more confusing. In order to really solve this, we would need immutable versions of all types :-/

khinsen commented 12 years ago

I believe that immutable data structures are very important for writing robust code and become even more important when parallel programming is taken into account. The functional programming community has made this point quite clearly.

What I think a modern programming language should have is the possibility to define fully immutable data structures, including immutable container types, with the possibility to test at compile time (through the type system) and at runtime if a given piece of data is immutable. Immutable container types can be implemented efficiently using persistent data structures (check out the Clojure language (http://clojure.org/) for a real-life implementation). A fully immutable data structure is an immutable whose fields (for structure-like types) and/or elements (for container types) are immutable as well.

The advantage of immutable data structures is that they can be safely passed around between functions and concurrent tasks, stored in files or sent over the network, without anyone being able to mess them up, not even accidentally. For an extended argumentation, see http://doi.ieeecomputersociety.org/10.1109/MCSE.2012.11 (ask me for a copy if you can't access it).

staticfloat commented 12 years ago

Hey there, stumbled across this exact issue...... Perhaps I don't quite understand the bitstype hierarchy of complex numbers, but I can still do this:

julia> a = 1 + 1im
1 + 1im

julia> b = a
1 + 1im

julia> a.re = 2
2

julia> b
2 + 1im

The Complex64/128 bitstypes don't have an re field, so this problem is dodged. Why does 1 + 1im yield a ComplexPair instead of a Complex64/128?

vtjnash commented 12 years ago

@staticfloat Perhaps this will help:

julia> super(Complex64)
Complex{Float32}

julia> typeof(1+1im)
ComplexPair{Int64}

julia> typeof(1.+1im) #note the extra dot that promotes this to float arithmetic
Complex128

In summary, Complex64/Complex128 are Float types. There is no currently available analog for Complex integers, except for the fallback ComplexPair type.

staticfloat commented 12 years ago

Ahhh, I get it. Complex64 is really a Floating-point type... And ComplexPair is the fallback where you just have two arbitrary datatypes (such as integer) in real-imaginary pair, and this issue is tracking making compound types like ComplexPair immutable. Thanks, @vtjnash.

StefanKarpinski commented 11 years ago

I just wanted to document the current thinking that Jeff and I cooked up when discussing this issue recently. The type keyword will introduce a new immutable composite type, and reftype will be the keyword for introducing a new mutable composite type. The semantics are that reftypes have a well-defined location and what you're passing around is really just a reference to that single location, hence the name. Normal, immutable types, on the other hand, won't have a well-defined notion of location – they are just values, and the compiler can move or copy them freely. The naming is intended to make people think of values as the normal kind of type, as they should be for most things. There will be no way to make just some of the fields of a type mutable. Instead, you can just have an immutable value that has a field that's of a mutable type. This reduces the number of features and will probably force better design.

nolta commented 11 years ago

Sounds good, but i'd rather we leave type alone, and call immutable types value. Most users are going to expect types to work like structs.

StefanKarpinski commented 11 years ago

We kind of want to force people to use immutable value types as much as possible. Basically unless you actually have a need for something that has a well-defined location, you should be using an immutable value type instead. This is the sequence I'm hoping for:

  1. break someone's expectation a bit when they try to mutate a composite type and can't
  2. they read a bit about why they can't do that
  3. they read about reftypes which can do what they want
  4. they think a bit and realize that they can actually do what they need with an immutable type
  5. they achieve enlightenment.

If type was mutable, people would just go ahead using those and never be enlightened since none of this would ever happen and we'd have to keep prosthelytizing futilely for immutability for the rest of our lives.

nolta commented 11 years ago

That's crazy. If immutable types are useful, people will use them. This is the sequence i fear:

  1. user code breaks
  2. after banging their head against the wall for a few hours, they discover we changed the meaning of type
  3. white-hot "WTF!? Why did they change that!!1!" rage ensues
  4. abandons julia, goes back to python
StefanKarpinski commented 11 years ago

I'm not overly concerned about that. We've been advertising that we're going to do this forever :-)

timholy commented 11 years ago

Can I ask for some clarification? Let's consider a Range{Int}, which I'm guessing will be a candidate for being an immutable type. Let's also suppose A is a matrix (two-dimensional). I can say r = 1:stride(A,2):length(A) and then use r to iterate along the first row. Once I've done the first row, currently I can move on to the second row with r.start += 1, and I don't have to copy the step or len entries. IIUC, making Range{Int} immutable will break this behavior, right? That is, having the type be immutable is a promise to the compiler that none of the fields will change their values.

Or, is it less stringent than that? Is "immutable" basically the promise that a CompositeKind can be effectively represented by a single pointer, and (as in C) p->a is always at a defined (compiler-known) offset relative to *p, so that you can avoid the need for a lookup table of field pointers for each instance of an object? (And sizeof will work for it.) That is, for these two types,

type Type1
    a::Int64
    b::Float64
end
type Type2
    a
    b::Float64
end

the first could be an immutable type with no change in behavior (i.e., I'd still be allowed to say t1.a = 5), but the second would not be "immutizable" because you don't know anything about the type of a and it indeed might change on you.

Or none of the above?

I'm just trying to understand; ATM I'm not saying that the first "more stringent" is horrible, because (taking the Range example) I can imagine there might be compiler optimizations possible in an inner loop that knows that the step is a constant. So I'd get an overall performance boost even if it feels wasteful to create a whole new Range value for each row. But it certainly would break more code, and therefore might lead to some distress.

StefanKarpinski commented 11 years ago

Yes, it's the strict version. If Range is an immutable type then doing r.start += 1 is an error. Which I think it should be because that's a pretty weird thing to do. (We aren't doing that in our code somewhere, are we?)

JeffBezanson commented 11 years ago

Changing type at this point feels gratuitous to me too. Surely educating/convincing people about immutability does not simply hinge on the choice of keyword.

@timholy it's the most stringent version. But what happens is that we can then turn "creating a whole new Range" into not actually allocating a new object. Ideally code that used to do r.start+=1 won't get any slower (when rewritten not to use mutation), plus code that used to repeatedly allocate will get faster.

StefanKarpinski commented 11 years ago

I should point out that LLVM (like other compilers) is really good at optimizing away copies of immutable values. When you do something which, in principle, copies a Range object with one field modified, it will very often actually just be modified in place. Even more importantly, it is quite possible that the Range object doesn't actually live anywhere in memory – there is no contiguous Range object allocated, but rather pieces of it can just be stored in registers and never saved to memory at all. You can't do this (easily) with mutable values since they need to live somewhere in memory in order to support mutable semantics.

StefanKarpinski commented 11 years ago

So valtype? immutable? Using value as a keyword seems like a bad idea.

timholy commented 11 years ago

Thanks for the explanations. If type doesn't change it's meaning, then I think there's nothing of concern to discuss. Carry on.

@StefanKarpinski, I don't know of a place where it is happening in Julia's main repository now. It's more of a C-like thing to do: I've got one example like that in my Grid module (to be released once I get to the "finish line" with HDF5, aaargh), where a Range is basically being used as a cleaner interface to Blas.axpy!. But it's not a necessary interface, and there might even be some small advantages in using axpy's raw (but not pretty) interface. So it's not a concern. Now I know not to bother with such constructs.

I did just add a valtype function to dict.jl. Feel free to change if you want to co-opt it as a keyword.

kmsquire commented 11 years ago

I was looking at the range code recently, and it is mentioned in the comments that ranges are meant to be immutable. Of course, the comment doesn't actually prevent the type from being mutated. ;-)

Question: are there plans (tentative or otherwise) for immutable container types as well?

I, for one, welcome our new non-mutating type overlords... err, I mean, immutable types look interesting.

johnmyleswhite commented 11 years ago

immutable sounds like the best name to me. Like @timholy, I don't have much intuition for how this will actually work. So I'll be one of the people you'll be educating about the virtues of immutability.

timholy commented 11 years ago

Another thought: I'm guessing this means that immutable types are not, as I once thought, part of the plan for a built-in, zero-copy version of strpack. Assuming this is something Julia will eventually have, instead you're planning to implement this functionality by enhancing mutable types. However, it will be an error if you try this on a type for which the memory layout is not uniquely specified by the type declaration. Correct?

kmsquire commented 11 years ago

The longer something is to type, the less people will want to use it frequently. type is nice. valtype is okay. immutable might be a bit long.

I'll actually vote for valtype, since it shows a direct relationship to type. The struggle here is getting users to think of values as immutable.

kmsquire commented 11 years ago

As an aside, I really like the idea that julia is borrowing so many good ideas from other languages. The idea of mixing immutable and mutable types in a language (and strongly encouraging immutable use) was one of the things I liked about scala early on, before it got (or before I understood it to be) all convoluted(*). After this, all I'd like to see is better pattern matching.

() scala wasn't the first to do this (), of course. () by "this", I mean, "mix immutable and mutable types", not "get all convoluted". Many languages (cough *R cough) have done that.

JeffBezanson commented 11 years ago

Ok, done :)

StefanKarpinski commented 11 years ago

PARTY TIME!!1!¡!

timholy commented 11 years ago

I am really looking forward to this. So much so that I wonder if I'll have to talk myself into supporting 0.1 in Images.jl...

timholy commented 11 years ago

Any noticeable bumps in performance on benchmarks? Or is it too early to see many of the benefits?

Nice job, Jeff!

toivoh commented 11 years ago

This makes me feel like coding :)