JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.04k stars 5.43k forks source link

structs #259

Closed StefanKarpinski closed 11 years ago

StefanKarpinski commented 12 years ago

Long conversation with Jeff. Part of the discussion was around structs. We concluded that the main goal here is maximum compatibility with C. A struct is similar to a composite type (which we discussed renaming to class instead of type) except that it has immediate value semantics:

struct Foo
  bar::Int32
  baz::Int32
end

julia> foo = Foo(1,2)
Foo(1,2)

julia> foo2 = foo
Foo(1,2)

julia> foo2.bar = 3
3

julia> foo2
Foo(3,2)

julia> foo
Foo(1,2)

This is kind of like immutability but without not allowing anything. Types like Rational and Complex would be structs rather than classes. However, we don't go as far as preventing reaching in and changing a field value. If you want to do that, you can, but don't. It's rude. So we have immutability by convention.

A couple more things about structs:

Think of Ref like a C pointer in a struct — and it can be passed to C as-is since it points to the actual data part. However, if you look at the word before what the pointer points to, it is a valid Julia object too, so you can program with it in Julia and then just pass it to C without having to change anything. This provides maximum compatibility with C code. You could even write a linked list struct type in Julia and then just pass it to C code and have the C code traverse the linked list without any problems. Having C code pass structures back to Julia is more of a problem, however. Maybe it's a one-way street.

Structs are laid out exactly the way C lays them out; they can also be stored in concrete arrays without any tags, giving C compactness and efficiency. If you do Array(Foo,n) then it is initially just filled with junk like Array(Int32,n) would be. Structs are just plain old data. They can, however, have constructors.

Some issues we haven't fully thought through are fixed-size array fields and struct unions. But we can worry about those later.

JeffBezanson commented 12 years ago

Good summary. Does this mean we can close #13?

StefanKarpinski commented 12 years ago

This is basically the same issue, we just decided that full immutability is not necessary. I've actually been having some second thoughts about what we talked about. The main problem being the unidirectionality issue: the Ref hack lets us pass structures from Julia to C but not the other way around. I want to be able to get any value from a C function and by knowing its C type, be able to manipulate it in Julia. Essentially what we really want is to make C data structures visible and manipulable in Julia code. Adding another kind of Julia type feels wrong too. Also, C does fine without boxing its data structures. If every field is strictly typed don't only the roots need to be boxed?

StefanKarpinski commented 12 years ago

Also, I was thinking that maybe this should only be used for C data structures. Because really, we want to perfectly emulate C's struct behavior. Supporting all that in Julia syntax just seems silly. Although C doesn't have parametric types which would be limiting for things like Complex and Rational.

JeffBezanson commented 12 years ago

I don't get the point about boxing... If you have a linked list in C, you need to allocate the nodes somewhere. And C doesn't have type tags because it doesn't dynamically dispatch on the types of structs... I would say that's a fairly major difference.

I think the key here is not to get too fancy, and just get what functionality we can. Initially let's just have structs with bits type fields. That gets us a lot of C APIs, plus efficiently-stored complex, rational etc. arrays for any element type. And it also gets us C99 complex numbers on x86/64. That's pretty good. You can deal with linked structs using Ptr fields. We could add an unsafe deref operation that takes a Ptr to a struct and gives you something you can manipulate fully in julia. You can also make such structs by calling malloc, or by storing pointers to julia objects and just making sure those objects won't get collected while C code might be using them. The point is, at least everything you might want to do would be possible.

StefanKarpinski commented 12 years ago

I don't get the point about boxing... If you have a linked list in C, you need to allocate the nodes somewhere. And C doesn't have type tags because it doesn't dynamically dispatch on the types of structs... I would say that's a fairly major difference.

My point is that the way we're talking about accessing C structs in Julia, we're not really doing anything dynamic either. If you know that x points to a Foo struct and you know that Foo struct has bar::Int32 and baz::Float64 then none of that really needs to be boxed to inspect it. Yes, x needs to contain a box, but the struct itself doesn't really need to be boxed, nor do any of its contents — since you know completely statically what their types are from the declaration of the struct in Julia (or learning it somehow, parsing header files or whatever). So maybe x is actually a StructRef object that just has a pointer to a bare struct with a known type. From there, everything that the network of struct objects can reach has completely known types and therefore doesn't need any boxes. If we stick to the completely reasonable and de facto standard convention that a NULL pointer doen't go anywhere and any other pointer does, then we can unravel an entire network of C objects just from that one root reference. We just want a way to be able to look into C structures in Julia.

I think the key here is not to get too fancy, and just get what functionality we can. Initially let's just have structs with bits type fields. That gets us a lot of C APIs, plus efficiently-stored complex, rational etc. arrays for any element type. And it also gets us C99 complex numbers on x86/64. That's pretty good. You can deal with linked structs using Ptr fields. We could add an unsafe deref operation that takes a Ptr to a struct and gives you something you can manipulate fully in julia. You can also make such structs by calling malloc, or by storing pointers to julia objects and just making sure those objects won't get collected while C code might be using them. The point is, at least everything you might want to do would be possible.

I think what I'm talking about is less fancy than the hack we were talking about before with Ref objects pointing to the data part of boxed objects. Basically, I'm just talking about making the fields of C structures inspectable from Julia. The initial part of that, as you say, would be just exposing bits type fields. Can this be done just by having a CStruct object that points to both a description for the struct and the actual struct data? The getfield method would have to get, copy and box bits type values. The setfield method would do the opposite. When you get a field that is a pointer to another struct, you need to create new CStruct object with the correct type and its pointer pointing to the raw struct data. I think that handles the whole problem. The only real issue is making setfield and getfield behave the right way for CStruct objects. We might also want to automatically create custom CStruct subtypes so that the accessor methods can get specialized on the type and so that type inference can reason about them. Not sure.

JeffBezanson commented 12 years ago

It seems a bit wonky to me to first make structs not types, then hack in extra features so type inference can reason about them.

As I said in my email last Friday, yes, you could implement CStruct in julia just as you describe. We might need to add some unsafe load/store intrinsics. But it seems unfortunate to me not to get leverage out of that, like getting efficient struct arrays. Also, to pass structs as values and not just pointers to/from C we need to tell LLVM the layout of the struct, which would be quite awkward if the type system doesn't know about it.

StefanKarpinski commented 12 years ago

The entire reason I'm thinking about this is just to address one problem: how can we get structured data from C and then access it transparently in Julia. That's one issue. Compact arrays (potentially with immutable semantics) are another issue. I agree that having a single solution to both would be ideal, but what we talked about before won't let us get general data structures from C, it just lets us pass data to C, so I don't see it as a good enough solution. If we can't solve both problems well with one solution, it seems better to me to solve them both well separately.

I'm essentially proposing that structs aren't types but are just plain old data (with fields). The idea is that structs are like unboxed data. When you get something back from a C function, ccall boxes the returned value — by making a box with a pointer to the actual value. Likewise, when you access a field value the getfield function boxes the result — but we can't box data given to us by C in-place because it might be using that memory. So one needs an indirect box: a box with a pointer to the real data. I just can't see any other possible way of doing bidirectional interaction with unboxed C data in a language that expects boxed objects.

But, on the upside, this might actually also be a way to also get efficient struct arrays. Because you can just deal with them the same way that you deal with C structs: with an indirect box. That way you can have an indirect box reference to a value whose actual storage is in an efficiently packed struct array. Since setfield would modify the array storage (as it would for C structs too), you would even get normal object semantics. But here it gets a bit nasty because you would either have both directly and indirectly boxed objects or all types that could be stored efficiently like that would always be indirectly boxed.

JeffBezanson commented 12 years ago

But what does it mean to solve both separately? Are we going to have two implementations of struct layout? Are we going to have some kind of value types in julia that for some reason cannot be passed to C functions expecting similar structs?

If the type system does not know about struct layout, how does ccall/llvm know?

I agree that we don't need the Ref hack we talked about. But you seem to be describing a different implementation of the same sort of thing, so I don't see why it necessitates a separate feature just for C interaction.

In fact this is very similar to something I did in femtolisp; I have a cvalue type that contains a description of a C type and a pointer to its data.

In your solution, how do we pass data structures to C? Is it possible to do

struct Foo
  x::Int32
  next::Ptr{Foo}
end

n = Foo()
n.x = 0
n.next = Foo()

There are GC issues. There are also GC issues with keeping the storage for an object inside a separate Array.

StefanKarpinski commented 12 years ago

With this approach it seems really weird and arbitrary that the struct keyword introduces data types that are just like our other composite data types but they happen to be passable to C. It just seems so arbitrary and annoying from the user's perspective.

JeffBezanson commented 12 years ago

Hard to argue with that; having fewer features is a very high priority.

What if we just use our existing composite types, but say that they have C struct layout "when possible"? (i.e. when all the fields are plain-old-data types)

At least that handles passing and returning individual structs. And we could maybe overload getfield/setfield to allow accessing fields through a Ptr{SomeStruct}. We could ignore arrays at first, and maybe deal with them later. After all, we already need to do something else to handle fixed-size arrays in C structs.

StefanKarpinski commented 12 years ago

I just had a slightly zany idea regarding efficient object storage in arrays. What if this was a property of the array rather than of object types? Here's my thinking: we already have two kinds of arrays — cell arrays and vectors. They are subtly different and actually the subtlety of that difference may be a little confusing. So maybe this another, more significant and useful difference:

type Foo; bar::Int64; end

julia> a = {Foo(1),Foo(2),Foo(3)}
{Foo(1), Foo(2), Foo(3)}

julia> x = a[1]
Foo(1)

julia> is(x,a[1])
true

julia> x.bar = -1
-1

julia> x
Foo(-1)

julia> a
{Foo(-1), Foo(2), Foo(3)}

julia> v = [Foo(1),Foo(2),Foo(3)]
[Foo(1), Foo(2), Foo(3)]

julia> y = v[1]
Foo(1)

julia> is(y,v[1])
false

julia> y.bar = -1
-1

julia> y
Foo(-1)

julia> v
[Foo(1), Foo(2), Foo(3)]

So for vectors, structures are stored efficiently in-line and when you assign a value from them you get a boxed copy of the value that's in the array. Cell arrays, on the other hand, actually hold pointers to boxed objects so if you assign from a cell array, you're just sharing the same object. In a strange way, I actually think having a semantic difference like this between cell arrays and vectors may make it easier to explain the difference. It's also way easier to explain that there are two different types of arrays that can hold objects with different semantics that are both of clear use in different situation than it is to explain why we have two nearly identical kinds of types.

This way immutability of types becomes a completely orthogonal feature that only has to do with the nature of the data type itself, and not with the incidental fact that you want to store lots of them without the overhead of heap allocating each one individually with its own box. Rational and Complex both make sense as immutable types, but not because we want to store then in arrays efficiently — it's because they're numbers and they are thus conceptually identified with their value. You might just as much want to store a naturally mutable record type in an efficiently packed array too.

This arrangement could also seriously alleviate functional programming issues a la map. This isn't the only way to go, but with this scheme it kind of makes sense for vectors to always have a single, concrete storage type and for cell arrays to be able to hold anything. Then if you apply map to a cell array, it's completely obvious what the result type should be: another cell array. If you apply map to a vector, it has to result in a single concrete storage type. In other words, the possible result types don't promote to a single common storage type, then the operation is actually broken.

Also, this is dirt simple to explain to C programmers: a vector is an array of values and a cell array is an array of pointers to values.

JeffBezanson commented 12 years ago

I like this idea, it seems to be the most promising so far.

But I should point out that currently there really aren't any differences between cell arrays and other arrays. If you don't look at the element type, and you're not calling C code, you can't tell the difference.

In this scheme, "cell array" would be defined as "array whose element type is abstract". Which is pretty reasonable.

Of course there is one significant flaw: this makes it impossible to provide the most specific element type without also changing how storage works. i.e. I can't make my array an Array{Foo,1} just for purposes of telling the compiler that all elements are Foo; it will also change the reference semantics. We would need some other way to tell the Array what to do.

StefanKarpinski commented 12 years ago

Well, we could allow cell arrays with restricted type. But we wouldn't need to provide a literal syntax for that since an unrestricted cell array can do everything that can do, just with potentially slightly less good type information. In this scheme a vector with abstract type would not be allowable, however, since that would actually be a type of cell array (non-immediate storage). I would kind of want to evaluate just how much useful type information we get from restricted cell arrays, which currently is just arrays whose element type is abstract. Making all arrays either able to contain any kind of value because they're actually arrays of pointers to boxed elements or vectors that store data immediately seems like just the kind of language simplification we're looking for.

This whole scheme would also make it a little more reasonable to just say that [ ] creates an empty Float64 vector — since 95+% of the time, that's what you want, or at least it will work.

StefanKarpinski commented 12 years ago

I kind of have a feeling that this would work really well in practice. I was thinking about applying a map operation to a bunch of numbers producing strings (think map(hex,[1:100])). It's worrisome that the strings might be of different concrete types, but I think that in practice what would result is actually that they would all be of type ASCIIString and therefore actually be boxes of the same size pointing to heap allocated string data. In this case it's certainly true, but I think that just in general it would usually be true. And as an escape valve, you could always explicitly map to a cell array.

JeffBezanson commented 12 years ago

You keep saying "cell array" and "vector" like they are different types, but we only have the single type Array. How do you program "cell array of type Foo" (restricted-type cell array) vs. "vector of type Foo"?

StefanKarpinski commented 12 years ago

At the moment we do, but I'm trying to imagine a different way of doing things and provide some terminology for it. The answer is that a cell array of type Foo is an array of pointers to objects that are all subtypes of Foo whereas a vector of type Foo is a uniform array of objects of concrete type Foo. If Foo is abstract, the latter doesn't make sense, and if Foo is concrete, the former could be optimized to stored like the latter if it's immutable like bits types or is an immutable composite type (again, something we don't yet have but could).

StefanKarpinski commented 12 years ago

Any more thoughts on this? Should I draft a proposal email and send it to the list for feedback?

6e441f9c commented 12 years ago

By the way, any efficient value-semantics record type will allow to (try to) implement "dimensional calculations": it will be possible to define time type and length type such that they are floats in all calculations, but length divided by time will have a distinct speed type. Speed will be impossible to add to length because of type error.

ViralBShah commented 11 years ago

Pull request #1831 - ccall with julia types as structs

alexcodoreanu commented 9 years ago

Hi guys,

I'm a new Julia user and I'm implementing Julia into my research, however this Structure vs. Type issue is quite important. Often times I have to dynamically create a return_structure in an automated way based on different input parameters and currently there is no clear way to implement this. I can create a Type quite easily

Type temp_foo bar baz end

but I can't clear temp_foo so now I have to create temp_foo1

Type temp_foo1 bin bar baz end

if I want a new definition which includes bin. A temporary structure which gets allotted temporary memory within the use of a function only, ie. workspace, is a critical addition for end users. It's a bit outside of my Julia scope to attack this issue and I hope that the solution is not already implemented and outside of my google search skills.

mbauman commented 9 years ago

Your question seems to be better suited as a julia-users listserv post. This thread is very old; since then tuples have been greatly optimized to fill this sort of role. If you need to return multiple values, try returning them in a tuple. It's unnamed, but you can directly assign them via destructuring:

bar, baz = f1()
bin, bar, baz = f2()
jiahao commented 9 years ago

@alexcodoreanu Kindly do not post multiple small variations of the same question in different venues. It is unnecessarily confusing for everyone to have to follow 3 copies of the same conversation.

https://groups.google.com/forum/#!topic/julia-users/bwF9bpr8_7w

https://groups.google.com/d/topic/julia-users/xNntTEtVyoU/discussion

julia-users is a more appropriate forum for this kind of question.

alexcodoreanu commented 9 years ago

Hi Jiahao,

Apologies on that, I thought that I had erased the first post, I had a TAB Enter issue but it still posted. The last thread was simply addressed as a user comment to Julia developers.

Good to see that the threads are so well monitored and curated!

All the best,

Alex Codoreanu Ph.D. Candidate Swinburne University Center for Astrophysics and Supercomputing http://astronomy.swin.edu.au/staff/acordoreanu.html

On Tue, Feb 3, 2015 at 3:54 AM, Jiahao Chen notifications@github.com wrote:

@alexcodoreanu https://github.com/alexcodoreanu Kindly do not multiple post small variations of the same question. It is unnecessarily confusing for everyone to have to follow 3 copies of the same conversation.

https://groups.google.com/forum/#!topic/julia-users/bwF9bpr8_7w

https://groups.google.com/d/topic/julia-users/xNntTEtVyoU/discussion

julia-users is a more appropriate forum for this kind of question.

— Reply to this email directly or view it on GitHub https://github.com/JuliaLang/julia/issues/259#issuecomment-72492187.