JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

abstract types with fields #4935

Open JeffBezanson opened 10 years ago

JeffBezanson commented 10 years ago

This would look something like

abstract Foo with
    x::Int
    y::String
end

which will cause every subtype of Foo to begin with those fields.

Some parts of the language internals already anticipate this; it's a matter of hooking up the syntax and filling in a few missing pieces.

JeffBezanson commented 10 years ago

One could go even further, and point out that overloading . reduces the need for this feature even more, since I can effectively add read-only fields using

getfield(A::MyAbstractType, ::Field{:x}) = 0

So fields in abstract types really only add something in the rare case where being able to store some piece of state in a value is part of the interface.
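A minimal sketch of the read-only field idea in current Julia, assuming Julia 1.x, where `a.x` lowers to `Base.getproperty(a, :x)` rather than to the `getfield(A, ::Field{:x})` form written above. The names `MyAbstractType`, `Impl` and the property `x` are illustrative only.

```julia
abstract type MyAbstractType end

struct Impl <: MyAbstractType
    data::Vector{Int}
end

# Read-only "virtual" field x for every subtype; any other name falls
# through to the real stored fields via getfield.
function Base.getproperty(a::MyAbstractType, name::Symbol)
    return name === :x ? 0 : getfield(a, name)
end

obj = Impl([1, 2, 3])
obj.x      # 0, although Impl stores no field named x
obj.data   # the stored vector [1, 2, 3]
```

No storage is reserved for `x` in any subtype, which is the sense in which overloading `.` covers the read-only case without fields on the abstract type.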

tknopp commented 10 years ago

@JeffBezanson I totally get that you need to finalize types in order to have unboxed array content. But with this PR on the table it seems to me that this will allow subtyping "with tricks", and I wonder if it will become a common pattern to define all methods on the "almost concrete" type and then put a trivial concretization on top of that.

JeffBezanson commented 10 years ago

In several cases that already happens (e.g. AbstractDataFrame), and is fairly common in OO languages generally. If people want to think of concrete types as just a final declaration, that's fine with me. We're happy as long as "final" types are possible, and that people are encouraged to use them.

tknopp commented 10 years ago

I am also not totally sure if the lack of inheritance of concrete types is really an issue. The nice thing would be that one inherits all methods of a parent type. But maybe the cleaner solution is to use a "has a" relation instead of an "is a" relation anyway. But this currently means redefining various methods, which feels like a drawback compared to inheritance.
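The "has a" approach and the method redefinition it requires can be sketched as follows. This is only an illustration in current Julia syntax; `Engine`, `Vehicle`, `power` and `describe` are made-up names, and the `@eval` loop is one common way to cut down on the repetition, not the only one.

```julia
struct Engine
    power::Int
end
power(e::Engine) = e.power
describe(e::Engine) = "engine with $(e.power) hp"

# "has a": Vehicle owns an Engine instead of inheriting from it.
struct Vehicle
    engine::Engine
    seats::Int
end

# Forward the selected Engine methods to the wrapped field,
# generating one forwarding method per function name.
for f in (:power, :describe)
    @eval $f(v::Vehicle) = $f(v.engine)
end

v = Vehicle(Engine(120), 4)
power(v)      # forwarded to power(v.engine)
describe(v)   # forwarded to describe(v.engine)
```

Each forwarded function still has to be listed explicitly, which is the drawback compared to inheritance mentioned above.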

StefanKarpinski commented 10 years ago

I am starting to wonder if this feature is really that necessary. It's kind of hard to see what crucial problem it's solving. The main benefits seem to be:

  1. Guarantee that obj.field will always work for all subtypes of an abstract type that has .field. Otherwise someone could define a subtype and forget to have this field, leading to errors.
  2. Guarantee that obj.field is stored at a consistent location in all subtypes.

I actually think that 1. might be an argument against this feature: if we allowed overloading of obj.field then a subtype could define a getter and/or setter for .field instead of actually storing the field and still work fine with the abstract behavior of the super-type. I'm not sure if 2. is actually enough of a benefit to warrant the entire feature.

StefanKarpinski commented 10 years ago

In fact, you can imagine a subtype being forced to have a .foo field but wanting to override the .foo syntax. Then it would be forced to have a vestigial .foo field that's just wasted and forced on it for no good reason.

johnmyleswhite commented 10 years ago

Following up on Point 1, part of the appeal of abstract types with fields is that they define a kind of interface you know all subtypes will support. For example, linear regression, logistic regression, SVMs and other models will all have a specific weight vector that you'd like to be sure is available. Whether it's through a field or a function is much less important than checking that the implementation satisfies the stated protocol.

StefanKarpinski commented 10 years ago

To that end, I think that having a formalization of protocols/interfaces is much more useful and important than fields for abstract types, which only addresses a tiny portion of this much bigger issue.

johnmyleswhite commented 10 years ago

I agree that it's much more important. But there is something nice about the minimal typing required when you have a lot of concrete types that are small variants on a parent abstract type that has almost all of the fields that concrete types will need.

quinnj commented 10 years ago

I would say I'm on the "probably don't need this" side. If it's really just saving some typing, I don't think it's worth it. I think not having this in addition to the great feature of no sub-typing concrete types makes for very legible code. I think it's been mentioned many times that one of Julia's strengths is the code readability and having a type's fields not be explicit seems like a bummer to me. Any time I see a type that sub-types, I then have to track down that type's parent, moving up the chain until I finally parse all the fields this type happens to inherit. That or use names(), which also feels a little clunky.

StefanKarpinski commented 10 years ago

Right, John, I see your point about minimal typing but my stance on this was to still require specifying the fields, which would completely undermine that benefit. I'm still not convinced of any design here that actually would reduce typing at all.

andrewcooke commented 10 years ago

ok, so if we don't need this, how would i solve the problem i had that originally led me to this page?

i am writing an api - say, for genetic algorithms - and i have a data structure (a type) that is passed to several functions. this structure will contain information that the "general" genetic algorithm uses (a population, parameters describing how to breed, etc) and also some customizable information that the api user should add.

so, when i write the library, i know some "part" of this data structure, but not others. other parts will be extended by the library user. for example, they might need to store some parameters that are needed to generate new individuals.

at the function level, i can structure this just fine. i write my "general" functions that do the tasks of breeding, etc. the user writes functions (with a name i choose) that i call from my code when i want to create new instances (for example). the user function is passed this same data structure.

but at the type level, i don't understand how i can do this without what was described here. i need to pre-define some fields, which are used by the library code. the user needs to extend that with other fields after the library is written.

if we don't have fields on abstract types, how do we solve the above elegantly?

[what i ended up doing was adding a single field, which the user can extend. so the user then defines their own structure and stores it there. that works, but it seems clunky to me - for example, it makes extension by two parties at once difficult, since they must agree between themselves what this single extra thing is. but maybe that's the best that is possible in julia. i just wanted to set out a clear example in case people are missing a use case...]

[i am also worried about "Whether it's through a field or a function is much less important..." since functions are not context-dependent in the same way as data (and closures cannot be assigned to package functions, as far as i can see). in other words, if you're calling the library twice, how do you make functions specific to a particular call?]

finally, i think a more formal way of saying the above is that this is the "expression problem" (wadler et al). although i haven't looked at that in some time and may be wrong.

tknopp commented 10 years ago

I think the point is that one has to repeat the getter/setter methods for all(!) child types of an abstract type. Using this proposal one only needs the abstract type definition with fields and is done.

Regarding @StefanKarpinski's concern about the "lost field" when overriding: I think in most situations the overridden field is still in use. In C# I use properties a lot in the following way:

double myProperty;
double MyProperty
{
  get { return myProperty; }
  set 
  {
    // perform a range check to look if value is in a valid range (e.g. > 0)
    myProperty = value;
    // update some dependent properties
  }
}

In combination with the GUI toolkit WPF these properties can be bound to GUI elements, which makes it very convenient in practice. Without this, the relevance of this feature might not be so high.

timholy commented 10 years ago

I think the point is that one has to repeat the getter/setter methods for all(!) child types of an abstract type.

The most specific version of a function gets called for each set of arguments. By defining a method for the abstract type, it will be used for all child types, unless there's an even more specific definition.

jiahao commented 10 years ago

As discussed earlier today with @JeffBezanson:

A concrete example for which this could be useful would be to simplify defining methods for Diagonal, Bidiagonal, SymTridiagonal, Tridiagonal, and a future hypothetical Banded(N) matrix type. Each of these matrix types would require a field for the diagonal elements. Most of these would also need at least one sub/superdiagonal field, and quite possibly more.

A hypothetical supertype of these particular matrix types would simplify the implementation of basic linear algebra functions. For example, diag(A,n) should retrieve the appropriate super/sub/diagonal field, or otherwise generate a zero vector of the correct length.

JeffBezanson commented 10 years ago

I'll mention that a definition like this is valid, since we don't strictly check things:

diag(x::AbstractDiagonal) = x.d

Then each subtype just has to have a field of that name.
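A small sketch of this pattern in current Julia syntax. The subtype names are hypothetical, and the accessor is called `maindiag` here only to avoid clashing with `LinearAlgebra.diag`.

```julia
abstract type AbstractDiagonal end

struct PlainDiag <: AbstractDiagonal
    d::Vector{Float64}
end

struct ScaledDiag <: AbstractDiagonal
    d::Vector{Float64}
    scale::Float64
end

struct NoDiag <: AbstractDiagonal   # forgets to provide the field d
    v::Vector{Float64}
end

# One method on the abstract type. It assumes, without any check,
# that every subtype has a field named d.
maindiag(x::AbstractDiagonal) = x.d

maindiag(PlainDiag([1.0, 2.0]))
maindiag(ScaledDiag([3.0], 2.0))

# The missing field is only detected when the method actually runs.
failed = try
    maindiag(NoDiag([1.0]))
    false
catch
    true
end
```

Nothing ties `NoDiag` to the convention at definition time; the mistake surfaces as a run-time error on first access.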

lindahua commented 10 years ago

I looked through this entire thread. I am still not convinced that abstract types with fields are necessary. I have worked on a dozen Julia packages, and haven't come across a single case where I wanted a super-type to enforce that all subtypes share some common fields.

There are plenty of cases that one would want to enforce that all subtypes can provide information of some sort. However, all these can be done through (multi-dispatch) methods instead of fields. Requiring methods to be implemented is far more flexible than requiring the presence of a particular set of fields.

The following example should illustrate this point. A common piece of information that should be provided by all kinds of matrices is the number of rows & columns. Then, should we do the following?

# This forces all subtypes of AbstractMat to have fields nrows & ncols
abstract AbstractMat
    nrows::Int   
    ncols::Int 
end

Of course not. There are numerous ways to represent the shape, and using two integers is just one of them. For example, I can use a tuple or a vector, etc, or if I want to implement a SquareMatrix type, I can just use one integer to represent the shape. I think the current Julian way does this right -- it requires the size(a, d) method to be implemented instead of requiring what fields should be present.

To me, fields are almost always about implementation details. Interface should be expressed using methods. Abstract types with fields are kind of making the fields part of the programming interface (API). I am yet to be convinced that this is a good idea.

Using fields in abstract types also makes things unnecessarily complicated. What if the subtype wants the fields to be of different types than those declared in the abstract type?
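The method-based alternative can be sketched like this in current Julia syntax. The three concrete types are hypothetical and store their shape in three different ways; only `size` is part of the interface.

```julia
abstract type AbstractMat end

struct DenseMat <: AbstractMat
    nrows::Int
    ncols::Int
end

struct SquareMat <: AbstractMat
    n::Int                      # one integer is enough for a square shape
end

struct TupleMat <: AbstractMat
    dims::Tuple{Int,Int}
end

# Each subtype implements the interface in terms of its own representation.
Base.size(a::DenseMat) = (a.nrows, a.ncols)
Base.size(a::SquareMat) = (a.n, a.n)
Base.size(a::TupleMat) = a.dims

# Generic code is written once against the abstract type and uses only size.
Base.size(a::AbstractMat, d::Integer) = size(a)[d]
nelements(a::AbstractMat) = prod(size(a))

size(SquareMat(3))        # (3, 3)
size(DenseMat(2, 5), 2)   # 5
nelements(TupleMat((4, 2)))
```

None of the subtypes is forced to carry fields it does not need, and the generic methods keep working for any new subtype that implements `size`.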

lindahua commented 10 years ago

People mentioned the usefulness to allow properties to be inherited. I agree with this.

However, properties are more like methods than fields.

lindahua commented 10 years ago

In terms of saving typing, one can always use macros. If you find yourself writing a lot of types that share a subset of fields, you can write a macro to generate those shared parts so that you don't have to repeat them many times.
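A rough sketch of such a macro in current Julia. The macro name `@with_common` and the shared fields `id` and `name` are invented for illustration; a real package would choose its own.

```julia
# Hypothetical macro: splices two shared fields in front of the fields
# written in the struct body.
macro with_common(def)
    (def isa Expr && def.head === :struct) ||
        error("@with_common expects a struct definition")
    body = def.args[3]                      # the block holding the fields
    pushfirst!(body.args, :(name::String))
    pushfirst!(body.args, :(id::Int))       # ends up as the first field
    return esc(def)
end

abstract type Model end

@with_common struct LinearModel <: Model
    weights::Vector{Float64}
end

@with_common struct TreeModel <: Model
    depth::Int
end

fieldnames(LinearModel)          # (:id, :name, :weights)
m = TreeModel(1, "tree", 3)      # shared fields come first in the constructor
```

The shared fields are written once, but a reader has to know what the macro expands to in order to see the full layout of each type.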

JeffBezanson commented 10 years ago

I now agree this is a kind of marginal feature. It's easy to misuse; as you point out it's undesirable for read-only properties.

tknopp commented 10 years ago

@timholy I wanted to come up with an example like the one that @JeffBezanson provided, which I thought would not be possible. Is that pattern used anywhere in the Julia source code? In combination with field overloads this could be very interesting. One could provide default implementations on abstract types that make certain assumptions on the fields available. A concrete type either has to provide the field or provide an equivalent field overload.

I kind of agree with @lindahua that currently methods are used as public interfaces while fields are implementation details. Making fields overloadable can break this view. Then the fields/properties can become part of the interface. In C# the usual convention is that properties start with an upper case letter to make clear that they are part of the interface.

I am actually not totally sure if we need properties in Julia. The nice thing about them is a) the point syntax b) that getter and setter have the same name. b) might not be that important in Julia as we have this nice ! notation. So one could define properties as

wheel( car )                  # gets the wheel
wheel!( car, anotherWheel )   # sets the wheel in car
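Spelled out as runnable code in current Julia, the getter/setter pair might look like the sketch below. The `Car` type and its `String` wheel are placeholders.

```julia
mutable struct Car
    wheel::String
end

# Getter: same name as the property, no dot syntax needed.
wheel(car::Car) = car.wheel

# Setter: the trailing ! marks that the argument is mutated.
function wheel!(car::Car, w::String)
    car.wheel = w
    return car
end

car = Car("summer tire")
wheel!(car, "winter tire")
wheel(car)
```

The field itself stays an implementation detail; callers only depend on `wheel` and `wheel!`.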

JeffBezanson commented 10 years ago

The dot syntax seems to be a really huge deal for many people. It is arguably one of the most popular bits of syntax among all modern languages. Modern languages need to support dot-oriented programming :)

tknopp commented 10 years ago

Well, from my point of view it is a plus that Julia does not support the dot syntax for member functions. But fields and properties are a different thing: these are things that definitely belong to an object. On the other hand, it would be consistent to not allow field overloads and do all getters/setters with methods like I outlined above. Then one has a cleaner separation between what is an interface and what is an implementation detail.

JeffBezanson commented 10 years ago

I agree that many properties should be methods, like size(x). I would like to add dot overloading, but I don't want to see a profusion of things like x.size as a replacement for these.

cdsousa commented 10 years ago

If something like

getfield(A::MyAbstractType, ::Field{:x}) = 0

can be done, then nothing stops one from doing this at the beginning of the code

getfield{S}(o::Any, ::Field{S}) = @eval $S($o)

and do

a = [1 2; 3 4]
a.size

everywhere.

( How I tested it:

abstract Field{S}
getfield{S}(o::Any, ::Type{Field{S}}) = @eval $S($o)
a = [1 2; 3 4]
getfield(a, Field{:size})

( When I first arrived at Julia, I tried to do something like that (because I thought Julia dot syntax was broken :D ))

tknopp commented 10 years ago

That is the danger. The C# developer in me wants it but it will break the view that fields are implementation details. Maybe it needs a more well defined use case. I think @stevengj wanted this for pycall.

JeffBezanson commented 10 years ago

That hack fills me with dread. Not to mention that getfield(::Any, ::Field) = 0 would probably just break the whole system.

timholy commented 10 years ago

@timholy I wanted to come up with an example like the one that @JeffBezanson provided, which I thought would not be possible. Is that pattern used anywhere in the Julia source code? In combination with field overloads this could be very interesting. One could provide default implementations on abstract types that make certain assumptions on the fields available. A concrete type either has to provide the field or provide an equivalent field overload.

If you give it a try, you'll see it works. (Images uses this technique.) There is no compile-time guarantee that the fields are there, but if they are not you'll get a clear run-time error, and to me that seems adequate.

aviks commented 10 years ago

I think @stevengj wanted this for pycall.

Yes, and I want it for the same reason in JavaCall.

kmsquire commented 10 years ago

I had a few different implementations of an OrderedDict which were only slight modifications of Base.Dict. One version (#2548) of this added an AbstractDict class above Dict and OrderedDict, which basically assumed that most of the current fields of Dict existed, and added two or three more for OrderedDicts. Jeff didn't like that at the time (see his comment in #2548), although without allowing fields in abstract types, that would probably be the most efficient way forward.

ssfrr commented 10 years ago

I have another datapoint for a use case where this would be very handy. I think that the general concept is when the operations defined on the abstract type require some state to be attached to the object (which is what @JeffBezanson mentioned above).

In AudioIO.jl the audio processing is implemented by creating a graph of AudioNode subtypes that each implement their own render function to generate audio (e.g. SinOsc <: AudioNode renders a sinusoid, AudioMixer <: AudioNode calls the render function on all of its inputs and mixes them together). I wanted to enable waiting on AudioNodes, so I implemented Base.wait(node::AudioNode), which waits on a Condition stored with the object. In order to do this I had to track down all the concrete subtypes and add the condition field to all of them. That's manageable if they're all implemented in this module, but as the number of AudioNode types grows and possibly becomes split across different libraries it's infeasible to have to go in and add a field to all of them.

Allowing fields on abstract types seems like a win in this case, but there are alternatives:

  1. Do what I'm doing now, which is to manually define all required fields in each subtype. This is error-prone, it's easy to forget one, and it is especially problematic across libraries.
  2. Create an AudioNodeState type, and all subtypes are required to have a field node_state::AudioNodeState. That way if I add behavior to AudioNode that requires some state I can add it to the AudioNodeState definition. This actually seems like a pretty good solution that's conceptually simple and explicit. There's only one thing for subtype implementers to remember, and if they forget to add the field it will get found out the first time any state is accessed.
  3. Add a macro that defines the proper fields. This feels more magical than #2 and doesn't seem to gain much.
  4. Make AudioNode a concrete parametric type with the specific renderer contained as a field within, like
type AudioNode{T <: AudioRenderer}
    cond::Condition
    active::Bool
    renderer::T
end

This feels a little heavy/complicated, but probably worth trying on for size.
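Alternative 2 in the list above can be sketched as follows in current Julia syntax. Only `AudioNode`, `AudioNodeState`, `node_state`, `SinOsc` and the waiting behavior come from the description; the `active` flag, `Silence`, `isactive` and `stop!` are assumptions added to make the example self-contained.

```julia
abstract type AudioNode end

# All state that the generic AudioNode behavior needs lives in one place.
mutable struct AudioNodeState
    active::Bool
    cond::Condition
end
AudioNodeState() = AudioNodeState(true, Condition())

# Every subtype only has to remember a single field: node_state.
struct SinOsc <: AudioNode
    node_state::AudioNodeState
    freq::Float64
end
SinOsc(freq) = SinOsc(AudioNodeState(), freq)

struct Silence <: AudioNode
    node_state::AudioNodeState
end
Silence() = Silence(AudioNodeState())

# Shared behavior is written once against the abstract type.
isactive(node::AudioNode) = node.node_state.active
Base.wait(node::AudioNode) = wait(node.node_state.cond)

function stop!(node::AudioNode)
    node.node_state.active = false
    notify(node.node_state.cond)   # wake any task waiting on this node
    return node
end

osc = SinOsc(440.0)
stop!(osc)
```

Adding new shared state later means touching only `AudioNodeState`, not every subtype, including subtypes defined in other libraries.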

Given that we have a couple of seemingly pretty-good options, I'm actually less convinced than I was before that fields on abstract types are the right fix for this problem. It seems like having fewer patterns to choose from is a good thing, and I definitely agree with @karbarcca that the locality and explicitness of Julia type declarations makes the code a lot easier to read.

Thanks @StefanKarpinski and @JeffBezanson for the discussion and ideas today, which helped to crystallize a lot of this.

cdsousa commented 10 years ago

I think the 2nd and the 4th alternatives are the most julian ones. And for this particular case, the 4th option seems to be the most meaningful.

vtjnash commented 10 years ago

Somehow I hadn't seen this thread.

The dot syntax seems to be a really huge deal for many people. It is arguably one of the most popular bits of syntax among all modern languages. Modern languages need to support dot-oriented programming :)

I somewhat feel like most of the requests for it are from people writing inter-op code for these other languages. (although this is off-topic for this thread)

I now agree this is a kind of marginal feature. It's easy to misuse; as you point out it's undesirable for read-only properties.

Let's close this issue then. I've never felt it would be a significant savings in any of my code. And it further confuses the difference between the "thing" -- a type -- and the behavior -- the abstract. If anything, I would propose trying to make those more distinct (but that's a different topic for later).

elextr commented 10 years ago

Also late to the party :)

One of the common use-cases put forward for fields on abstract types is to provide some base data and implementation that user code extends by derivation.

But instead of derivation, isn't the Julian way to do this to make a generic type with the part the user adds being a type parameter?

Then you separate the basic functionality and the extension parts cleanly, you properly express that the basic functionality can't be used without the extended functionality, you create an appropriate concrete type when it's extended, and you even save typing :)
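A sketch of that parametric approach, using the genetic-algorithm scenario raised earlier in the thread as the example. All names (`GAState`, `new_individual`, `grow!`, `UniformInit`) are hypothetical, and the "breeding" is reduced to creating individuals so the example stays short.

```julia
# Library side: the fields the library knows about, plus a slot whose
# type is chosen by the user.
struct GAState{P}
    population::Vector{Vector{Float64}}
    mutation_rate::Float64
    params::P                  # user-defined extension data
end

# Function the user is expected to add methods to.
function new_individual end

function grow!(state::GAState, n::Integer)
    for _ in 1:n
        push!(state.population, new_individual(state.params))
    end
    return state
end

# User side: extra data and the method that consumes it.
struct UniformInit
    len::Int
    value::Float64
end
new_individual(p::UniformInit) = fill(p.value, p.len)

state = GAState(Vector{Float64}[], 0.1, UniformInit(3, 0.5))
grow!(state, 2)
```

`GAState{UniformInit}` is a concrete type, so the library's fields and the user's fields are stored together without either side inheriting from the other.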

vtjnash commented 10 years ago

instead of abstract types with fields, what if this were reversed: concrete types with inheritance? same underlying machinery, but the user can choose whether to inherit the fields or replace them (extending the fields is not allowed. abstract immutable is not allowed). this avoids the two pitfalls of: forcing the user to have fields they don't actually need and constructor dependencies.

abstract type A
  field1
  field2
end

type B <: A
  field1
  field3
  field4
end

type C <: =A

A(1,2)
B(1,3,4)
C(1,2)

inner constructors would be inherited also

i think this would make wrapping Gtk.jl much nicer and more user-friendly, since Gtk has many of these inheritable concrete types. On the Julia side, most of these simply have a handle::Ptr{GObject} field (and an identical constructor), but a few have something else.

milktrader commented 10 years ago

An example of how this will be useful is in creating an abstract AbstractTimeSeries.

abstract type AbstractTimeSeries{T,N}
  timestamp
  values::Array{T,N}
  colnames
  # inner constructor enforcing invariants
end

This makes creating custom time series types much simpler.

immutable FinancialTimeSeries{T<:Float64,N} <: AbstractTimeSeries
  # 3 fields plus inner constructor for free
  instrument::Stock
end

type OrderBook{T<:ASCIIString,2} <: AbstractTimeSeries
  # 3 fields plus inner constructor for free
  instrument::Stock
end

type Blotter{T<:Float64,2} <: AbstractTimeSeries
  # 3 fields plus inner constructor for free
  instrument::Stock
end

type FinancialPortfolio{T<:Float64,2} <: AbstractTimeSeries
  # 3 fields plus inner constructor for free
  blotters::Vector{Blotter}
end

type FinancialAccount{T<:Float64,2} <: AbstractTimeSeries
  # 3 fields plus inner constructor for free
  portfolios::Vector{FinancialPortfolio}
end

Though I likely mucked up the syntax, the basic idea is that new custom time series types are easy to construct, and new fields can be added.

toivoh commented 10 years ago

Would we allow new inner constructors in the derived type? Would new in the derived type invoke an inner constructor of the base type?

milktrader commented 10 years ago

Yes, I would think it useful to add invariants, but every derived type would at least have the abstract invariants: the length of the time array matches size(values,1), the length of colnames matches size(values,2), and the dates are sequential and in descending order.
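Without fields or constructors on the abstract type, the same invariants can be shared in current Julia through a common checking function that each inner constructor calls. This is only a sketch: `check_timeseries`, `PriceSeries`, and the use of integer timestamps and a `String` instrument are assumptions made for illustration.

```julia
abstract type AbstractTimeSeries end

# The shared invariants listed above, written once.
function check_timeseries(timestamp, values, colnames)
    length(timestamp) == size(values, 1) ||
        throw(ArgumentError("timestamp length must match size(values, 1)"))
    length(colnames) == size(values, 2) ||
        throw(ArgumentError("colnames length must match size(values, 2)"))
    issorted(timestamp, rev=true) ||
        throw(ArgumentError("timestamps must be in descending order"))
    return nothing
end

struct PriceSeries <: AbstractTimeSeries
    timestamp::Vector{Int}
    values::Matrix{Float64}
    colnames::Vector{String}
    instrument::String
    # The inner constructor reuses the shared check and may add its own.
    function PriceSeries(timestamp, values, colnames, instrument)
        check_timeseries(timestamp, values, colnames)
        return new(timestamp, values, colnames, instrument)
    end
end

ok = PriceSeries([3, 2, 1], [1.0 2.0; 3.0 4.0; 5.0 6.0], ["open", "close"], "XYZ")

rejected = try
    PriceSeries([1, 2, 3], [1.0 2.0; 3.0 4.0; 5.0 6.0], ["open", "close"], "XYZ")
    false
catch err
    err isa ArgumentError
end
```

Each concrete type still lists its own fields explicitly, while the invariants are defined in one place.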

vtjnash commented 10 years ago

just to be clear, my proposal explicitly disallows partial inheritance of fields. it is strictly limited to selectively allowing concrete types to be used as abstract type names. (with special syntax to indicate that the derived class has exactly the same fields and constructors as the original type)

toivoh commented 10 years ago

I think that we have considered abstract types with fields in the context of trying to solve a number of different problems, but looking at this discussion it seems to me that if they should be used for anything, it should be to address the cases where you would currently create either

(alternatives 4 and 2 above from @ssfrr respectively), because you want to create a family of types with some common storage and behavior.

toivoh commented 10 years ago

Once upon a time, there was an idea to disallow field access such as obj.x in all cases except when the concrete type of obj was statically known. If you didn't know the type of obj when you wrote the code, how could you know what its fields signify?

The way I understand it, this restriction was not implemented because it was deemed too useful to be able to have a family of types containing some identically named fields. But this seems to be exactly what abstract types with fields would address!

How about restricting field access like obj.x to cases where it is statically known that obj.x exists? The statically known type of obj need not be concrete, as long as the fields in question would exist in it.

To ensure the separation between abstract type and subtypes, access to the fields of the abstract type could require that that type be used, not just a subtype of it. This would give a complete namespace separation between the fields of the supertype and subtype. It could go a long way to avoiding the fragile base class problem, by forcing a proper interface between supertype and subtype.

Of course, it remains to be defined what the statically known type of an expression would mean.

dpo commented 10 years ago

+1000

StefanKarpinski commented 10 years ago

Once we make a.b syntax overloadable then it will be just like writing f(a,Field{:b}). We never statically disallow generic function application, but this would be the only place in the language where we do that? That doesn't really make any sense.

toivoh commented 10 years ago

I will readily admit that I'm not at all convinced that this is the way forward. What troubles me most is introducing and defining an entirely new mechanism in the language. Still, I find the idea interesting enough to investigate it a bit further:

I agree that we shouldn't statically disallow generic function application. Another way would be that, instead of having a.b as syntactic sugar for getfield(a, Field(:b)), to make it stand for

getfield(a, Field(:b), static_type_of_a)

where the last argument gives the static type that the field is looked up in. Actual fields would correspond to signatures like

getfield(a::A, ::Field(:b), ::Type{A}) = ...  # implicitly created for field b in type A

so that they could only be accessed using the same type as they were defined in.

Properties, on the other hand, could and probably would be made to apply for a range of static types, i.e.

getfield(a::A, ::Field(:myproperty), ::Type) = ...
getfield{T<:MySuperType}(a::A, ::Field(:myproperty), ::Type{T}) = ...

to make them available regardless of static type, or given any static type <:MySuperType respectively.

This is still different from the mechanisms that exist now. The reason to make it different would be that access to actual fields (not properties) is different - it is an implementation detail and not an interface. But I am still not sure whether it would be worth it.

StefanKarpinski commented 10 years ago

That's why this whole issue gives me pause. Inheritance in Julia is about behavior, not structure. And that's a good thing. Conflating inheritance of behavior and inheritance of structure is precisely the mistake that C++, Java, et al. have made and it causes all sorts of problems. It may make sense to introduce some mechanism for structural inheritance in Julia, but I don't think it should be confused with – or tied to – behavioral inheritance.

toivoh commented 10 years ago

Yes, I think you are right. If it shouldn't be tied to behavioral inheritance, then I guess it shouldn't be tied to subtyping at all. It would be interesting if we could come up with a way to address the family-of-similar-types problems that have been discussed here that is not tied to subtyping, but I suppose that this issue is not the forum to discuss it.

milktrader commented 10 years ago

The distinction between inheriting behavior and inheriting structure is useful in navigating how to think about this problem. I was certainly getting lost in the thread before you spelled it out that way.

tknopp commented 10 years ago

From my point of view this PR is closely related to #1974 and #5. If we decide field overloading should not be done, this PR makes a lot more sense. While I agree about the behavior vs. structure thing, there are cases like Gtk.jl, as Jameson mentioned, where abstract types with fields are handy. But #4935, #1974 and #5 should be seen in a shared context. Abstract multiple inheritance and a way to define (and check) interfaces in a formal way are IMHO the most important of these issues.

pbazant commented 10 years ago

I'd like to point out that the decision to disallow concrete type inheritance may also be supported by the fact that in the C++ world inheriting from concrete types is considered conceptually problematic, too: "Item 33: Make non-leaf classes abstract" http://ptgmedia.pearsoncmg.com/images/020163371x/items/item33.html

abeschneider commented 10 years ago

@pbazant Coming from the c++ world I disagree. The design the author chooses from that article doesn't look like a good design to me, but I don't see how it argues against inheriting from concrete types.

There are plenty of places in both the standard libraries, semi-official libraries (e.g. boost), and others which use inheritance.

In fact if you don't allow inheritance from concrete types you essentially have Java's interfaces. There are many complaints you can find for why this can make for bad design.

I understand the arguments against multiple inheritance with concrete types. That's why some languages like Ruby, Scala, and Swift have some concept of mix-ins.