chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

Rename `object` root class? #20414

Closed bradcray closed 11 months ago

bradcray commented 1 year ago

A discussion in https://github.com/chapel-lang/chapel/issues/20322 led to a few observations about the name of the abstract object class (which is the root class that all other classes inherit from — e.g., class C { ... } is effectively class C: object { ... }).

So, some possible actions we could take here:

:-1: Option 0: Leave things as-is

:confused: Option 1: Rename object to Object

Option 2: Rename object to something else, but what?

:eyes: a) Class? Somewhat symmetrical to our use of class as an 'any class'-style generic typeclass

:+1: b) ClassObject? Maybe somewhat more specific / less confusing than just using Class or Object?

:tada: c) RootClass?

:heart: d) BaseClass?

:rocket: e) AbstractClass?

f) something else? What?

[edited OP to introduce other proposals]

lydia-duncan commented 1 year ago

I'm okay with leaving as is, capitalizing, or naming it RootClass. I don't really feel super strongly, though

mppf commented 1 year ago

Another potential name is BaseClass. I am a bit worried that Class vs class would be too confusing. I feel similarly about Object vs "object" even if the latter is only used in English.

So, I lean towards something like RootClass, BaseClass, or ClassObject but I don't find any of them completely satisfying.

mppf commented 1 year ago

Also, regarding the question "Is it reasonable for us to consider records to be objects?", I tend to like to use Wikipedia as a way to know what the common or accepted definition of a term is. There I found https://en.wikipedia.org/wiki/Object_(computer_science) which, to my reading, indicates that our records could be objects, and that classes are a particular kind of object. I think that in a language like Java, Object makes sense as the root of all classes because the other types in Java wouldn't be objects (since you can't make methods on them, etc). I do not understand why the C# design was to use Object as the root of all classes, given that C# also has a distinction between records and classes.

bradcray commented 1 year ago

but I don't find any of them completely satisfying.

Given how rarely we type the identifier, that doesn't seem like a big problem to me (other than the need to make a decision).

I do not understand why the C# design was to use Object as the root of all classes, given that C# also has a distinction between records and classes.

Maybe they just stepped into the same trap we did...? Or maybe records were introduced later in C#? (I don't recall them having two object types when we started, but there's a lot I don't recall from that time period anymore).

bradcray commented 1 year ago

I edited the OP to include Michael's new suggestion and add icons for voting for various options. I'd suggest we try an approval voting process in which you mark as many as you feel positively about (where you can decide for yourself whether you feel positively enough to vote for it at all or not). I'll announce more broadly to the team soon.

bradcray commented 1 year ago

Out of curiosity, I changed object to Object on a branch (just because it was an easy change) and only got 245 failures, many of which are from common causes, so that validates our assumption that it isn't very heavily used.

mppf commented 1 year ago

I see that Object is leading in the current voting. I would be OK with it, but it does leave us with the problem that "object" in conversation has two meanings; one being Object and the other being a record/class/anything.

bradcray commented 1 year ago

I'm similarly reluctant about Object and am just using the polling to take the temperature, not to decide democratically. Michael, I was surprised that you voted for Object before you made this comment, and was going to ask about it—was that unintentional? (at this moment, if you were to remove your vote, Object would no longer be in the lead :D ).

stonea commented 1 year ago

I hate all the proposed names so far 😄.

My vote would be for Object if it weren't for the fact that we use it for records and referring to something as a "record object" bothers me for some reason.

How about Instance? Less preferable to me but just to throw it out there: Structure?

Edit: Reading closer it looks like I had that wrong: you're saying we currently refer to records as objects (in the doc) and part of the proposal is to change that. Yeah I can get behind that.

For documentation purposes, we do need some word that refers to both though. I'd be interested in what word that would be if we do decide that "object" should specifically mean: "instance of a class".

bradcray commented 1 year ago

you're saying we currently refer to records as objects (in the doc) and part of the proposal is to change that.

No, it's not (let me know what was said that implied that). The argument is that the characteristics that make something an object (according to my internal opinion and Michael's read of Wikipedia) apply to classes as well as records, so there's no current intention to change the documentation (and I'm generally opposed to not calling record instances 'objects'). The observation that kicked this issue off is that the term 'object' is not particularly specific to classes in Chapel and the fact that it's lowercase isn't particularly class-like according to our style-guide. But capitalized types do distinguish records and classes, so one could imagine changing to Object for the root class and—if records had a root type that they inherited from—it arguably could/would be object (but we don't have inheritance, so there isn't any need for that at present. And if there was, it might suggest using RootClass/RootRecord rather than something as subtle as Object/object. But I don't anticipate we'll have need of a root record object in the future, so I'm playing pretty significant devil's advocate in saying that).

I hate all the proposed names so far 😄.

Does it console you at all that you'll probably almost never type the identifier into your code?

How about Instance?

I'd consider var r: myRecordType; to create a record instance, so this seems similarly attractive or unattractive as Object to me (where Object at least has more precedent in other languages).

Less preferable to me but just to throw it out there: Structure?

Meh... Since record is more like C struct than class is, I'd be disinclined to use Structure for the root abstract class type. And like Instance, I don't think it really distinguishes between the two any better than Object does.

mppf commented 1 year ago

I'm similarly reluctant about Object and am just using the polling to take the temperature, not to decide democratically. Michael, I was surprised that you voted for Object before you made this comment, and was going to ask about it—was that unintentional? (at this moment, if you were to remove your vote, Object would no longer be in the lead :D ).

I was trying to follow your suggestion for the voting:

https://github.com/chapel-lang/chapel/issues/20414#issuecomment-1211045218

I'd suggest we try an approval voting process in which you mark as many as you feel positively about

I feel positively about Object (it solves the main problems if you are willing to distinguish "object" from Object & has precedent in other languages) but I like other two-word solutions better - ClassObject, RootClass, or BaseClass. I think the two-word approach is appropriate in this case because it's worth it for clarity on something relatively rarely used in Chapel code.

stonea commented 1 year ago

you're saying we currently refer to records as objects (in the doc) and part of the proposal is to change that.

No, it's not (let me know what was said that implied that)

I think the sentence: "our documentation tends to refer to records as also being a form of 'object'" and in Option 1: "in which case we should update our documentation and the way some of us talk about them"

Maybe "tends to refer to" is a bit more nuanced than saying "we currently refer to"? And maybe I should have been more explicit in saying "part of the proposal if we take Option 1" than just "part of the proposal". But I think we mean the same thing.

Also, after thinking about Michael's Wikipedia definition of object, I'm now leaning more towards just being more specific (especially if in practice this isn't something users are going to be typing out that often), so I changed my vote.

Does it console you at all that you'll probably almost never type the identifier into your code?

That definitely helps. If it's not too much work to try and come up with a better one, it's probably worth it, but I wouldn't spend inordinate amounts of time trying to come up with something perfect.

In my mind I'm thinking of old-school pre-generics Java where in place of real generics you might see functions like this:

public void addToList(Object obj)

Turning that into some specific Chapel examples:

EDIT: { Updated example to use Chapel type specifiers rather than C specifiers (I'm too used to writing C++ code I guess) }

I find object, Object, and ClassObject less objectionable than any of the ones that have Class at the end of their name. Ending a typename with Class makes me think it's some sort of metatype thing used for reflection.

vasslitvinov commented 1 year ago

Noting that conversationally "base class" and "abstract class" have certain meanings and those meanings are different from what today's object is. I suggest avoiding those names.

bradcray commented 1 year ago

OK, so based on the straw poll, I'm inclined to say:

I'm also curious about the following follow-up question (vote for as many as apply on this comment):

lydia-duncan commented 1 year ago

You voted for both, do you have a stronger preference between the two?

I don't have a strong preference.

dlongnecke-cray commented 1 year ago

I think both records and classes are objects which is why I object to both object and Object.

I am happy with Andy's suggestion of ClassObject as well.

jeremiah-corrado commented 1 year ago

You voted for both, do you have a stronger preference between the two?

I'm leaning towards RootClass, but I'd also be curious to know more about Andy's comment:

Ending a typename with Class makes me think it's some sort of metatype thing used for reflection.

@stonea, I'm having trouble picturing the reflection-scenario that users might get this confused with. Do you have an example of what that would look like?

stonea commented 1 year ago

I'm having trouble picturing the reflection-scenario that users might get this confused with. Do you have an example of what that would look like?

This is not how reflection in Chapel works but I could imagine something like this:

var x = myObject.class();
x.methods();    // Return a list of methods myObject has

Where the .class() method returns an object that contains information about myObject's class. What's the name for x's type then? I could imagine it would be Class.

Doing a little Google searching it looks like this is, more-or-less, how it works in Java:

https://docs.oracle.com/javase/8/docs/api/java/lang/Class.html

So why doesn't ClassObject bother me? I guess because in this situation I'm reading Class as an adjective being used to describe a particular kind of object: namely a "class object" as opposed to a "record object".

daviditen commented 1 year ago

Object or object has strong precedent from other languages. It's the base class type in Java, Python, .NET languages and I assume others. I don't know of any that use a two word name for this type.

Records don't have inheritance so don't have an equivalent thing to Object. Back when Chapel records had inheritance the base type was called value so I tend to think with that term if thinking about a record base type.

jeremiah-corrado commented 1 year ago

Where the .class() method returns an object that contains information about myObject's class. What's the name for x's type then? I could imagine it would be Class.

Ok, that makes a lot more sense, thanks!

Personally, I'm not sure that the name RootClass or BaseClass would evoke that idea in my head ("root" and "base" make me think of inheritance, not meta-types or kinds). Had someone described that feature to me without telling me the type of x, I would guess "ClassSchema" or something like that.

dlongnecke-cray commented 1 year ago

After thinking about things some more I think I agree with David Iten and am adjusting my vote to be for Object. I agree that it doesn't matter that records are also "objects", as they don't have inheritance and thus needn't be concerned about what the root class for classes is called. I think this ends up being internally consistent, as we can imagine that if records did have inheritance they would use Object as the base class as well.

mppf commented 1 year ago

I think this ends up being internally consistent, as we can imagine that if records did have inheritance they would use Object as the base class as well.

I disagree, because if I want to "accept any object", the obvious thing to write is proc f(x: Object). If records are also objects I think it's confusing that this function would not accept a record argument. I think we need to pick one terminology: records are either objects, or they aren't. (I could be OK with Object and object where the upper-case one is for classes, but this seems more subtle than ideal).

dlongnecke-cray commented 1 year ago

I disagree, because if I want to "accept any object", the obvious thing to write is proc f(x: Object).

Wouldn't that just be proc f(x)? Or are we talking about any composite? A generic type that doesn't exist yet that is the union of class and record?

If Object is clearly defined and is a concrete type, I don't think there's much potential for conflation between records being objects and the class Object. I think there would be a point where a user has to go learn about Object and associate that specific type with inheritance. At which point they probably already know (or will learn) that records can't inherit.

bradcray commented 1 year ago

Wouldn't that just be proc f(x)? Or are we talking about any composite? A generic type that doesn't exist yet that is the union of class and record?

I'm not Michael, but I think you're right that that's his point. That in the same way that proc foo(x: integral) means "accept any int or uint argument", upon seeing proc foo(x: object) or proc foo(x: Object), one might expect it to accept any expression that is an object—e.g., any class or record. And so it might be confusing if it did not. Whereas names like ClassObject, or RootClass more obviously link it to classes so it's far less likely that I might mistakenly think I could pass a record to it.

I can argue the other side too, though. If you did see proc foo(x: Object) and tried to pass a record to it, it simply wouldn't work and you'd probably quickly learn why. We could even specialize the error to help you learn why more easily. So it feels like something that'd get introduced to you fairly quickly.
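To make the comparison above concrete, here is a minimal sketch, not taken from the Chapel modules. It assumes the root class is spelled RootClass (the name this thread later settles on); on a compiler that predates the rename, substitute object. MyClass, MyRecord, takesIntegral, and takesClass are hypothetical names used only for illustration.

```chapel
// Sketch only. `RootClass` is the name chosen later in this thread; on a
// compiler that predates the rename, substitute `object`.
// MyClass, MyRecord, takesIntegral, and takesClass are hypothetical names.
class MyClass { var x: int; }
record MyRecord { var x: int; }

// `integral` is a generic constraint: any width of int or uint is accepted.
proc takesIntegral(x: integral) {
  writeln("integral of type ", x.type:string);
}

// A root-class formal accepts a borrow of any class instance, and nothing else.
proc takesClass(x: borrowed RootClass) {
  writeln("some class instance");
}

takesIntegral(1);
takesIntegral(2: uint(8));

var c = new MyClass(3);
takesClass(c.borrow());   // OK: MyClass implicitly inherits from the root class

var r = new MyRecord(4);
// takesClass(r);         // rejected at compile time: a record is not a class
```

The commented-out call is the case under debate: a reader who takes the formal's type name to mean "any object" might expect it to compile.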

Still, I think Michael's point would be that it conflates a general term that we use for multiple types with a very specific type class and use case. On the other hand, the capital-O more clearly suggests that this may not be the generic noun 'object' that we're referring to in the code and gives pause, definitely more so than our current little-O object does.

Jumping back to a @daviditen comment:

Object or object has strong precedent from other languages. It's the base class type in Java, Python, .NET languages and I assume others. I don't know of any that use a two word name for this type.

I think one other thing at play here, though, is that most OOP languages don't have two object types like our record vs. class. That said, Michael noted that C# does (structs and classes, and actually records too), yet still uses Object as the base class type, so there'd be precedent for doing so.

bradcray commented 1 year ago

Interestingly, Swift does not have a base/root class type at all.

bradcray commented 1 year ago

Noting that conversationally "base class" and "abstract class" have certain meanings and those meanings are different from what today's object is.

I'm not snapping to the definition of "base class" he's referring to offhand

I did some googling today because I was curious to figure out what you were referring to Vass, and quickly found definitions suggesting that these terms are used to refer to a class that inherits from no other, which is what object is for us today. Can you shed some light on what you're thinking of here? (not that these are polling very high, but just for my greater understanding).

vasslitvinov commented 1 year ago

base class: I like the definition here. It is not that a base class inherits from no other. Instead, the base class in a given situation is one that other classes inherit from. Base/derived == parent/child == superclass/subclass. A base class may or may not inherit from something else.

vasslitvinov commented 1 year ago

Why did I vote for RootClass ?

I have not had any issue with object being the root class in Chapel. I like that it is concise and all lowercase, the latter suggesting its builtin-ness. However, since we posed the question, my take is:

I like object as a term for an instance of a class or a record. If we accept this meaning, then we should not use "object" to mean the root class. Even if we capitalize it.

RootClass is a good concise description of its role, while all other voted options are objectionable to me.

dlongnecke-cray commented 1 year ago

I noticed that C++ and Swift both don't offer a "root object class". That idea really appeals to me.

Especially since our generics should offer (most/all/more than) the capabilities of C++ templates (which is ostensibly their justification for not having a root class), I don't expect it to be particularly hard for the compiler to move from defining base methods on object to generating them for all base classes instead.

Do we know how many user projects rely on object and type erasure in their code? The most obvious usage to me seems like some sort of list(object) or a collection of objects.

I might expect to find this code in Arkouda, but at the same time I could imagine them to have a sort of BaseMessage type (or whatever they use to represent messages/rpc).

If not a lot of user codes rely on this type erasure, I would love for us to entertain the idea of removing the root class entirely.

bradcray commented 1 year ago

@benharsh : w.r.t. @dlongnecke-cray's comment just above, I think you recently said something suggesting you were using collections of object in IO work, but I'm not sure whether that was fundamental or as a convenience in testing. Could you elaborate?

benharsh commented 1 year ago

It's not a fundamental thing that I'm relying on or using in module code or the compiler, it's really just one test I wrote to lock in the current behavior. I only mentioned it as a thing that we can do today (even with the old 'writeThis' approach). I'm not aware of any IO-related reasons why we couldn't get rid of object.

vasslitvinov commented 1 year ago

I do not see what we win by removing the root class as a concept, other than trivially resolving this issue. We lose:

Both of these will become non-issues once we have interfaces and the ability to define the type for "this can be an instance of any class" as "any T where isClass(T)". (Or maybe "isOwnedClass(T)" etc.)

dlongnecke-cray commented 1 year ago

Speaking for myself, I view list(object) to be a bit of an anti-pattern. Because you're not really establishing any type-constraint by saying object - you really have no clue what could be in that collection, so procedures/aggregates wrapping it have to do that checking/sanitation themselves. That seems counter-productive, and I think @vasslitvinov is right in suggesting that "interface instances" (I think Rust calls these "box" types) are a way to fix this while still being able to establish constraints for otherwise unrelated types.

Re: removing object entirely - I'm not entirely convinced that we have to, if we can just prevent users from inheriting from it directly. E.g., it might be possible to just treat it as a hidden implementation detail? Then we get avoiding code duplication for free. Though personally I am not convinced that the code duplication would be much of a problem, as we already generate so many methods anyway and the equality ones for classes are small and inlined. But I have not thought this idea through, and it's certainly possible that something on object might leak out in a way we didn't intend (which could be avoided if we remove it).

Re: Writing a function that works on any class instance - I struggle to see the value in this. If we define a virtual method on object we're essentially polluting the global namespace in a way we decided we couldn't afford to do (in discussions on Reflection functions being methods instead). And since you don't know any subtype, it's difficult to define a method that is meaningful across a broad category of objects. Essentially the only methods we have are those that work on the identity of a class (hashing it, reference equality).

bradcray commented 1 year ago

In discussions on a subteam looking at this issue's original question (organized at https://github.com/Cray/chapel-private/issues/4876), we've decided to go with RootClass as the replacement name, which also reflects the majority decision on the last poll here.

Meanwhile, @dlongnecke-cray has proposed retiring the notion of a root class altogether, and I don't mean for this comment to interrupt that discussion. In fact, to keep it rolling, I'll ask:

David, what do you view the downsides of continuing to support a root class to be? I.e., what are we losing or failing to gain by supporting it?

lydia-duncan commented 1 year ago

I'm not a fan of removing object/RootClass right now. In particular, isn't this the only way to get collections of multiple generic instantiations of a type? E.g.

class Foo {
  type t;
  var x: t;
}

var arr: [0..3] owned object?;
arr[0] = new Foo(int);
arr[1] = new Foo(string);
arr[2] = new Foo(bool);
arr[3] = new Foo(int);
writeln(arr);

Will work today and print {x = 0} {x = } {x = false} {x = 0} while:

class Foo {
  type t;
  var x: t;
}

var arr: [0..3] owned Foo?; // Note: difference here
arr[0] = new Foo(int);
arr[1] = new Foo(string);
arr[2] = new Foo(bool);
arr[3] = new Foo(int);
writeln(arr);

will not compile because Foo is generic

lydia-duncan commented 1 year ago

This functionality is something that users frequently want to do.

vasslitvinov commented 1 year ago

While I do not see the benefits of removing the root class, @lydia-duncan - what are the cases where you have seen users rely on the root class / create collections with the root class as the element type?

lydia-duncan commented 1 year ago

It looks like Matrix does not preserve conversations more than two years old, which is where I would most expect to find examples, sorry

dlongnecke-cray commented 1 year ago

Re: Collections of generics, I'm not sure why users can't just write something like:

class Base {}

class Foo : Base {
  type T;
  var x: T;
}

var lst: list(owned Base, false);
lst.append(new Foo(int));
lst.append(new Foo(real));
writeln(lst);

David, what do you view the downsides of continuing to support a root class to be? I.e., what are we losing or failing to gain by supporting it?

I'm not sure what we gain, other than some methods staying virtual, and so ostensibly generating less code overall (how much less?!).

I feel like all use of RootClass does is encourage opportunities for bugs. Users utilizing collections of RootClass have to write wrapper classes that maintain tagging information or do dynamic casts to know what it is they're working with. Those codes do not seem very Chapeltastic.

If we lived in a world where we had "interface instances" like Vass has suggested, I feel fairly confident that we'd almost (leaving myself some wiggle room here 😄) always advocate using those over using a collection with an element type of RootClass. And that seems like a convincing argument for deprecation.

I'm fine with leaving this feature in for 2.0 and then revisiting the prospect of deprecation when interfaces come online, though. Just wanted to get the thought out into the public.

bradcray commented 1 year ago

I'm not sure what we gain, other than some methods staying virtual, and so ostensibly generating less code overall (how much less?!).

I'm not sure I buy that (or else am not understanding). If you define a class C { } that has a method foo() and no children, you shouldn't have to pay any dynamic dispatch overhead or complexity for that call since RootClass doesn't define foo(). So the cases where there would be additional overhead would just be those that are defined on RootClass and overridden in child classes. Do we have any such methods? Maybe just I/O-based ones?

I'm fine with leaving this feature in for 2.0 and then revisiting the prospect of deprecation when interfaces come online, though. Just wanted to get the thought out into the public.

That's good to hear. While I tend to agree that it's not deeply useful, my arguments for leaving it in would be:

dlongnecke-cray commented 1 year ago

I'm not sure I buy that (or else am not understanding).

If we removed RootClass, any of our methods/procedures that operate on RootClass go with it. We'd have to use generics to fill in the gaps. That's more generated code. E.g., today for operator== we just take two borrowed object.

So the cases where there would be additional overhead would just be those that are defined on RootClass and overridden in child classes. Do we have any such methods? Maybe just I/O-based ones?

Yeah, we're on the same page. I think we probably don't have very many methods, but I do think we have a few builtin functions that operate in terms of borrowed object. Those would have to take borrowed class instead and then to constrain the number of instantiations, we could have a where isBaseClass(t).

it's more work to remove than retain

This is one of those frustrating things that I can't really argue against 😄. Because we're just too far out from a competing feature coming online.

having it doesn't preclude people from using generic instantiation rather than inheritance, and we definitely lean on generics more than inheritance in our own code anyway

I don't know entirely where Chapel stands on the "many ways to do things" vs "one right way to do things" debate. However I do think that we have a stance of "whichever way you pick, it should (ideally) be pleasing and sufficiently high level".

If I want to use a list(object), the universe of types I can work with has to be known at compile-time. There's no way for a user to extend this without editing my code.

class Foo {
  type T;
  var x: T;
}

proc int.foo() { writeln('int'); }
proc real.foo() { writeln('real'); }

var lst: list(object);
lst.append(new Foo(int));
lst.append(new Foo(real));

for x in lst do
  if const c = x:Foo(int) then
    c.x.foo();
  else if const c = x:Foo(real) then
    c.x.foo();
  else halt('Error!'); // What if user wants another type to work? Too bad...

We'd probably say this is bad practice and the list should operate in terms of some sort of base class that the user can extend. I'd agree with that, and say that's kind of the point of OO in the first place. Dynamic casting is usually regarded as a smell which indicates that your base class isn't expressive enough.

This is part of why I feel like we could remove object today if we wanted to - because the author of the list(object) can already do better just by constraining to some sort of class that more accurately describes what they need the collection items to do.
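As a rough illustration of that alternative, the sketch below replaces the dynamic casts with a user-defined base class and an overridden method. FooBase is a hypothetical name, and an array stands in for the list so the sketch does not depend on any particular List method names.

```chapel
// Sketch only. FooBase is a hypothetical user-defined base class.
class FooBase {
  proc foo() { writeln("FooBase"); }
}

class Foo : FooBase {
  type T;
  var x: T;
  // Each instantiation overrides foo(), so callers never need a dynamic cast.
  override proc foo() { writeln("Foo(", T:string, ")"); }
}

var arr: [0..1] owned FooBase?;
arr[0] = new Foo(int);
arr[1] = new Foo(real);

for elem in arr do
  if elem != nil then elem!.foo();   // dispatches to the Foo(T) override
```

With this shape, a user can add another instantiation or subclass without touching the loop, which is the extensibility the cast-based version lacks.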

However...

If we had interface instances, then I as the list user could constrain the type of the things in my list based on what I actually need them to do, without constraining the types input into the list at all...


interface Fooable { proc foo(); }

int implements Fooable { proc foo() do writeln('int'); }
real implements Fooable { proc foo() do writeln('real'); }

var lst: list(Fooable);
lst.append(0);      // Implicitly convert to "interface instance" "Fooable"
lst.append(0.0);

for x in lst do x.foo();

Hopefully this illustrates the core idea more clearly than just throwing around the term "interface instance".

stonea commented 1 year ago

I noticed that C++ and Swift both don't offer a "root object class". That idea really appeals to me.

FWIW, Swift has an Any type, which you could use to create a heterogeneous (fairly weakly typed) list like this:

var data: [Any] = ["abc", 10];
if(something) {
    data.append(5);
} else {
    data.append("def");
}

mppf commented 1 year ago

So the cases where there would be additional overhead would just be those that are defined on RootClass and overridden in child classes. Do we have any such methods? Maybe just I/O-based ones?

IIRC, the deinit method exists for object and is overridden by basically every type. I think it's common for calls to overridden functions not to need virtual dispatch (if it is a leaf type) but at the very least the deinit being on object means that the compiler needs to include every class in the virtual dispatch tables. That has some compile-time and code-size implications but I'm not sure if they're significant enough to worry about here.

To my mind, RootClass / object is kind of like void* in C. Of course it can be abused. Of course it is not the best way to write many things. But, occasionally, it really is the best way to write something, given the various factors. As a specific example, for something like an == overload implementing comparison with nil, I think it's both simpler to implement and easier/faster for the compiler to use RootClass there. Instantiating such a thing for every class type can add to our code size (IMO unnecessarily). Another case where void* comes up is basically to erase the types from an API, even though the implementation knows both the caller and the callee have a particular type. I think that can come up with RootClass as well.

I agree that for something like a list of mixed types, a base class or a runtime-bound interface would be better strategies than literally using RootClass.