ericvsmith / dataclasses

Apache License 2.0

On naming #12

Closed hynek closed 7 years ago

hynek commented 7 years ago

As I’ve already mentioned by e-mail, I’m strongly opposed to calling this concept “data classes”.

Having an easy way to define many small classes with attributes has nothing to do with data, it’s about good OO design.

Calling it “data classes” implies that they differ from…“code classes” I guess?

One of the things people love about attrs is that it’s helping them to write regular classes which they can add methods to without any subclassing or other magic. IOW: to focus on the actual code they want to write as opposed to generic boilerplate.

Debasing them by name seems like a poor start to me. We do have data containers in the stdlib (namedtuples, SimpleNamespace) so I don’t see a reason to add a third to the family – even if just by name.

ilevkivskyi commented 7 years ago

What do you think about calling them "autoclasses"? With something like:

from autoclass import auto

@auto
class C:
    x: int
    y: int = 0
    ...

hynek commented 7 years ago

Yes I love it! I was actually thinking of that myself just now based on Nick's mail. :)

ericvsmith commented 7 years ago

I really have no preference on the name. I'll let others decide whether autoclass conveys a different meaning from dataclass.

I do appreciate the "regular classes" feel of attrs.

gvanrossum commented 7 years ago

I disagree with Hynek that calling these "data classes" sends the wrong message. The things the proposed decorator adds to a class all have to do with the "data" aspect of the class: how to initialize the instance variables, how to print (repr) them, how to compare/hash them. None of this precludes that the class may also have methods. (In fact the ultimate read-only data class, NamedTuple, allows adding methods in Python 3.6.)
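For readers following along, here is a minimal sketch of that claim. It assumes the `dataclasses` module as it later shipped in the standard library (Python 3.7+), not the draft code discussed in this thread. The `Point` class and its `manhattan` method are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int = 0

    # An ordinary method; the decorator leaves it alone.
    def manhattan(self) -> int:
        return abs(self.x) + abs(self.y)


p = Point(3, -4)           # generated __init__ takes the fields as arguments
print(repr(p))             # generated __repr__ lists every field
print(p == Point(3, -4))   # generated __eq__ compares field by field
print(p.manhattan())       # the hand-written method still works
```

The generated methods cover only the "data" side (construction, printing, comparison); everything else about the class is written by hand as usual.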

To the contrary, I think "auto" is a terrible name, because it doesn't specify what is automatic here.

hynek commented 7 years ago

OK I can see the argument about “auto” being unclear and I really don’t insist on it, it was just the best thing I saw so far.

I cannot follow the data argument though. Yes it’s about data, but everything is data in the end, including code (let’s not have a lisp conversation tho :)). And every class should carry some data in the end, otherwise it’s rather a module…

So let’s take a step back and look at what it is about more specifically.

It’s about attributes and the dunder boilerplate involved, right? (which btw makes attrs a really good name for attrs IMHO. ;))

So even something colloquial and heavy-handed like @auto_dunders, which talks about what is done, would be preferable to me over @dataclass, which ostensibly talks about the class, which IMHO it shouldn’t.

We should always stress that the class at the end is a 100% regular class with added code that operates on the defined attributes.

Am I making any sense to y’all at all?


Just to be clear why I care so much: given the feedback on MLs and how attrs stickers have been ripped out of my hands at PyCon, I think this PEP could be a release game changer somewhere between f-strings and async/await. I can see people push for 3.7 because of this feature like they did for 3.5 and 3.6. And naming is important – in the end it’s marketing and while Python 3 may be over the hump in general, every bit helps. And I know that naming is hard – I’ve committed a few sins myself.

ilevkivskyi commented 7 years ago

@gvanrossum

I disagree with Hynek that calling these "data classes" sends the wrong message.

It may be personal, but to me "data class" sounds like a different/particular kind of class. Also some other languages have similar terminology with different meanings.

I think "auto" is a terrible name, because it doesn't specify what is automatic here.

This depends on the API. I could imagine three options:

The last API should probably be based on __init_subclass__ under the hood, since we don't want metaclass conflicts.

warsaw commented 7 years ago

What do you think about calling them "declared classes"?

ilevkivskyi commented 7 years ago

What do you think about calling them "declared classes"?

Sounds good! I am not sure which is better, "declared" or "auto", but both are better than "data classes" I think.

gvanrossum commented 7 years ago

I agree that a good name is important, and I'm still open to suggestions, but I'm still not convinced that any of the alternatives proposed so far are better than "data class".

Does the proposed feature introduce a new category of classes? I think it does -- the decorator stores additional field information in a class attribute (currently __dataclass_fields__, referenced as _MARKER in the code). Sure, the new category is pretty compatible with other kinds of classes, but the same can be said for e.g. the latest version of typing.NamedTuple -- this uses a similar notation but essentially makes all fields slots, while still allowing you to define methods. (In fact, apart from the tuple interface, NamedTuple is almost the same as dataclass with slots=True...)
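A small sketch of the marker attribute and the NamedTuple comparison, assuming the standard-library `dataclasses` module as shipped in Python 3.7+ rather than the draft code referenced here. The classes `C` and `N` are illustrative only.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import NamedTuple


@dataclass
class C:
    x: int
    y: int = 0


class N(NamedTuple):
    x: int
    y: int = 0

    # NamedTuple classes may define methods too.
    def total(self) -> int:
        return self.x + self.y


# The decorator records field metadata on the class itself.
print(list(C.__dataclass_fields__))
print([f.name for f in fields(C)])

# is_dataclass() checks for that marker, so a NamedTuple does not qualify.
print(is_dataclass(C), is_dataclass(N))

# The tuple interface is the visible difference between the two.
print(isinstance(N(1, 2), tuple), isinstance(C(1, 2), tuple))
```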

The dynamic generation feature (make_class() in the current code) also smells like creating something that has just data. (Yes, you can subclass it, but the same is true for a dynamically generated NamedTuple.)
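As a rough illustration of dynamic generation plus subclassing: the function was called `make_class()` in the code at the time of this thread, while the standard library (Python 3.7+) ships it as `dataclasses.make_dataclass`, which is what this sketch uses. `Point` and `LabeledPoint` are made-up names.

```python
from dataclasses import field, make_dataclass

# Build a class from a list of (name, type[, Field]) specifications.
Point = make_dataclass(
    "Point",
    [("x", int), ("y", int, field(default=0))],
)


# The generated class can be subclassed to add behaviour.
class LabeledPoint(Point):
    def describe(self) -> str:
        return f"({self.x}, {self.y})"


lp = LabeledPoint(1)
print(lp.describe())
```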

In terms of what the interface should look like (Ivan's three bullets) I think the class decorator is a hands-down winner, because it doesn't use inheritance or metaclasses (both of which have been troublesome when there's another base class). The enum-based call signature looks weird, mostly because that's not a common idiom in other parts of Python, whereas keyword flags are very common.
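To make the decorator argument concrete, here is a sketch in which the decorated class already has a base class with its own metaclass (`ABCMeta`). It assumes the standard-library `dataclasses` module (Python 3.7+). The `order=True` keyword is a flag from the shipped API; the `hash=`/`cmp=` spellings used elsewhere in this thread are draft names. `Shape` and `Rect` are illustrative.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class Shape(ABC):
    @abstractmethod
    def area(self) -> float:
        ...


# The decorator only adds methods to the class it receives, so it does not
# care which base classes or metaclass the class already has.
@dataclass(order=True)
class Rect(Shape):
    w: float
    h: float

    def area(self) -> float:
        return self.w * self.h


r = Rect(2, 3)
print(r.area())
print(type(Rect).__name__)        # still the metaclass inherited from ABC
print(Rect(1, 2) < Rect(1, 3))    # ordering enabled by the keyword flag
```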

What do people think of "easy classes"?

ilevkivskyi commented 7 years ago

@gvanrossum

What do people think of "easy classes"?

Actually "easy" is probably even better than "auto" IMO ("auto" is quite boring plus you don't like it). This looks interesting:

from easyclass import easy

@easy(hash=True, cmp=True)
class C:
    x: int

warsaw commented 7 years ago

I'm not sure I like "easy" although I'm finding it difficult to articulate why.

Taking a trip through the thesaurus I came across "express" and I kind of like that for its double meaning, both of which I think apply here. "Express" as in you're communicating the essential bits of your class and leaving the machinery to do the rest, but also "express" as in you're taking the quick route to defining your class.

Is it too cute or obscure?

from expressclass import express

@express(hash=True, cmp=True)
class C:
    x: int

hynek commented 7 years ago

Not the biggest fan of easy either; it has a negative connotation and makes them sound like "classes light".

One adjective that Guido actually used at PyCon was "plain" and that makes more sense to me. (We talked of POPOs which means BUTTs in German so let's not go there ;))

@plain
class C:
    x: int

Speaks to me better than easy because it doesn't carry judgement and just says "this class does what you'd expect". And you literally read "plain class" in your head.

ilevkivskyi commented 7 years ago

There is one more option: don't use any special name for these classes, just call them classes. Ultimately, we don't want some new kind of classes, we want to simplify definition of ordinary classes. We can call the module that defines the utilities classtools:

from classtools import make_class, methods, field

@methods(hash=True, cmp=True)
class Point:
    x: int
    y: int = 0
    labels: List[str] = field(factory=list)

ericvsmith commented 7 years ago

I'm not wild about plain or easy.

Frankly, I think "attr.s" and "attr.ib" are genius, but I realize they're taken and considered too cute by some. But we could use "attrs" itself!

I'm not seriously suggesting this because it would be maximally confusing. But maybe something else connoting attributes, fields, items, etc.

hynek commented 7 years ago

FWIW @attrs is the serious business alias in attrs. :)

Well yeah, turns out we did put some thought into our names... :)


Is fields a common idiom? It seems to be in this PEP already? (I'm on my phone sorry) @auto_fields or similar would be an option. As I've said before: I'd prefer if the decorator/naming didn't talk about the class but about the attributes it implements.

ilevkivskyi commented 7 years ago

In some sense I like the term field even more than attribute, since method is also an attribute (in the context of __getattribute__). Although the term field is probably not standard. Still, it looks very natural:

from classtools import make_class, field, auto_fields

@auto_fields(hash=True, cmp=True)
class Point:
    x: int
    y: int = 0
    labels: List[str] = field(factory=list)

gvanrossum commented 7 years ago

We need to stop the bikeshedding.

The more I hear Hynek's enthusiasm for his naming of attr.s the less I trust his instincts about naming.

I think dataclasses is still the best name I've heard so far, so let's please stick with that. The argument against it seems to basically boil down to "it makes you think it's a special kind of class" which, actually, I think is totally okay -- dataclasses have a secret handshake (__dataclass_fields__) and automatically generate a constructor that makes it easy to construct an instance from its field values, while the generated __repr__ shows all those fields (etc.).

TBH I'm not sure I'll use dataclasses a lot myself -- my classes usually have a lot more state than I want to pass in to the constructor, and I often have a custom repr() that compactly shows the most important state I care about in a typical debugging session. I also don't feel I am reluctant to create small classes when they fit in the design.
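For what it is worth, the module as it later shipped in the standard library (Python 3.7+) accommodates both habits: a field can be kept out of the constructor, and a hand-written `__repr__` is left in place. A minimal sketch, with a made-up `Job` class:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    name: str
    retries: int = 3
    # Internal state: not a constructor parameter, fresh list per instance.
    log: List[str] = field(default_factory=list, init=False)

    # A __repr__ defined in the class body is not replaced by the decorator.
    def __repr__(self) -> str:
        return f"<Job {self.name}>"


job = Job("build")
job.log.append("started")
print(job)
print(job.log)
```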

ilevkivskyi commented 7 years ago

OK, let them be "dataclasses".

I think it is not super important how these things will be called in the PEP/in the docs. I remember a long discussion about NewType, at the end it was decided to call them "distinct types", but FWIW I have never heard anyone use this term, everyone calls them "newtypes" :-)

It is a bit more important what will be the actual name of the decorator, and in this sense:

@dataclass
class C:
    x: int

looks like a tautology - "class" appears there twice.

warsaw commented 7 years ago

On Jun 06, 2017, at 09:46 AM, Guido van Rossum wrote:

We need to stop the bikeshedding.

Happy to do so; dataclasses are fine with me. Let's be clear about this decision in the upcoming PEP though, otherwise we'll just go through another round of endless bikeshedding at that point.

hynek commented 7 years ago

I'd like to add for the record that I've expressed enthusiasm for @attr.s (whose history is...complicated) exactly zero times.

ericvsmith commented 7 years ago

It is a bit more important what will be the actual name of the decorator, and in this sense:

@dataclass
class C:
    x: int

looks like a tautology - "class" appears there twice.

I typo this all the time as:

@dataclass C:
    x: int

because I've just typed class, so the class name must come next. So I'm all for a different name. Maybe just @data would read okay:

@data
class C:
    x: int

Although importing data from a module will likely lead to a conflict.

gvanrossum commented 7 years ago

I like @dataclass just fine -- but the module should be named dataclasses (and the PEP titled "Data Classes").

If @dataclass is really confusing, how about @with_data? (That's formed similar to @six.with_metaclass.)

warsaw commented 7 years ago

Blue. No yel-- Auuuuuuuugh!

I like @data a bit more than @with_data just because it's easier to type. I'll bring back my previous suggestion in a different context: what about @declare?

ericvsmith commented 7 years ago

I'm going to close this issue. For the purpose of the PEP and reference implementation, the module will be dataclasses, the decorator will be dataclass, and the PEP will refer to them as Data Classes.

The bike shedding can continue in the appropriate venue once the PEP is completed.

ericvsmith commented 7 years ago

I'm not going to re-open this issue, but I thought I'd post the following here, lacking a better place to record it.

@larryhastings sent me this link: http://cr.openjdk.java.net/~briangoetz/amber/datum.html, titled "Data Classes for Java". Someone in the Java community proposes a feature similar to dataclass. The syntax would be:

__data class Foo(int x, int y) { ... }

You could add additional methods where the ... are. In addition to "Data classes", the document also uses the phrase "plain data carriers".

Edited to add this, from the link: "Other OO languages have explored syntactic forms for more compact class declaration: case classes in Scala, data classes in Kotlin, and soon, record classes in C#."

Ricyteach commented 6 years ago

Hello: I know the first beta release of 3.7 is coming any day now so it may be a little late to make a change, but when I read PEP 557 yesterday I instantly thought that this concept should be named a fieldclass rather than dataclass. Since it is already using the concept of fields etc, it just seems natural.

Just an idea.

gvanrossum commented 6 years ago

... this concept should be named a fieldclass rather than dataclass.

No.