Closed hynek closed 7 years ago
What do you think about calling them "autoclasses"? With something like:
from autoclass import auto
@auto
class C:
    x: int
    y: int = 0
...
Yes I love it! I was actually thinking of that myself just now based on Nick's mail. :)
I really have no preference on the name. I'll let others decide whether autoclass conveys a different meaning from dataclass.
I do appreciate the "regular classes" feel of attrs.
I disagree with Hynek that calling these "data classes" sends the wrong message. The things the proposed decorator adds to a class all have to do with the "data" aspect of the class: how to initialize the instance variables, how to print (repr) them, how to compare/hash them. None of this precludes that the class may also have methods. (In fact the ultimate read-only data class, NamedTuple, allows adding methods in Python 3.6.)
To the contrary, I think "auto" is a terrible name, because it doesn't specify what is automatic here.
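To make the "data aspect" point concrete, here is a minimal sketch using the names this thread eventually settles on (the dataclasses module and the dataclass decorator, as shipped in the standard library from Python 3.7). The class Point and its manhattan method are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int = 0

    # An ordinary method; the generated code does not get in its way.
    def manhattan(self) -> int:
        return abs(self.x) + abs(self.y)


p = Point(3, -4)             # generated __init__
print(repr(p))               # generated __repr__: Point(x=3, y=-4)
print(p == Point(3, -4))     # generated __eq__ compares field by field
print(p.manhattan())         # hand-written method still works
print(type(Point) is type)   # no custom metaclass is involved
```

The generated methods only touch the declared attributes; everything else about the class is left alone.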
OK I can see the argument about “auto” being unclear and I really don’t insist on it, it was just the best thing I saw so far.
I cannot follow the data argument though. Yes it’s about data, but everything is data in the end, including code (let’s not have a lisp conversation tho :)). And every class should carry some data in the end, otherwise it’s rather a module…
So let’s take a step back and look at what it is about more specifically.
It’s about attributes and the dunder boilerplate involved, right? (which btw makes attrs a really good name for attrs IMHO. ;))
So even something colloquial and heavy-handed like @auto_dunders that talks about what is done would be preferable to me than a @dataclass which ostensibly talks about the class which it IMHO shouldn’t.
We should always stress that the class at the end is a 100% regular class with added code that operates on the defined attributes.
Am I making any sense to y’all at all?
Just to be clear why I care so much: given the feedback on MLs and how attrs stickers have been ripped out of my hands at PyCon, I think this PEP could be a release game changer somewhere between f-strings and async/await. I can see people push for 3.7 because of this feature like they did for 3.5 and 3.6. And naming is important – in the end it’s marketing and while Python 3 may be over the hump in general, every bit helps. And I know that naming is hard – I’ve made a few sins myself.
@gvanrossum
I disagree with Hynek that calling these "data classes" sends the wrong message.
It may be personal, but to me "data class" sounds like a different/particular kind of class. Also some other languages have similar terminology with different meanings.
I think "auto" is a terrible name, because it doesn't specify what is automatic here.
This depends on the API. I could imagine three options:
from autoclass import auto
@auto(hash=True, cmp=True)
class C:
    x: int

Enum:
from autoclass import auto, HASH, CMP
@auto(HASH, CMP)
class C:
    x: int

from autoclass import Hash, Compare
class C(Hash, Compare):
    x: int
The last API should probably be based on __init_subclass__ under the hood, since we don't want metaclass conflicts.
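A rough sketch of how the base-class option could work with __init_subclass__ follows. The names Hash and Compare come from the comment above; their bodies and the helper _field_names are assumptions made for illustration, not code from any proposed implementation.

```python
def _field_names(cls):
    # Hypothetical helper: names annotated on the class.
    # A fuller version would also collect fields from base classes.
    return tuple(getattr(cls, "__annotations__", {}))


class Compare:
    """Assumed mixin: adds __eq__ to each subclass, no metaclass needed."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        names = _field_names(cls)

        def __eq__(self, other):
            if other.__class__ is not self.__class__:
                return NotImplemented
            return all(getattr(self, n) == getattr(other, n) for n in names)

        cls.__eq__ = __eq__


class Hash:
    """Assumed mixin: adds __hash__ based on the same field names."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        names = _field_names(cls)

        def __hash__(self):
            return hash(tuple(getattr(self, n) for n in names))

        cls.__hash__ = __hash__


class C(Hash, Compare):
    x: int

    def __init__(self, x):
        self.x = x


print(C(1) == C(1), hash(C(1)) == hash(C(1)))
print(type(C) is type)  # the metaclass is still plain `type`
```

Because both hooks call super().__init_subclass__(), the two mixins cooperate when combined, and the class keeps the default metaclass.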
What do you think about calling them "declared classes"?
What do you think about calling them "declared classes"?
Sounds good! I am not sure what is better "declared" or "auto", but both are better than "data classes" I think.
I agree that a good name is important, and I'm still open for suggestions, but I'm still not convinced that any of the alternatives proposed so far are better than "data class".
Does the proposed feature introduce a new category of classes? I think it does -- the decorator stores additional field information in a class attribute (currently __dataclass_fields__, referenced as _MARKER in the code). Sure, the new category is pretty compatible with other kinds of classes, but the same can be said for e.g. the latest version of typing.NamedTuple -- this uses a similar notation but essentially makes all fields slots, while still allowing you to define methods. (In fact, apart from the tuple interface, NamedTuple is almost the same as dataclass with slots=True...)
The dynamic generation feature (make_class() in the current code) also smells like creating something that has just data. (Yes, you can subclass it, but the same is true for a dynamically generated NamedTuple.)
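For reference, the two points above can be sketched against the module as it later shipped in the standard library. Note the assumption: the draft's make_class() is spelled make_dataclass() there, and the draft API discussed in this thread may have differed in detail. The class names C, D, D2 and NT are made up.

```python
from dataclasses import dataclass, field, fields, make_dataclass
from typing import NamedTuple


@dataclass
class C:
    x: int
    y: int = 0


# Field information is stored on the class itself.
print(list(C.__dataclass_fields__))
print([f.name for f in fields(C)])  # same names via the public helper

# Dynamic generation of a class from a list of field specs.
D = make_dataclass("D", [("x", int), ("y", int, field(default=0))])


# The generated class can be subclassed to add behaviour.
class D2(D):
    def total(self) -> int:
        return self.x + self.y


# The dynamically generated NamedTuple counterpart.
NT = NamedTuple("NT", [("x", int), ("y", int)])

print(D2(1, 2).total())
print(NT(1, 2) == (1, 2))  # the tuple interface that a dataclass lacks
print(D(1, 2) == (1, 2))
```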
In terms of what the interface should look like (Ivan's three bullets) I think the class decorator is a hands-down winner, because it doesn't use inheritance or metaclasses (both of which have been troublesome when there's another base class). The enum-based call signature looks weird, mostly because that's not a common idiom in other parts of Python, whereas keyword flags are very common.
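The decorator argument can be illustrated with a small sketch. The decorator auto and its hash/cmp flags mirror the first option listed earlier in the thread; the body is an assumption written for this example, and Shape and Square are hypothetical classes. The point is only that a decorator hands back the same class object, so an existing base class with its own metaclass (here abc.ABC) is not disturbed.

```python
import abc
import builtins


def auto(*, cmp=False, hash=False):
    """Hypothetical class decorator driven by keyword flags."""

    def wrap(cls):
        names = tuple(getattr(cls, "__annotations__", {}))

        if cmp:
            def __eq__(self, other):
                if other.__class__ is not self.__class__:
                    return NotImplemented
                return all(getattr(self, n) == getattr(other, n) for n in names)
            cls.__eq__ = __eq__

        if hash:
            def __hash__(self):
                # builtins.hash, because the flag shadows the builtin name here.
                return builtins.hash(tuple(getattr(self, n) for n in names))
            cls.__hash__ = __hash__

        return cls  # same class object, only with methods added

    return wrap


class Shape(abc.ABC):
    @abc.abstractmethod
    def area(self): ...


@auto(cmp=True, hash=True)
class Square(Shape):
    side: int

    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


print(Square(2) == Square(2), Square(2).area())
print(type(Square) is abc.ABCMeta)  # the base class's metaclass is untouched
```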
What do people think of "easy classes"?
@gvanrossum
What do people think of "easy classes"?
Actually "easy" is probably even better than "auto" IMO ("auto" is quite boring plus you don't like it). This looks interesting:
from easyclass import easy
@easy(hash=True, cmp=True)
class C:
    x: int
I'm not sure I like "easy" although I'm finding it difficult to articulate why.
Taking a trip through the thesaurus I came across "express" and I kind of like that for its double meaning, both of which I think apply here. "Express" as in you're communicating the essential bits of your class and leaving the machinery to do the rest, but also "express" as in you're taking the quick route to defining your class.
Is it too cute or obscure?
from expressclass import express
@express(hash=True, cmp=True)
class C:
    x: int
Not the biggest fan of easy either; it has a negative connotation and makes them sound like "classes light".
One adjective that Guido actually used at PyCon was "plain" and that makes more sense to me. (We talked of POPOs which means BUTTs in German so let's not go there ;))
@plain
class C:
    x: int
Speaks to me better than easy because it doesn't carry judgement and just says "this class does what you'd expect". And you literally read "plain class" in your head.
There is one more option: don't use any special name for these classes, just call them classes. Ultimately, we don't want some new kind of classes, we want to simplify definition of ordinary classes. We can call the module that defines the utilities classtools:
from typing import List
from classtools import make_class, methods, field
@methods(hash=True, cmp=True)
class Point:
    x: int
    y: int = 0
    labels: List[str] = field(factory=list)
I'm not wild about plain or easy.
Frankly, I think "attr.s" and "attr.ib" are genius, but I realize they're taken and considered too cute by some. But we could use "attrs" itself!
I'm not seriously suggesting this because it would be maximally confusing. But maybe something else connoting attributes, fields, items, etc.
FWIW @attrs is the serious business alias in attrs. :)
Well yeah, turns out we did put some thought into our names... :)
Is fields a common idiom? It seems to be in this pep already? (I'm on my phone sorry) @auto_fields or similar would be an option. As I've said before: I'd prefer if the decorator/naming didn't talk about the class but about the attributes it implements.
In some sense I like the term field even more than attribute, since a method is also an attribute (in the context of __getattribute__). Although the term field is probably not standard. Still, it looks very natural:
from typing import List
from classtools import make_class, field, auto_fields
@auto_fields(hash=True, cmp=True)
class Point:
    x: int
    y: int = 0
    labels: List[str] = field(factory=list)
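The field/attribute distinction can be checked against the standard library module that eventually resulted from this work. This is a sketch under that assumption: in the shipped module the keyword is default_factory rather than the factory spelling used in the snippets of this thread, and the norm1 method is made up.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Point:
    x: int
    y: int = 0
    labels: List[str] = field(default_factory=list)

    def norm1(self) -> int:
        return abs(self.x) + abs(self.y)


p = Point(1)
print([f.name for f in fields(p)])   # only the declared fields
print(hasattr(p, "norm1"))           # a method is an attribute, but not a field
print(p.labels is Point(1).labels)   # each instance gets its own list
```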
We need to stop the bikeshedding.
The more I hear Hynek's enthusiasm for his naming of attr.s the less I trust his instincts about naming.
I think dataclasses is still the best name I've heard so far, so let's please stick with that. The argument against it seems to basically boil down to "it makes you think it's a special kind of class" which, actually, I think is totally okay -- dataclasses have a secret handshake (__dataclass_fields__) and automatically generate a constructor that makes it easy to construct an instance from its field values, while the generated __repr__ shows all those fields (etc.).
TBH I'm not sure I'll use dataclasses a lot myself -- my classes usually have a lot more state than I want to pass in to the constructor, and I often have a custom repr() that compactly shows the most important state I care about in a typical debugging session. I also don't feel I am reluctant to create small classes when they fit in the design.
OK, let them be "dataclasses".
I think it is not super important how these things will be called in the PEP/in the docs. I remember a long discussion about NewType; at the end it was decided to call them "distinct types", but FWIW I have never heard anyone use this term, everyone calls them "newtypes" :-)
It is a bit more important what will be the actual name of the decorator, and in this sense:
@dataclass
class C:
    x: int
looks like a tautology - "class" appears there twice.
On Jun 06, 2017, at 09:46 AM, Guido van Rossum wrote:
We need to stop the bikeshedding.
Happy to do so; dataclasses are fine with me. Let's be clear about this decision in the upcoming PEP though, otherwise we'll just go through another round of endless bikeshedding at that point.
I'd like to add for the record that I've expressed enthusiasm for @attr.s (whose history is...complicated) exactly zero times.
It is a bit more important what will be the actual name of the decorator, and in this sense:
@dataclass
class C:
    x: int
looks like a tautology - "class" appears there twice.
I typo this all the time as:
@dataclass C:
    x: int
because I've just typed class, so the class name must come next. So I'm all for a different name. Maybe just @data would read okay:
@data
class C:
    x: int
Although importing data from a module will likely lead to a conflict.
I like @dataclass just fine -- but the module should be named dataclasses (and the PEP titled "Data Classes").
If @dataclass is really confusing, how about @with_data? (That's formed similar to @six.with_metaclass.)
Blue. No yel-- Auuuuuuuugh!
I like @data a bit more than @with_data just because it's easier to type. I'll bring back my previous suggestion in a different context: what about @declare?
I'm going to close this issue. For the purpose of the PEP and reference implementation, the module will be dataclasses, the decorator will be dataclass, and the PEP will refer to them as Data Classes.
The bike shedding can continue in the appropriate venue once the PEP is completed.
I'm not going to re-open this issue, but I thought I'd post the following here, lacking a better place to record it.
@larryhastings sent me this link: http://cr.openjdk.java.net/~briangoetz/amber/datum.html, titled "Data Classes for Java". Someone in the Java community proposes a similar feature to dataclass. The syntax would be:
__data class Foo(int x, int y) { ... }
You could add additional methods where the ... are.
In addition to "Data classes", the document also uses the phrase "plain data carriers".
Edited to add this, from the link:
"Other OO languages have explored syntactic forms for more compact class declaration: case classes in Scala, data classes in Kotlin, and soon, record classes in C#."
Hello: I know the first beta release of 3.7 is coming any day now so it may be a little late to make a change, but when I read PEP 557 yesterday I instantly thought that this concept should be named a fieldclass rather than dataclass. Since it is already using the concept of fields etc, it just seems natural.
Just an idea.
... this concept should be named a fieldclass rather than dataclass.
No.
As I’ve already mentioned by e-mail, I’m strongly opposed to call this concept “data classes”.
Having an easy way to define many small classes with attributes has nothing to do with data, it’s about good OO design.
Calling it “data classes” implies that they differ from…“code classes” I guess?
One of the things people love about attrs is that it’s helping them to write regular classes which they can add methods to without any subclassing or other magic. IOW: to focus on the actual code they want to write as opposed to generic boilerplate.
Debasing them by name seems like a poor start to me. We do have data containers in the stdlib (namedtuples, SimpleNamespace) so I don’t see a reason to add a third to the family – even if just by name.