ericvsmith / dataclasses

Apache License 2.0

Add immutable=False parameter to dataclass #59

Closed: johnthagen closed this issue 7 years ago

johnthagen commented 7 years ago

If someone wants to create a data class in which all instances are immutable (i.e. no attribute can be changed after construction), I propose that an immutable parameter be added (which, in the spirit of Python, defaults to False). Note that this is different from frozen, which applies to monkey patching new attributes.

Currently, this can be done manually with normal classes with a lot of boilerplate and the use of @property. In other languages, such as Kotlin, data classes are immutable by default.

A sketch of this proposal would be as follows:

@dataclass(immutable=True)
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

Would desugar into something like:

class InventoryItem:
    def __init__(self, name: str, unit_price: float, quantity_on_hand: int = 0) -> None:
        # Values live in underscore-prefixed attributes; only getters are exposed.
        self._name = name
        self._unit_price = unit_price
        self._quantity_on_hand = quantity_on_hand

    @property
    def name(self) -> str:
        return self._name

    @property
    def unit_price(self) -> float:
        return self._unit_price

    @property
    def quantity_on_hand(self) -> int:
        return self._quantity_on_hand

If one attempts to modify a property, an AttributeError is raised. IDEs can lint for this kind of thing as the user types, before runtime. PyCharm, for example, squiggles a warning if you try to set a property.
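As a minimal sketch of that behaviour, the snippet below hand-writes one read-only property in the style of the desugared code and checks that assignment raises AttributeError. The reduced one-field class and the `outcome` and `name_after_bypass` variables are illustrative assumptions, not part of the proposal. It also shows that the underscore-prefixed attribute stays writable, since the protection relies on convention.

```python
class InventoryItem:
    """Hand-written read-only class (illustrative, reduced to one field)."""

    def __init__(self, name: str) -> None:
        self._name = name

    @property
    def name(self) -> str:
        # No setter is defined, so assignment to .name is rejected.
        return self._name


item = InventoryItem("pizza")

try:
    item.name = "calzone"
except AttributeError:
    outcome = "blocked"
else:
    outcome = "allowed"

# The backing attribute is private by convention only and can still be rebound.
item._name = "calzone"
name_after_bypass = item.name
```

So the property approach guards the public name, while anything that reaches for `_name` directly can still change the value.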

gvanrossum commented 7 years ago

I like this idea in principle; it's a good example of boilerplate generation that the library can do for you.

It looks like there's a flaw in the implementation sketch though -- it seems it would happily translate e.g.

@dataclass(immutable=True)
class FakeNews:
    news: List[str]

into a class like

class FakeNews:
    def __init__(self, news: List[str]) -> None:
        self._news = news
    @property
    def news(self) -> List[str]: return self._news

but this would not be immutable by the definition that's typically used. (E.g. a FakeNews item could not be used as a dict key, since it's not hashable.)
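The hashability point can be checked with a short sketch. This assumes the standard-library dataclasses module (Python 3.7+) and its existing frozen=True parameter rather than the proposed immutable=True, which was never implemented; the `hashable` variable is illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FakeNews:
    news: List[str]


item = FakeNews(news=["headline"])

try:
    # Using the instance as a dict key calls hash(item), which in turn
    # hashes the list field and fails.
    {item: 0}
except TypeError:
    hashable = False
else:
    hashable = True
```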

johnthagen commented 7 years ago

Hmm, good point. This feels somewhat like "interior mutability vs exterior mutability" from my time in Rust. In this case, you still get the protection that your FakeNews instances will always point to the same news list, even though that list could be modified.

I'm curious, if FakeNews took a tuple, would it be hashable?

Would a different (weaker) parameter name help with this? I feel like immutable makes the intent clear, but I do agree that in Python it's difficult to satisfy in a pure sense. I suppose it's possible we could disallow all mutable types from being used as attributes in immutable=True data classes. But then we might have to recurse into user defined types and it'd probably get hairy quickly.

I'd still find a very nice use for the non-pure "immutable" data types, personally. The amount of boilerplate currently needed for it generally pushes me away from it, which is unfortunate.
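A rough sketch of both points, written against the standard-library dataclasses module with frozen=True as a stand-in for the proposed parameter (an assumption; immutable=True does not exist). The class names ListNews and TupleNews are hypothetical. The list-backed instance keeps pointing at the same list even though the list's contents change, and the tuple-backed instance is hashable.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import List, Tuple


@dataclass(frozen=True)
class ListNews:
    news: List[str]


@dataclass(frozen=True)
class TupleNews:
    news: Tuple[str, ...]


shared = ["first"]
a = ListNews(news=shared)

# Interior mutation: the list object itself can still be changed.
a.news.append("second")
same_list = a.news is shared

# Exterior mutation: rebinding the field is rejected.
try:
    a.news = []
except FrozenInstanceError:
    rebinding_blocked = True
else:
    rebinding_blocked = False

# With a tuple of hashable items, the instance can serve as a dict key.
b = TupleNews(news=("first",))
lookup = {b: "ok"}
```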

drhagen commented 7 years ago

When I was programming in Scala, we usually referred to case classes (Scala's term for data classes) as "immutable", just not "fully immutable", if they had members that were mutable objects, because you couldn't reassign the members. I wonder if it's a C-world vs Java-world distinction.

An immutable dataclass could generate a hash code automatically:

def __hash__(self):
    return hash((self._name, self._unit_price, self._quantity_on_hand))

This would correctly fail if any of the members were not hashable.
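To make that concrete, here is a hand-written sketch of the idea, not generated code. The `_key` helper and the matching `__eq__` are assumptions added so that equal instances hash equally; they are not part of the suggestion above.

```python
class InventoryItem:
    def __init__(self, name, unit_price, quantity_on_hand=0):
        self._name = name
        self._unit_price = unit_price
        self._quantity_on_hand = quantity_on_hand

    def _key(self):
        # Hypothetical helper: the tuple of fields used for equality and hashing.
        return (self._name, self._unit_price, self._quantity_on_hand)

    def __eq__(self, other):
        if not isinstance(other, InventoryItem):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


a = InventoryItem("pizza", 6.99, 5)
b = InventoryItem("pizza", 6.99, 5)
same_hash = hash(a) == hash(b)

# A list member makes the tuple unhashable, so hashing the instance fails.
bad = InventoryItem(["pizza"], 6.99, 5)
try:
    hash(bad)
except TypeError:
    bad_is_hashable = False
else:
    bad_is_hashable = True
```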

gvanrossum commented 7 years ago

TBH how does this differ from frozen=True on the class?

johnthagen commented 7 years ago

frozen=True prevents adding new attributes after construction:

my_item = InventoryItem(name='pizza', unit_price=6.99, quantity_on_hand=5)
my_item.absurd_headline = "Python attacks!"  # Can't add new attributes

immutable=True prevents attributes from being modified after construction:

my_item = InventoryItem(name='pizza', unit_price=6.99, quantity_on_hand=5)
my_item.name = "Guido"  # AttributeError thrown. Guido is not for sale.

Both are important, but orthogonal. If I were using data classes, I'd often set both to True.

gvanrossum commented 7 years ago

IIUC frozen=True also prevents mutating existing attributes, and the effect it has on new attributes is incidental.

ericvsmith commented 7 years ago

@gvanrossum: as far as "immutable" goes, I don't see this case as different from:

>>> t = (1, [], 3)
>>> t[1].append(2)
>>> t
(1, [2], 3)
>>> {t:0}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

I think everyone would agree that a tuple is immutable.

I'm not sure I see much difference between the existing frozen=True and the proposed immutable=True (except maybe a performance difference). The intent of frozen=True is to disallow you from assigning to instance fields. If it also prevents you from assigning to non-field attributes (or creating new non-fields), that's okay with me.
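A small sketch of that behaviour, assuming the standard-library dataclasses module (Python 3.7+); the `errors` list is only there to record what happened. Both assigning to an existing field and creating a new attribute are rejected, and the exception type is a subclass of AttributeError.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0


item = InventoryItem(name="pizza", unit_price=6.99, quantity_on_hand=5)
errors = []

# Assigning to an existing field is rejected.
try:
    item.name = "Guido"
except FrozenInstanceError:
    errors.append("field assignment")

# Creating a new, non-field attribute is rejected as well.
try:
    item.absurd_headline = "Python attacks!"
except FrozenInstanceError:
    errors.append("new attribute")

# Code that catches AttributeError also catches FrozenInstanceError.
is_attribute_error = issubclass(FrozenInstanceError, AttributeError)
```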

johnthagen commented 7 years ago

@ericvsmith You're correct, frozen=True already does this. It wasn't clear to me from the PEP that this was how it was designed to work (perhaps a short example would help other readers?)

I installed dataclasses (0.1) from PyPI with pip on Python 3.6.

And ran this code:

from dataclasses import dataclass

@dataclass(frozen=True)
class Pizza:
    name: str

def main() -> None:
    p = Pizza(name='pizzzza')
    print(p.name)

    p.name = 'new'
    print(p.name)

if __name__ == '__main__':
    main()

This correctly raises: dataclasses.FrozenInstanceError: cannot assign to field 'name'

No squiggles in PyCharm yet, but I'm sure they'll teach it about dataclasses when it's official.

frozen=True does do what I had wanted. Thanks for the explanation.