ericvsmith / dataclasses

Apache License 2.0

Message from Nick on the python-ideas thread #1

Closed gvanrossum closed 7 years ago

gvanrossum commented 7 years ago

Quoting @ncoghlan on python-ideas:

Some of the key problems I personally see are that attrs reuses a general noun (attributes) rather than using other words that are more evocative of the "data record" use case, and many of the parameter names are about "How attrs work" and "How Python magic methods work" rather than "Behaviours I would like this class to have".

That's fine for someone that's already comfortable writing those behaviours by hand and just wants to automate the boilerplate away (which is exactly the problem that attrs was written to solve), but it's significantly more problematic once we assume people will be using a feature like this before learning how to write out all the corresponding boilerplate themselves (which is the key additional complication that a language level version of this will have to account for).

However, consider instead the following API sketch:

from autoclass import data_record, data_field

@data_record(orderable=False, hashable=False)
class SvgTransform(SvgPicture):
    child = data_field()
    matrix = data_field(setter=numpy.asarray)

Here, the core concepts to be learned would be:

  • the "autoclass" module lets you ask the interpreter to automatically fill in class details
  • SvgTransform is a data record that cannot be hashed, and cannot be ordered
  • it is a Python class inheriting from SvgPicture
  • it has two defined fields, child & matrix
  • we know "child" is an ordinary read/write instance attribute
  • we know "matrix" is a property, using numpy.asarray as its setter

In this particular API sketch, data_record is just a class decorator factory, and data_field is a declarative helper type for use with that factory, so if you wanted to factor out particular combinations, you'd just write ordinary helper functions.
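
To make the "decorator factory plus declarative helper type" idea concrete, here is a minimal sketch of how such an API could be built. No `autoclass` module exists; `data_record`, `data_field`, `_converting_property`, `plain_record` and `Transform` are all hypothetical names for illustration, `tuple` stands in for `numpy.asarray` so the example stays self-contained, and fields declared on base classes are ignored.

```python
import functools


class data_field:
    """Hypothetical marker for a declared field; `setter` converts assigned values."""

    def __init__(self, setter=None):
        self.setter = setter


def _converting_property(name, setter):
    """Build a property that stores setter(value) under a private attribute."""
    storage = "_" + name

    def fget(self):
        return getattr(self, storage)

    def fset(self, value):
        setattr(self, storage, setter(value))

    return property(fget, fset)


def data_record(orderable=True, hashable=True):
    """Hypothetical class decorator factory (assumption: not a real library API)."""

    def decorate(cls):
        fields = [(n, v) for n, v in list(vars(cls).items()) if isinstance(v, data_field)]
        names = [n for n, _ in fields]
        for name, field in fields:
            if field.setter is None:
                delattr(cls, name)  # becomes an ordinary read/write instance attribute
            else:
                setattr(cls, name, _converting_property(name, field.setter))

        def key(self):
            return tuple(getattr(self, n) for n in names)

        def __init__(self, *args, **kwargs):
            values = dict(zip(names, args), **kwargs)
            for n in names:
                setattr(self, n, values[n])

        def __repr__(self):
            body = ", ".join(f"{n}={getattr(self, n)!r}" for n in names)
            return f"{type(self).__name__}({body})"

        def __eq__(self, other):
            if type(other) is not type(self):
                return NotImplemented
            return key(self) == key(other)

        def __lt__(self, other):
            if type(other) is not type(self):
                return NotImplemented
            return key(self) < key(other)

        cls.__init__, cls.__repr__, cls.__eq__ = __init__, __repr__, __eq__
        cls.__hash__ = (lambda self: hash(key(self))) if hashable else None
        if orderable:
            cls.__lt__ = __lt__
            cls = functools.total_ordering(cls)  # derives <=, >, >= from __lt__ and __eq__
        return cls

    return decorate


def plain_record(cls):
    """An ordinary helper function that factors out one combination of options."""
    return data_record(orderable=False, hashable=False)(cls)


@plain_record
class Transform:
    child = data_field()
    matrix = data_field(setter=tuple)


t = Transform("circle", matrix=[1, 0, 0, 1])
print(t)  # Transform(child='circle', matrix=(1, 0, 0, 1))
```

Because the factory returns a plain decorator, `plain_record` needs no special support from the API: it is just a function that applies a fixed set of keyword arguments.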

Instead of trying to cover every possible use-case from a single decorator with a multitude of keyword arguments, I think covering the simple cases is enough. Explicitly overriding methods is not a bad thing! It is much more comprehensible to see an explicit class with methods than a decorator with multiple keyword arguments and callbacks.

This isn't the case for folks that have to actually read dunder methods to find out what a class does, though. Reading an imperatively defined class only works that way once you're able to mentally pattern match "Oh, that's a conventional init, that's a conventional repr, that's a conventional hash, that's a conventional eq, that's a conventional lt implementation, etc, etc".

Right now, telling Python "I want to do the same stock-standard things that everyone always does" means writing a lot of repetitive logic (or, more likely, copying the logic from an existing class that you or someone else wrote, and editing it to fit).
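
For reference, this is roughly what the hand-written "stock-standard" version looks like for a record with just two fields. `Point2D` and its field names are hypothetical, chosen only to show how much of the class is convention rather than intent.

```python
class Point2D:
    """Hand-written record class: every method below is conventional boilerplate."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point2D(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other):
        if not isinstance(other, Point2D):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Must stay consistent with __eq__: equal objects hash equally.
        return hash((self.x, self.y))

    def __lt__(self, other):
        if not isinstance(other, Point2D):
            return NotImplemented
        return (self.x, self.y) < (other.x, other.y)


print(Point2D(1, 2) == Point2D(1, 2))  # True
print(Point2D(1, 2) < Point2D(1, 3))   # True
```

Adding a third field means touching every one of these methods, which is the repetitive editing the paragraph above describes.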

The idea behind offering some form of declarative class definitions is to build out a vocabulary of conventional class behaviours, and make that vocabulary executable such that folks can use it to write applications even if they haven't learned how it works under the hood yet. As with descriptors before it, that vocabulary may also take advantage of the fact that Python offers first class functions to allow callbacks and transformation functions to be injected at various steps in the process without requiring you to also spell out all the other steps in the process that you don't want to alter.

I like the namedtuple approach: I think it hits the sweet spot between "having to do everything by hand" and "everything is magical".

It's certainly a lot better than nothing at all, but it brings a lot of baggage with it due to the fact that it is a tuple. Declarative class definitions aim to offer the convenience of namedtuple definitions, without the complications that arise from the "it's a tuple with some additional metadata and behaviours" aspects.
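
A short illustration of the "it's a tuple" baggage, using only `collections.namedtuple`. The type names `Coord` and `Size` are hypothetical.

```python
from collections import namedtuple

Coord = namedtuple("Coord", ["x", "y"])
Size = namedtuple("Size", ["width", "height"])

c = Coord(1, 2)

# Equality is tuple equality, so field names and the record type are ignored.
print(c == (1, 2))      # True
print(c == Size(1, 2))  # True

# The record is also indexable, sized, and unpackable like any tuple.
first, second = c
print(c[0], len(c))     # 1 2

# Tuple operators apply too, and they silently drop the record type.
extended = c + (3,)
print(type(extended).__name__)  # tuple
```

None of these behaviours is wrong for a tuple, but each is something a plain data record would usually not want to inherit.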

Database object-relational-mapping layers like those in SQLAlchemy and Django would be the most famous precursors for this, but there are also things like Django Form definitions, and APIs like JSL (which uses Python classes to declaratively define JSON Schema documents).

For folks already familiar with ORMs, declarative classes are just a matter of making in-memory data structures as easy to work with as database-backed ones. For folks that aren't familiar with ORMs yet, declarative classes provide a potentially smoother learning curve, since the "declarative class" aspects can be better separated from the "object-relational mapping" aspects.

ericvsmith commented 7 years ago

I think @ncoghlan's major point here is that we should have arguments that describe how the class will be used, not how it will be implemented.

But, I don't see a lot of precedent for this in core Python types, so I think we should stick with the attrs-like description of what features you're asking for in the generated class, not how you plan to use it. For example, if you want to add your own __repr__, you'd set repr=False. There's a direct correspondence with parameter names and Python features. I don't think hiding Python features serves the users.
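
As an illustration of that direct correspondence, here is a sketch written against the `dataclasses` module as it later shipped in the standard library (Python 3.7+), which may differ from the API under discussion in this thread. `Version` is a hypothetical example class.

```python
from dataclasses import dataclass


@dataclass(repr=False)  # ask the decorator not to generate __repr__
class Version:
    major: int
    minor: int

    def __repr__(self):
        # Hand-written replacement for the generated method.
        return f"v{self.major}.{self.minor}"


print(repr(Version(3, 7)))           # v3.7
print(Version(3, 7) == Version(3, 7))  # True: __eq__ is still generated
```

The parameter name matches the dunder method it controls, so a reader who knows `__repr__` can guess what `repr=False` does without consulting the documentation.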

For example, what would slots=False become? You'd need some parameter or combination of parameters that means you want to have instances where you're able to add dynamic attributes that are non-fields, and you're okay with the increased per-instance memory usage.
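
The trade-off behind that question can be shown with plain classes, independent of any decorator. `Slotted` and `Dynamic` are hypothetical names.

```python
class Slotted:
    """Fixed set of attributes, no per-instance __dict__."""

    __slots__ = ("x",)

    def __init__(self, x):
        self.x = x


class Dynamic:
    """Ordinary class: each instance carries a __dict__."""

    def __init__(self, x):
        self.x = x


d = Dynamic(1)
d.label = "extra"  # fine: stored in the instance __dict__

s = Slotted(1)
try:
    s.label = "extra"
except AttributeError:
    slotted_rejected = True  # no slot named "label", and no __dict__ to fall back on
else:
    slotted_rejected = False

print(slotted_rejected)          # True
print(hasattr(s, "__dict__"))    # False
print(hasattr(d, "__dict__"))    # True
```

A usage-oriented parameter would have to capture both effects at once (dynamic attributes allowed, per-instance dictionary paid for), which is why a single name for it is hard to find.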

ncoghlan commented 7 years ago

I think there are some features (like slots) that should be exposed directly since they're unique to Python and don't correspond with any general characteristics of data type definitions, but I do think there are others (like orderable and hashable) where a higher level formulation would make sense.

The latter would only be appropriate in cases where such a higher level formulation already appears somewhere in the standard library, as is the case with collections.abc.Hashable and functools.total_ordering.
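
Both of the standard library precedents mentioned here can be seen in a few lines. `Release` is a hypothetical class used only to show what "hashable" and "orderable" already mean in the standard library.

```python
from collections.abc import Hashable
from functools import total_ordering


@total_ordering  # fills in <=, >, >= from __eq__ and __lt__
class Release:
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __eq__(self, other):
        if not isinstance(other, Release):
            return NotImplemented
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        if not isinstance(other, Release):
            return NotImplemented
        return (self.major, self.minor) < (other.major, other.minor)

    # Spelled out for clarity; defining __eq__ in the class body
    # without __hash__ has the same effect.
    __hash__ = None


print(Release(1, 0) <= Release(1, 1))        # True
print(Release(1, 2) >= Release(1, 1))        # True
print(isinstance(Release(1, 0), Hashable))   # False
print(isinstance((1, 0), Hashable))          # True
```

A `hashable=False` or `orderable=True` option would then map onto concepts users can already test for with `isinstance` checks and comparison operators.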

A couple of other characteristics that may fall into that "higher level" category would be to support copying and pickling objects by default and then have a boolean toggle to turn that off at the class level:

ericvsmith commented 7 years ago

Since the PEP and reference implementation are being finalized, I'm going to close this. We can continue the discussion in the appropriate venue once the PEP is published.