ericvsmith / dataclasses

Apache License 2.0
587 stars 53 forks source link

What API for make_class()? #5

Closed ericvsmith closed 7 years ago

ericvsmith commented 7 years ago

make_class is a way to dynamically create classes, similar in spirit to collections.namedtuple or attr.make_class.

The question is, what should the API be for specifying fields?

For 'normal' usage, a field requires 3 pieces of information, one of which is optional: name, type, default value. The default value can be overridden with a field() call to specify not only the default value, but additional per-field attributes (use in hash, use in rear, etc.).

For make_class, I propose the per-field API be similar: 2 required items (name, type) and one optional one, the default or field().

So, something like:

C = make_class('C',
               [('a', int),
                ('x', float, 1001.0),
                ('b', str, field(default=1000, repr=False),
               ],
               bases=(B1, B2))

Which would be the dynamic equivalent of the static:

@dataclass
class C(B1, B2):
    a: int
    x: float = 1001.0
    b: str = field(default=1000, repr=False)

I realize the dynamic make_class call is somewhat unsatisfying, but I don't expect it to get used at all with statically specified values. My use case is something like reading database schemas and generating a dataclass that will be the return value for each row. In that case, the list of fields would be generated in code after reading the schema.

Any suggestions for improvements here? One thought would be that instead of using a 2- or 3-tuple for each field, have another class that would represent them. I'm not sure if that's worth the hassle, though.

Input is welcomed.

gvanrossum commented 7 years ago

This sounds reasonable -- or you could extend the field() thing to take an optional name and type so you could call

C = make_class('C', [
    field('a', int),
    field('x', float, default=1001.0),
    field('b', str, default=1000, repr=False),
])
ericvsmith commented 7 years ago

Yes, I thought of that, and actually had it working. The downside is that the name and type can't be used in the "normal" usage:

@dataclass
class C:
    x: int = field(default=1000, repr=False)
    y: int = field('z', str, default="")   # <-- error: can't specify name or type

I think if we did this it would be confusing. So that leads me to think we need a different function with the 2 additional parameters (name and type). Call it dynfieldA for argument's sake:

C = make_class('C', [
    dynfieldA('a', int),
    dynfieldA('x', float, default=1001.0),
    dynfieldA('b', str, default=1000, repr=False),
])

Or, another function and re-use field(). Call that function dynfieldB for argument's sake:

C = make_class('C', [
    dynfieldB('a', int),
    dynfieldB('x', float, field(default=1001.0)),
    dynfieldB('b', str, field(default=1000, repr=False)),
])

Or, the tuple idea above.

I think I prefer the tuple version, followed by the "dynfieldA" approach, mainly because the tuple version doesn't require a new function name, and the dynfieldA approach is shorter than dynfieldB.

gvanrossum commented 7 years ago

I slightly prefer the dynfieldA version over plain tuples, since (speaking of strict typing :-) with tuples, the argument to make_class() would have to have type List[Union[Tuple[str, type], Tuple[str, type, FieldThing]]]. With dynfieldA it can be just List[DynFieldThing].

ilevkivskyi commented 7 years ago

I also like dynfieldA most since the usage of keyword arguments seems more "uniform" (they don't need to be wrapped in field).

ericvsmith commented 7 years ago

Okay, I've gone with the first option, and called it "dynfield" for now:

    C = make_class('C',
                   [dynfield('x', int, default=10),
                    dynfield('y', int, default=20),
                    ])

I figure there will be a renaming effort for dataclass, field, and dynfield before we're done.

The code that implements this is now checked in and marked with the v1 tag.

gvanrossum commented 7 years ago

I'm still not sure I understand the reason why dynfield can't be field. You wrote "I think if we did this it would be confusing" but I'm not sure why. We could just have a rule saying that inside a class statement field doesn't allow name and type args, and we could enforce that at runtime (e.g. in the class decorator) as well as in the static checker.

ericvsmith commented 7 years ago

I think it would be confusing in the sense that in 99% of the use cases, there would be two parameters that could not be specified (those being the cases where fields are defined statically in a class definition). But I grant you that we could detect this and raise an error. And it would also be nice to have only one "field" type.

What would the signature of field be?

def field( *, name=None, type=None, default=_MISSING, repr=True, hash=True, init=True, cmp=True):

or

def field(name=None, type=None, *, default=_MISSING, repr=True, hash=True, init=True, cmp=True):

or something else?

gvanrossum commented 7 years ago

I prefer the latter. The confusion would not be great if people just followed the examples given in the docs (or found elsewhere in working code). We could explain the situation further in the docs.

ericvsmith commented 7 years ago

Okay, that makes sense. I've implemented it.

@dataclass
class C:
    x: int
    y: str = field(init=False, default=None)
    z: str = field(repr=False)

and the equivalent:

C = make_class('C',
               [field('x', int),
                field('y', str, init=False, default=None),
                field('z', str, repr=False),
                ])

With the appropriate error checking.