Open pipermerriam opened 7 years ago
Actually, overwriting a parent's fields attribute in a rlp.Serializable does not work currently: #45
This issue now has a funding of 0.4 ETH (413.69 USD) attached to it.
If you are interested in claiming this bounty please submit a proposal here on this issue as well as asking any questions you may have prior to claiming the bounty. I'd like to avoid:
Brute force way for starters, unless I am misunderstanding the requirements:
class sedesBaseClass:
pass # Whatever base class all fields have in common
class rlpSerializeable:
# Generate fields on demand
@property
def fields(self):
# I know there's a way to make this method a tuple generator
ls = []
for field in dir(self):
if field is 'fields':
continue # Avoid infinite recursion
val = getattr(self, field)
if isinstance(val, sedesBaseClass):
ls.append((field, val))
return tuple(ls)
# or whatever function serializes the object
def serialize(self):
return self.fields
class A(rlpSerializeable):
field_a = sedesBaseClass()
class B(A):
field_b = sedesBaseClass()
class C(A):
field_c = sedesBaseClass()
if __name__ == '__main__':
a_fields = A().serialize()
assert a_fields[0][0] == 'field_a'
b_fields = B().serialize()
assert b_fields[0][0] == 'field_a'
assert b_fields[1][0] == 'field_b'
c_fields = C().serialize()
assert c_fields[0][0] == 'field_a'
assert c_fields[1][0] == 'field_c'
print("Works!")
@fubuloubu
đź‘Ť for including backwards compatability with the self.fields
. I'm going to add this as a requirement to the issue.
However, I believe this fails at the class level. A.fields
would not return accurate information since the fields
property will only be populated on instances of A
. The only solution I know of for this is to use metaclasses but I'm open to alternative ideas if you have them.
Can you provide a counter example that shows that? I'm not sure I understand what you mean.
Do these classes dynamically add/remove fields? Also, wouldn't it be able to get these fields at the classes level because they're defined as class variables? I'm not too familiar with Django to know the idea you described offhand.
Actually, while working up an example to show you, it exposed some even deeper problems with your approach (hint, the isinstance(value, sedesBaseClass)
doesn't work. I'd suggest spending some time exporing the problem and coming back. I'm low on time right now, otherwise I'd give you a write-up.
Another issue: RLP is order-dependent. What is the right order? I assume it's that the super-class fields all come first, in definitional order. Then subclasses, in defined-order. Can we even retrieve attributes in definitional order?
dir()
returns fields in alphabetical order. From help(dir)
:
If called without an argument, return the names in the current scope. Else, return an alphabetized list of names comprising (some of) the attributes of the given object, and of attributes reachable from it.
Haha, me too. I just looked at the issue at the brute force way seemed like a good enough start. Someone else can build off this if don't get a chance to revisit.
@carver looking at the docs for python3, it looks like metaclasses can maintain field ordering.
import collections
class OrderedClass(type):
@classmethod
def __prepare__(metacls, name, bases, **kwds):
return collections.OrderedDict()
def __new__(cls, name, bases, namespace, **kwds):
result = type.__new__(cls, name, bases, dict(namespace))
result.members = tuple(namespace.keys())
result.base_members = [tuple(base.members) for base in bases if hasattr(base, 'members')]
return result
class A(metaclass=OrderedClass):
def one(self): pass
def two(self): pass
def three(self): pass
def four(self): pass
class B(A):
def five(self): pass
>>> A.members
('__module__', '__qualname__', 'one', 'two', 'three', 'four')
>>> A.base_members
[]
>>> B.members
('__module__', '__qualname__', 'five')
>>> B.base_members
[('__module__', '__qualname__', 'one', 'two', 'three', 'four')]
With that information, we would be able to maintain field ordering within a class, as well as treating new fields in subclasses as they should be appended by default with cls.__mro__
for ordering of base classes. That makes our default behavior:
That gets us most of the way there. I think the only thing we need to allow full control over field ordering is a way to dictate where in the previous class's fields a new field should go assuming they don't want to use the default ordering. Two options come to mind for dealing with this and we might want to do both (or some other better idea).
before
and after
keyword arguments for the Field
class.The Field
class could take one of the keyword arguments before
or after
which would specify a the position that the new field should be placed in the field order.
class A:
one = Field()
two = Field()
four = Field()
class B(A):
three = Field(after='two')
# three = Field(before='four') # using before
# three = Field(order='after:two') # or maybe using a mini DSL
B.fields = ('one', 'two', 'three', 'four')
Borrowed heavily from django, we could support something like the following for explicitely declaring field ordering.
class A:
one = Field()
two = Field()
four = Field()
class B(A):
three = Field()
class Meta:
field_order = ('one', 'two', 'three', 'four')
B.fields = ('one', 'two', 'three', 'four')
Fixed the brute force case so that it recurses in Inheritance order. This method only works if there is not dynamic modification of the fields, which I think is true based on using class variables
class sedesClass:
pass # Whatever class(es) needed
def get_fields(cls):
fields = []
if cls is rlpSerializeable:
return tuple() # End recursion
for parent in cls.__bases__:
fields.extend(get_fields(parent))
# Fields are either empty tuple or tuple of tuples
if len(cls._fields) > 0 and isinstance(cls._fields[0], tuple):
fields.extend(list(cls._fields))
else:
fields.append(cls._fields)
return fields
class rlpSerializeable:
@property
def fields(self):
return get_fields(self.__class__)
# or whatever function serializes the object
def serialize(self):
return tuple(self.fields)
class A(rlpSerializeable):
_fields = (
('field_a', sedesClass)
)
class B(A):
_fields = (
('field_b', sedesClass)
)
class C(A):
_fields = (
('c_field', sedesClass),
('another_field', sedesClass)
)
class D(C):
pass # No fields
if __name__ == '__main__':
a_fields = A().serialize()
assert a_fields[0][0] == 'field_a'
b_fields = B().serialize()
assert b_fields[0][0] == 'field_a'
assert b_fields[1][0] == 'field_b'
c_fields = C().serialize()
assert c_fields[0][0] == 'field_a'
assert c_fields[1][0] == 'c_field'
assert c_fields[2][0] == 'another_field'
d_fields = D().serialize()
assert d_fields[0][0] == 'field_a'
assert d_fields[1][0] == 'c_field'
assert d_fields[2][0] == 'another_field'
print("Works!")
looking at the docs for python3, it looks like metaclasses can maintain field ordering.
Hah, cool that this is literally the textbook example for metaclasses.
Support before and after keyword arguments for the Field class.
I'm less keen on this style, because the ordering information doesn't semantically belong in the sedes Field
.
Support meta params ... explicitely declaring field ordering.
This seems cleaner and more explicit, with a bonus of being familiar to anyone with Django experience.
To summarize:
Alternative proposal: Keep the fields
attribute. Make fields
of the parent class available in the class body of the child. Example usage:
class Parent(rlp.Serializable):
fields = [
('a', big_endian_int),
('b', big_endian_int),
]
class Child1(Parent):
""""Appends a field"""
fields = parent_fields + [
('c', big_endian_int)
]
class Child2(Parent):
"""Prepends a field"""
fields = [
('z', big_endian_int)
] + parent_fields
class Child3(Parent):
""""Replaces all fields""""
fields = [
('x', big_endian_int)
]
To make creating fields
easier for more complex changes, one could add helper functions:
class Child4(Parent):
""""Replaces some field"""
fields = replace_fields(parent_fields, [
('a', binary)
])
class Child5(Parent):
""""Inserts a field.""""
fields = insert_fields_after(parent_fields, 'a', [
('a2', big_endian_int),
])
class Child6(Parent):
""""Remove a field."""
fields = remove_fields(parent_fields, [
'a',
]
Additionally, one could replace by setting an additional field:
class Child6(Parent):
""""Replace some fields.""""
replaced_fields = [
('a', binary)
]
If we turn fields
from a list of tuples to an OrderedDict
cytoolz
or similar libraries might already have helpers like replace_fields
or insert_fields_after
.
Implementation would be straightforward:
class SerializableMeta(type):
@classmethod
def __prepare__(metacls, name, bases):
namespace = {}
# look for `fields` in bases (if there are multiple, either raise an error or take the first)
# if found set `namespace['parent_fields'] = bases[0].fields`
return namespace
def __new__(cls, name, bases, namespace):
# look for `'replaced_fields'` in namespace
# if found replace fields in `namespace['fields']` accordingly
return result
Obviously, this doesn't use Django-style field properties. So if that's a hard requirement, this approach wouldn't work. I personally prefer the old style though (because I find it more explicit and clearer), but I'm probably biased in that regard.
Obviously, this doesn't use Django-style field properties....
My intent with this idea was to make overriding existing fields easy. I didn't at the time consider the field ordering issue which makes the solution less perfect, however, I think that the solutions that are currently on the table to manage ordering are good-enough.
I also think that it'd be fine to do something the way you propose with some helpers. I don't like the solution as much because it feels less elegant but elegance isn't a hard requirement. The field class approach loses some of its elegance when futzing with field ordering comes into play so that's one mark against it, however, the helper functions that operate on the fields
attribute as you propose maintains a certain level of maybe-hard-to-read-verbosity. Neither is ideal but I'm still leaning towards django-style-fields. Curious to hear anyone else's thoughts on one vs the other.
If a second RLP renames or reorders fields of a first, then I think the second RLP is not a strict subclass of the first, and we probably shouldn't encourage treating it as one. Instead, we might do something like
COMMON_FIELDS = (
('a', big_endian_int),
('b', big_endian_int),
)
class Serial1(rlp.Serializable):
_fields = COMMON_FIELDS
class SerialChild(Serial1):
"""
Appends a field. All the parent fields are present, in the same order.
"""
_fields = (
('c', big_endian_int),
)
class Serial2(rlp.Serializable):
"""Prepends a field"""
_fields = (
('z', big_endian_int),
) + COMMON_FIELDS
class Serial3(rlp.Serializable):
""""Replaces all fields""""
_fields = (
('x', big_endian_int),
)
I agree that the Django-style approach is more elegant, especially for single RLP class definitions. Since I think single explicitly-defined RLP classes are the common case (and maybe we should encourage them to be), I prefer the Django-style approach.
class Serial1(rlp.Serializable):
a = big_endian_int
b = big_endian_int
class SerialChild(Serial1):
"""
Appends a field. All the parent fields are present, in the same order.
"""
c = big_endian_int
class Serial2(rlp.Serializable):
"""Prepends a field"""
z = big_endian_int
a = big_endian_int
b = big_endian_int
class Serial3(rlp.Serializable):
""""Replaces all fields""""
x = big_endian_int
That little bit of repetition on a
and b
in Serial2
seems worth the benefit of being to be easy to read/understand. If Serial2 is a common case, I'd love to see some real-life examples.
The more I think about it, the more I think that RPL object inheritance for the purpose of changing fields won't be a major use case. This is based on how things are setup in Py-EVM.
One option we haven't considered is requiring that either all fields be overridden in subclasses or none at all. Basically, rather than having a subtle API for ordering fields that we understand but is probably not intuitive, we can instead raise an exception during class construction if not all of the fields were overridden. This forces explicit field ordering in a manner that should make all RLP objects easy to know what the field ordering is since you just have to trace up the inheritance chain to the first object which defines any fields.
One option we haven't considered is requiring that either all fields be overridden in subclasses or none at all.
:+1: I'm in favor of this
I've updated the main issue to reflect what I believe is the current consensus for how this should be implemented.
@pipermerriam Is this one you still want to be worked? I'll resend it out on Slack / Twitter, if so
@vs77bb yes, this is an issue that would be a big nice-to-have improvement to the python RLP API.
hey @pipermerriam,
we are in the middle of a migration between smart contracts and in order to migrate away from all the old code paths, we are cleaning up some of the straggler bounties.
we're sorry this one didnt work out for ya.
do you want to kill the bounty here to recover your funds? it'll take only 30 seconds... and itll mean you get that 0.4 ETH back.
any questions, let me know. KSO
Thansk @owocki
In other news: this was halfway done with the merger of https://github.com/ethereum/pyrlp/pull/62
Next step would be to deprecate fields attribute based declaration of what the RLP fields are in favor of a basic Field
class. This should be quite trivial to do now that the metaclass is already in place.
happy to put a bounty up for ya if you want. is it worth creating a clean issue ?
@pipermerriam I was going through this issue and wanted to take a shot at it. Is this issue still up (or) has been resolved in the past? If it's still up, the whole issue at this point of time is boiled down to just Implement Inheritance by appending
right? Or is there still more to it?
I don't think it's worth prioritizing over other efforts. Maybe do it in the SSZ repo instead and we can backport the implementation to pyrlp? Probably of higher value in the SSZ repo and lower risk.
I feel that a solution using metaclass
is an overkill, and should only be used when all other options have been exhausted. I propose a new API for Serializable
classes, by using class decorators.
This is how we define a Serializable
class today:
class Base(rlp.Serializable):
fields = (
('field_a', sedes.Binary),
)
class InheritsFromBase(MyBaseObject):
fields = (
('field_a', sedes.Binary),
('another_field', sedes.Binary),
)
This is the API I propose:
@rlp.serializable
class Base:
field_a: sedes.Binary
@rlp.serializable
class InheritsFromBase(MyBaseObject):
another_field: sedes.Binary
From what I have understood so far, we have the following requirements / constraints with such classes:
If it's not obvious already from the proposed API, I intend to use dataclasses
which is a new feature in Python 3.7. Essentially, the rlp.serializable
decorator is a wrapper around dataclasses.dataclass
that rewrites the class with auto-generated __init__
and __repr__
methods, preserving the order in which they have been defined.
According to the Python docs, dataclass
looks through all of the class’s base classes in reverse MRO. This may not be the behavior that we want, which can be demonstrated by the following example:
from dataclasses import dataclass
from rlp import sedes
@dataclass
class Base:
a: sedes.Binary
b: sedes.Binary
@dataclass
class Inherited(Base):
z: sedes.Binary
a: sedes.BigEndianInt
>>> Base.__dataclass_fields__.keys()
dict_keys(['a', 'b'])
>>> Inherited.__dataclass_fields__.keys()
dict_keys(['a', 'b', 'z'])
This could be a deal breaker if:
dataclass
fields are parsed.dataclasses
have not been backported to the Python 3.6 standard library. However the original author maintains a backport for Python 3.6 as a package.
Provided that we're okay with a third-party dependency, we can solve both the caveats mentioned above, by using attrs. See example below:
import attr
from rlp import sedes
@attr.s
class Base:
a: sedes.Binary = attr.ib()
b: sedes.Binary = attr.ib()
@attr.s
class Inherited(Base):
z: sedes.Binary = attr.ib()
a: sedes.BigEndianInt = attr.ib()
>>> [each.name for each in Base.__attrs_attrs__]
['a', 'b']
>>> [each.name for each in Base.__attrs_attrs__]
['b', 'z', 'a']
I'll conclude with two questions:
This is also a good opportunity to remove some of the black magic in this file.
I'm not strictly against a decorator API, but in my opinion defining methods in a metaclass is the cleaner approach than monkey patching them in a decorator. I agree that serializable.py
is currently quite confusing, but I don't think that's inherently because of the use of metaclasses.
I don't like the way inheritance and field ordering is handled in both attrs
and dataclasses
as it's definitely not what one would expect. If we replace that, would there be enough of them left that justify the additional dependency?
defining methods in a metaclass is the cleaner approach than monkey patching them in a decorator.
In my experience, meta-programming produces code that's hard to debug, less readable, and often difficult to extend. Some of these problems are solved to a great extent by attrs
. I would like to quote from their philosophy, which is something we can try to internalise for the PyRLP project.
You define a class and attrs adds static methods to that class based on the attributes you declare. The end. It doesn’t add metaclasses. It doesn’t add classes you’ve never heard of to your inheritance tree. An attrs class in runtime is indistiguishable from a regular class: because it is a regular class with a few boilerplate-y methods attached.
Personally, I would prefer an explicit metaclass=
argument in the RLP classes, like this:
class MyBaseObject(rlp.Serializable, metaclass=Wizardy): # same as @Wizardry decorator
field_a = sedes.Binary()
over this:
class MyBaseObject(rlp.SerializableClassThatUsesMetaclassWizardy):
field_a = sedes.Binary()
because there are less number of clever things happening behind the scenes, and there is more transparency to someone extending RLP classes. The fact that an inherited class can change the very structure of the derived class seems dangerous to me.
I don't like the way inheritance and field ordering is handled in both attrs and dataclasses as it's definitely not what one would expect.
Can you please help me understand what an ideal inheritance behaviour would look like?
Can you please help me understand what an ideal inheritance behaviour would look like?
Ideally, order and set of fields is as easy as possible to define and to determine. I'm not certain that it's the ideal solution, but I like what Piper suggested above:
One option we haven't considered is requiring that either all fields be overridden in subclasses or none at all. Basically, rather than having a subtle API for ordering fields that we understand but is probably not intuitive, we can instead raise an exception during class construction if not all of the fields were overridden. This forces explicit field ordering in a manner that should make all RLP objects easy to know what the field ordering is since you just have to trace up the inheritance chain to the first object which defines any fields.
I guess this would be quite straightforward to implement in both the metaclass and the third party library approach.
My intuition is that using attrs
is likely to add performance overhead so it's worth evaluating what that overhead is because this library does have a need for high performance.
I think we could spin/bikeshed on design/approach for a while here and not really get anywhere because there's a lot of subjectivity as well as different angles to evaluate each approach. I'd advocate for you to do an MVP implementation of whatever approach you think is best (taking into consideration all of the things that have been discussed in this issue) at which point we can evaluate something more concrete.
Also, I am still reasonably strongly of the opinion that a metaclass approach is going to be adequate, maintainable, and intuitive enough.
One option we haven't considered is requiring that either all fields be overridden in subclasses or none at all.
:+1: I'm still in favor of this
What is the problem
RLP objects are not very good at inheritance.
There is no good pythonic way to extend a base class's fields.
How can we fix it.
Metaclass wizardry!
Much the same way that django models perform magical metaclass voodoo, we can do the same. Implement a metaclass for use that finds all attributes which are valid
sedes
types and does all the fancy stuff that happens within the__init__
method ofSerializable
. In general, doing away withSerializable.fields
seems like a nice cleanup.The example above would look something like:
Implementation specifics.
Field class
We need a
Field
class which we can use to define RLP fields. This should be a thin wrapper of sorts around arlp.sedes
object. Ideally, we can have some sort of thin shortcut for converting asedes
object to aField
object using something like a classmethodField.from_sedes(...)
Metaclass
Unless an alternate solution is presented, the
rlp.Serializable
class needs to be converted to use a metaclass. This metaclass will need to introspect the current fields as well as all of the fields for the base class to construct the class.I suggest this part be done in two steps.
The first step would be to have the metaclass construct an
rlp.Serializable
instance with the properfields
property, and to allow the existing internals ofrlp.Serializable
to populate the actual attribute setters and getters. Under this approach accessing the field properties from the class would likely result in anAttributeError
as they would not be constructed until class instantiation.The second step is to invert this, and have the metaclass (or the underlying
Field
class) create these attributes at the class level, likely using descriptors (which theField
class could implement). Then, thefields
property on the class would only be present for backwards compatibility.Field ordering
The field ordering for RLP objects is important. Luckily, this is easy to do using metaclasses.
https://docs.python.org/3/reference/datamodel.html#metaclass-example
However, this gets complicated in cases of inheritance. From the discussion on this issue, we've decided that the best solution to fixing this is to required that either all fields or zero field be overridden when a class is sub classed.
Documentation
The documentation will need to be updated to reflect this new API.