Closed pcattori closed 4 years ago
MyClass.builder()
on all classes for which Builder is appropriate.Builder.build()
should be the one place where the logic for going from MyClass's API to the Tamr server's JSON API is encapsulated. That is, MyClassBuilder.build()
should produce the JSON that can be sent to the server.MyClassBuilder.put()
should be:
def put(self):
data = self.client.put(self.api_path, json=self.build()).successful().json()
return MyClass(self.client, data, self.api_path)
@charlottemoremen, @Nelson-Andrade, and I have two possible options:
The first is to have a BaseBuilder
class which handles overriding properties and has an abstract put
. Each resource would have a concrete builder with with_property
functions that created new instances of the builder class.
class X:
def __init__(self, client, data, alias):
self.client = client
self._data = data
self.api_path = alias
def a(self):
return self._data.get("a")
def build():
return XBuilder(self)
class BaseBuilder:
def __init__(self, contained, **kwargs):
self._contained = contained
self.__dict__.update(**kwargs)
def __getattr__(self, name):
return getattr(self._contained, name)
@abstractmethod
def _build(self):
pass
@abstractmethod
def put(self, resource_class):
new_data = self.client.put(self.api_path, json=self._build()).successful().json()
return resource_class(self.client, new_data, self.api_path)
class XBuilder(Builder):
def with_a(self, new_a):
return XBuilder(self, a=new_a)
def _build(self):
return {
"a": self.a
}
def put(self):
return super().put(X)
The second is to not have separate builders, but to put all of the above into each resource itself.
class X:
def __init__(self, client, data, alias):
self.client = client
self._data = data
self.api_path = alias
def a(self):
return self._data.get("a")
def with_a(self, new_a):
new_x = X(self.client, self._data, self.api_path)
new_x.__dict__ = dict(self.__dict__)
new_x.__dict__["a"] = new_a
return new_x
def _build(self):
return {
"a": self.a
}
def put(self):
new_data = self.client.put(self.api_path, json=self._build()).successful().json()
return X(self.client, new_data, self.api_path)
In the first example, put()
is very much a Template Method. The normal implementation of Template Method uses abstract methods to delegate; so, instead of passing resource_class
to the method, it would have a _resource_constructor(*args, **kwargs)
abstract method, which it would invoke at the appropriate point in execution.
build()
should also be part of the public interface; it's generally useful, e.g. to post()
on related collections.
So:
@abstractmethod
def build(self):
pass
def put(self):
new_data = self.client.put(self.api_path, json=self.build()).successful().json()
return self._resource_constructor(self.client, new_data, self.api_path)
@abstractmethod
def _resource_constructor(self, *args, **kwargs):
pass
class XBuilder(Builder):
def build(self):
return {
"a": self.a
}
def _resource_constructor(self, *args, **kwargs):
return X(*args, **kwargs)
In the second example, we've combined the class representing the resource, and the builder. This might be fine, but:
I would be remiss if I did not also call out the desire to favor composition over inheritance for the Template Method. Template combines well with Strategy, so that when an XBuilder
instance is constructed, it can plug in the implementation of _resource_constructor
rather than implementing a parent class' abstract method.
BUT, since XBuilder
wants a bunch of with_*
methods that are specific to X, it makes sense to have a Builder class that is specific to X. If we are subclassing Builder
anyway, then mixing composition with inheritance just complicates things, and we should stick with inheritance.
Motivating use cases:
geometry
attribute, I want to tag geo
. I also want to update the description of these datasets to say "Contains geospatial features."Builder-like pattern for creating resource specs.
Approach:
*Spec
classes for each resourcedeepcopy
_data
with_*(new_prop)
methods use {**x, 'key': value}
to create new specs with single modification
deepcopy
or other fancier machinery if you don't want this basic behavior^to_dict()
to get a dictionary representation of the specput()
to push this spec to the server as an update to existing resource (which we determine by the relative ID (or resource ID) in the specclass MyResourceSpec:
def __init__(self, data):
self.data = data
@staticmethod
def from_my_resource(my_resource):
return MyResourceSpec(deepcopy(my_resource._data))
def with_property_a(self, a):
return MyResourceSpec({**self.data, 'a': a})
def with_property_b(self, b):
return MyResourceSpec({**self.data, 'b': b})
def to_dict(self):
'''Return a copy of `self.data`'''
return deepcopy(self.data)
def put(self):
...
Additionally, we can add a spec()
method to MyResource
to initialize a spec:
class MyResource:
...
def spec(self):
return MyResourceSpec.from_my_resource(self)
Usage:
my_resource = MyResource(...)
my_resource.spec().with_property_a('🍔').put()
MyResource.spec()
method, do we need to the *Spec
classes to consume the public interface of a resource (i.e. via from_my_resource(...)
)? Or can we change the *Spec.__init__
method to accept the _data
prop directly?from_my_resource()
to from(...)
since its clear that the *Spec
class relates directly to its corresponding resource.From reviewing #262 , I have a couple observations:
client
and api_path
the same while only updating data
.dict
/JSON mapping for "property -> updated value", you have to deconstruct your mapping into 1-by-1 with_*
calls.*Spec.__init__
accept an instance of MyResource
, and change @staticmethod from_my_resource(...)
to of()
and add from_data(self, data)
. from_data
can take care of copying over client
and api_path
and only updating data
. Then each with_*
method can just call from_data
class MyResourceSpec:
def __init__(self, client, api_path, data):
self.client = client
self.api_path = api_path
self.data = deepcopy(data)
@staticmethod
def of(my_resource):
# note: MyResource.spec() needs to be updated to call `MyResourceSpec.from`
return MyResourceSpec(my_resource.client, my_resource.api_path, my_resource._data)
def from_data(self, data):
return MyResourceSpec(self.client, self.api_path, data)
def with_property_a(self, a):
return self.from_data({**self.data, 'a': a})
...
We can use solution (1) to deal with this easily:
r = MyResource(a=1, b=2, c=3)
r.a # -> 1
r.b # -> 2
r.c # -> 3
updates = { 'a': 'x', b: 'y' }
r.spec().from_data({**r._data, **updates}).put()
Since we don't want users touching _data
directly, we could offer a convenience method, update
:
r = MyResource(a=1, b=2, c=3)
r.a # -> 1
r.b # -> 2
r.c # -> 3
updates = { 'a': 'x', b: 'y' }
r.spec().update(updates).put()
I'm less thrilled with (2). Maybe we don't need the power of batched updates....
What @nbateshaus and I discussed about the contracts of with_*
functions:
If a property of a resource is a complex type (like an attribute type), it should have its own spec class, and a user should provide an instance of this spec to with_*
.
If a property is not a complex type (e.g. a string), it would not have a spec class, and the user would provide the simple representation directly.
If the property is a list/set of resources, the user would provide a list of these resources, represented as determined by the above.
Design finalized and implemented (see #279 for an example)
Superceded by #309
@nbateshaus , you previously state:
MyClassBuilder.build() should produce the JSON that can be sent to the server.
Should .build()
(or whatever we call it) be responsible for checking invariants such that it produced JSON that can be sent to the server that the server will be happy with?
Or do you think its better to let the server handle validation logic and let the server complain when the spec is invalid?
E.g. for an AttributeTypeSpec
what if you don't set an inner_type
, but you do pass in sub attributes
? Should we just construct the JSON and let the server tell us that its wrong? or should we try to catch the invalid JSON before sending it to the server?
New proposal: create
and update
functions should specify requirements for each field (without needing a spec class).
E.g. for attributes:
# Attribute.py
def create(
session: Session,
dataset: Dataset,
*,
name: str,
type: AttributeType.AttributeType,
is_nullable: bool,
description: Optional[str] = None,
) -> Attribute:
...
def update(
session: Session,
attribute: Attribute,
*,
# TODO can is_nullable be updated?
is_nullable: Optional[bool] = None,
description: Optional[str] = None,
) -> Attribute:
...
Then users can call like:
attr = Attribute.create(
tamr.session,
name='address',
type=...,
is_nullable=False,
description='US-based address'
)
...
attr = Attribute.update(
tamr.session,
attr,
description='US-based address with postal code'
)
Benefits:
Going forward with proposal above^^^.
Feel free to reopen this issue for further discussion if needed.
💬 RFC
We have PRs that implement the builder pattern for some resources (#237 , #249 , #250 ). We should decide on overall design for how we want to apply the builder pattern in our codebase.
E.g.
MyClass.builder()
(which returns aMyClassBuilder
instance)?.put()
) OR should it use domain-specific verbs (e.g.create
,upsert
, etc...).build()
and then any communication with the server is done via client? (e.g.mc = MyClass.builder().with_option(param).build(); client.put(mc)
🔦 Context
💻 Examples