datajoint / datajoint-python

Relational data pipelines for the science lab
https://datajoint.com/docs
GNU Lesser General Public License v2.1
170 stars 85 forks source link

AutoPopulate as an abstract subclass of Base #37

Closed eywalker closed 9 years ago

eywalker commented 9 years ago

Implementation of features for AutoPopulate depends on the implementation of the Base. In particular, a class that derives from AutoPopulate (which is an abstract class now), will function only if:

Since we really don't expect the subclass to not be a Base derivative for the AutoPopulate subclass to function, I think it'd make more sense to simply make AutoPopulate itself an abstract subclass of the Base class. This way, when one wants to implement a table with populate functionality, the class has to only inherit from AutoPopulate.

fabiansinz commented 9 years ago

Seems reasonable to me. Am 02.05.2015 21:12 schrieb "Edgar Y. Walker" notifications@github.com:Implementation of features for AutoPopulate depends on the implementation of the Base. In particular, a class that derives from AutoPopulate (which is an abstract class now), will function only if:

it implements all abstract propoerties (popRel) and abstract method (makeTuples) - these two points are fine, although I'd like to rename them to fit Python naming convention betterthe subclass must actually be a Base derivative to function correctly - basically we expect the subclass to inherit from Base and AutoPopulate simultaneously. This I think is conceptually messy.

Since we really don't expect the subclass to not be a Base derivative for the AutoPopulate subclass to function, I think it'd make more sense to simply make AutoPopulate itself an abstract subclass of the Base class. This way, when one wants to implement a table with populate functionality, the class has to only inherit from AutoPopulate.

—Reply to this email directly or view it on GitHub.

dimitri-yatsenko commented 9 years ago

AutoPopulate is a mixin class. It's not particularly messy conceptually. http://stackoverflow.com/questions/533631/what-is-a-mixin-and-why-are-they-useful

I don't know if we win much by making it inherit from Base.

fabiansinz commented 9 years ago

If what Edgar says is true, that inheriting from AutoPopulate assumes that we also inherit from Base, I think it would conceptually be cleaner to already see the "mix", i.e. Base plus the features from AutoPopulate. Users could still inherit from Base only. I think mixing it in like it is at the moment would only make sense if the mixed in is independent.

On 05/02/2015 10:47 PM, Dimitri Yatsenko wrote:

AutoPopulate is a mixin class. It's not particularly messy conceptually. http://stackoverflow.com/questions/533631/what-is-a-mixin-and-why-are-they-useful

I don't know if we win much by making it inherit from Base.

— Reply to this email directly or view it on GitHub https://github.com/datajoint/datajoint-python/issues/37#issuecomment-98432796.

Dr. Fabian Sinz contact info: http://goo.gl/o87zXC

dimitri-yatsenko commented 9 years ago

Mixin classes are always multiply inherited along with some other class. They are never independent.

I like the syntax

class MyTable(dj.Base, dj.AutoPopulate):

because it communicates that it a base relation that also has autopopulate functionality.

dimitri-yatsenko commented 9 years ago

For now I am inclined to keep things as they are. It's also how we have it on the MATLAB side.

fabiansinz commented 9 years ago

I would agree if the mixing part from AutoPopulate were independent from Base. However, according to Edgar, it is not. Then I find it clearer to have a hierarchy. For example, if you look at MutableMapping in the abstract base classes from collections (https://docs.python.org/3/library/collections.abc.html), it inherits from Mapping, because it needs the functionality that is already provided by Mapping. I think we are kind of in the same situation here.

What is your reason why you would like it to be mixin instead of hierarchical?

On 05/02/2015 10:57 PM, Dimitri Yatsenko wrote:

Yes, mixin classes are always multiply inherited along with some other class.

I like the syntax

class MyTable(dj.Base, dj.AutoPopulate):

because it communicates that it a base relation that also has autopopulate functionality.

— Reply to this email directly or view it on GitHub https://github.com/datajoint/datajoint-python/issues/37#issuecomment-98434337.

Dr. Fabian Sinz contact info: http://goo.gl/o87zXC

fabiansinz commented 9 years ago

You and Edgar wrote this thing, so I'll go along with whatever you decide to do. We could discuss it on Monday.

On 05/02/2015 11:04 PM, Dimitri Yatsenko wrote:

For now I am inclined to keep things as they are. It's also how we have it on the MATLAB side.

— Reply to this email directly or view it on GitHub https://github.com/datajoint/datajoint-python/issues/37#issuecomment-98435643.

Dr. Fabian Sinz contact info: http://goo.gl/o87zXC

dimitri-yatsenko commented 9 years ago

Mixins are never independent. They confer new functionality to classes they are mixed with. Yes, it happens to be mixed with Base but I disagree that it depends on Base that much. The only Base-speicific thing is the conn property. I think the mixin solution communicates its functionality clearer.

Yes, let's discuss on Monday.

eywalker commented 9 years ago

Wow I seemed to have missed out on some interesting discussion!

The thing that bugged (and thus caused me to create this issue) was the fact that the natural implementation of populate method in the AutoPopulate class takes the popRel keys and antijoins the self, thus requiring the self (the concrete subclass of the AutoPopulate) to be a derivative of _Relational. At least this is how it is done in Matlab. If we can drop this assumption about the AutoPopulate derivative, then I agree that keeping it as mixins makes sense, and it can be used meaningfully as a way of invoking a function (makeTuples) for each key in the popRel, sort of like a visitor paradigm.

We could alternatively expect the derived class to implement another property like target which is a _Relational that gets anti-joined from the popRel.

dimitri-yatsenko commented 9 years ago

yes, AutoPopulate must be mixed with a subclass of Relational. But having it as a mixin allows for greater flexibility. For example, the user might want to define a manual subclass for table definition only and then derive a separate subclass for autopopulatin. I prefer to keep it as is because I don't see a sufficient advantage for changing it.

eywalker commented 9 years ago

I know this is a rather trivial point and I'm not going to drag this on any further, but just wanted to write out the approach I'll take. I would think that having AutoPopulate as a mixin really makes sense if it can be treated like a (relatively) independent addon features where other classes could benefit from. Given that AutoPopulate will contain the logic for:

  1. Visiting each keys in the relation described by popRel
  2. Execute the content of the function makeTuple with each key
  3. Distribution of this visiting scheme into potentially parallel, asynchronous set of processes (via use of Job reservation tables)

I can see that this logic is already useful if you want to implement a logic to visit and execute a function on each key of a relation and potentially do so in a distributed fashion. To support this I'm going to add target property to AutoPopulate with a default implementation that returns self. Within populate method, the target will be anti-joined on the popRel to serve as a way of figuring out which keys among popRel still needs to be visited. Using self will makes sense if the derived class itself is the target of the populating action, which is pretty much the standard scenario. However, a non _Relational class could use the AutoPopulate mixin simply by overriding the target property. Doing so will grant the derived class with a nice logic to visit and execute function on each key of the popRel - target.

Once implemented, this non-standard use case for AutoPopulate should be documented somewhere...