Closed dimitri-yatsenko closed 9 years ago
I think probably the cleanest way to do this is to override insert
in autopopulated classes and call insert
from the superclass in populate
. insert
itself could throw and error that it doesn't like to be called. Autopopulate
would define an insert
that just throws an error and populate
would call the insert
from Relation
.
Problem with this is that, overriding of insert
has to happen somewhere in the AutoPopulate
mixin. However, if the deriving class uses the signature as shown:
class MyTable(dj.Relation, dj.AutoPopulate):
Then the insert
method derived from dj.Relation
(which of course is getting its insert
method from the dj.FreeRelation
) will mask whatever the insert
method we would implement in dj.AutoPopulate
, completing missing the point. In order for the insert
in dj.AutoPopulate
to take precedence, the inheritance signature has to be:
class MyTable(dj.AutoPopulate, dj.Relation):
I don't like the fact inheritance order will be so critical here. Also I'm not sure if I like the idea of making the mixin class dj.AutoPopulate
alter the behavior of the class it got mixed with (in this case the insert
method of dj.Relation
).
If we really want auto-populated relation to not have public insert
method, I think the cleanest way to do this is to make AutoPopulate
a subclass of Relation
and ask user to inherit from that instead (perhaps rename the class to AutoRelation
or something, in that case).
I personally think asking the user to remember to inherit AutoRelation
is not asking too much more than to remember to inherit from both AutoPopulate
and Relation
.
Also you have to remember the fact that insert
has to be called within make_tuples
which is ultimately written by the user. So to ask them to explicitly call superclass's insert
(via super().insert
) is weird to me. This problem will not be resolved even if we let AutoPopulate
(or AutoRelation
or w/e) inherit from Relation
instead.
Ok, then let's maybe take a step back.
I think prohibiting the use of a function that already exists is hard to do in Python without creating a mess (despite the fact that it is pretty unpythonic). Therefore, the function insert
should not be there if it is not to be called.
What about the following solution: We create classes ManualRelation
and Autorelation
that both inherit from relation. Relation.insert
is turned private into Relation._insert
. ManualRelation
wraps _insert
with insert
. make_tuples
in AutoRelation
actually returns data (as the name suggests anyway) and AutoRelation.populate
calls _insert
from populate
.
The burden on the user is merely:
AutoRelation
objects. If (s)he does, it's not our business because we declared it private. What if insert has a check inside whether it's in an autopopulate class and show a warning if it's used outside a populate class? That's what I had in mind. Just a protected internal _insert_caution_flag
.
I think that is somehow messy.
We cannot simply make insert
check whether it is in an instance of AutoRelation
and deny entering data if it is, because we do need insert
in makeTuples
(I still think it should be in populate
).
I am not a big fan of having flags, but even if we do introduce a _insert_caution_flag
, this needs to exist in Relation
in order to be checked by ìnsert
just for the case that someone mixes in AutoRelation
. My point is, we need a class member that exists in all classes but it is only meaningful in some classes. There ought to be a better solution than that.
I like @fabiansinz's solution of basically removing the need for the user to call insert
from within the make_tuples
. After all, I think it is more natural for the user to have to only worry about taking care of making (and returning) an appropriate tuple(s) corresponding to a key, but not to have to worry about actually inserting this into table - this should be handled by populate
which calls the make_tuples
anyway.
This change is rather big deviation from current Matlab implementation and we will have to carefully step through the logic (e.g. how computed subtables can be handled). The nice thing about this implementation is that we can completely remove need for call of insert
by user when using AutoRelation
- which I would say makes sense be a subclass to Relation
. Within AutoRelation
, it will simply override the insert
method making it unavailable to the user (internally the populate
method would call the _insert
).
I think we should keep it simple and make regular Relation
with a functional insert
(which just calls _insert
) method and AutoRelation
with populate
method but no insert
, and let the user decide on which to use. I'm not sure how tightly bound we should make the choice of Relation
to the type of table.
I like the solution by @eywalker. It's clearer and easier to handle. Plus: There having the tiers in the definition is still justified.
Do you really think it would be a big problem to move insert
to populate
? After all, if I know how to insert stuff, I should know how to generate it. So I don't think that it will become more difficult for the user. Or do you think that having insert
in populate
will cause trouble?
I personally don't think moving the insert
into the populate
should be difficult at all, but I'm just concerned about the popular use case found in Matlab where makeTuples
invokes additional makeTuples
of "computed subtables",within which the subtable inserts tuples for itself. I just haven't had enough time to think whether and if so, how such mechanism will be affected by moving insert
into populate
. I think it will be worth revisiting this idea of computed tables and computed subtables, and see if their relationship can be cleaned up.
_make_tuples
could become a generator that yields tuples and populate
insert them. To deal with subtables or supertables, we could call their populates with the specific key. Then populate need to understand that it's inside a transaction and not start a new one.
How about this: if the object has a method called _make_tuples
, it should display a warning messages when insert
is called? But insert is still inherited just in case.
I like the sound of the solution - will have to think about exactly how we should implement it.
@dimitri-yatsenko : As for the suggestion about insert
, you are suggesting this check because you still think AutoPopulate
should be a mixin? If we let there be AutoRelation
, then it is very easy to either make
insert
unavailable in the sense we hide access to _insert
insert
to remind users that they ought to be using populate
If we are going to stick with AutoPopulate
mixin, then Relation
's insert
method has to do the weird check of seeing if the derived class implements _make_tuples
and give warning accordingly. Anytime a class method has to be implemented with an anticipation of a behavior in its derivative, I feel that something is not right. Please disregard this part if I totally misread your suggestion.
I also think that the inheritance solution is better. You will have to explain the problem with super- and sub-tables to me. At the moment, I don't think I am getting it.
I cannot say that I fully grasp what's needed to deal with super- and sub-tables as their definitions and how they ought to fit to the entire DataJoint concepts has been unclear to me as well. It will be great if we could talk about this on Monday to clarify, so we can rapidly implement them.
If you are going with the AutoRelation
approach, it would be nice if you could still offer the "visit each pop_real - target" mentioned in #37 as a separate mixin just under a different name.
I agree that such a visitor-like mixin will be useful. Perhaps AutoRelation
can inherit from Relation
and such visitor mixin - that way I don't have to write similar code twice :)
On May 9, 2015 5:43 PM, "Florian Franzen" notifications@github.com wrote:
If you are going with the AutoRelation approach, it would be nice if you could still offer the "visit each pop_real - target" mentioned in #37 https://github.com/datajoint/datajoint-python/issues/37 as a separate mixin just under a different name.
— Reply to this email directly or view it on GitHub https://github.com/datajoint/datajoint-python/issues/60#issuecomment-100558074 .
Ok, let's discuss in on Monday and implement it rapidly.
Again, I didn't follow the thread in detail, but I don't think the suggestion of preventing calls to insert() makes sense. This kind of thing is solved by documentation and specification. We can never prevent users from abusing the code. If we try to get smart like that, users will get smarter and come up with even crazier solutions.
okay, sorry for the confusion. Yes, let's keep the public insert
in automatic tables.
automatically populated tables should never be inserted into from outside a populate call. Calling
populate
should enableinsert
. Exiting populate should prohibitinsert
.