Internal vs external validation

thejohnfreeman commented 12 years ago

Gabe has written an excellent report comparing the two main approaches to validation: with special options for binders (external), and with binding variables that extend a view-model (internal). I uploaded the report to the project website so that we could discuss it here.

Issue #24 relates to this discussion in part, but I wanted to start a new thread so that that issue can stay focused on multi-view validation specifically.

gfoust commented 12 years ago

Here's an idea that's been bouncing around in my head:

One of the concerns with internal validation is that, by creating an extra variable that holds the value of the view, we are duplicating data unnecessarily—we have to create a copy of what's in the view to put in the binding variable. In the report I suggested one possible solution to the duplication: that the binding variable would not actually hold a value itself, but would be a special kind of proxy-variable, using the view to hold the value and the read/write operations provided by the binder to access the value.

So the new idea is that the binding variable does hold a value, but that the value it holds is actually the view itself. Then the read and write operations would be used (along with any conversion/validation functions) to create a constraint between the binding variable and target variable. So in one direction we would have a method which takes the view as input and (using the read operation) produces a value for the target variable as output. In the other direction is a method which takes the target variable and “produces” a view—but, of course, it would be the same view, just (using the write operation) updated with the new value. The event handlers for widget-change events would simply touch the binding variable.
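A minimal sketch of this idea, assuming a plain object stands in for the view and that the binder supplies `read` and `write` functions. All names here (`fakeView`, `textBinder`, `makeBindingMethods`) are hypothetical and are not part of HotDrink.

```javascript
// Hypothetical stand-in for a DOM widget; not a HotDrink type.
var fakeView = { text: "3" };

// Hypothetical binder: read/write operations for the view above.
var textBinder = {
  read: function (view) { return view.text; },
  write: function (view, value) { view.text = String(value); }
};

// Build the two methods of the constraint between the binding variable
// (whose value is the view itself) and the target variable.
function makeBindingMethods(binder) {
  return {
    // binding variable -> target variable: read a value out of the view
    toTarget: function (view) {
      return binder.read(view);
    },
    // target variable -> binding variable: update the view in place and
    // "produce" the very same view object
    toBinding: function (targetValue, view) {
      binder.write(view, targetValue);
      return view;
    }
  };
}

var methods = makeBindingMethods(textBinder);
var target = methods.toTarget(fakeView);        // value read from the view
var sameView = methods.toBinding(7, fakeView);  // view updated with 7
```

Note that no intermediate copy of the view's contents is stored anywhere: the only state is the view and the target value.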

I'm not sure if this is actually any better than the proxy-variable approach. Both avoid the unnecessary duplication problem. I guess the advantages of this approach are:

- Binding variables are just ordinary variables, not special proxy-variables.
- There is no need to create an intermediate value to represent what's in the view. For example, suppose the target variable held an object, but that the view used only three of the fields of that object, allowing the user to edit them. With the other approach we would need something to be the value of that view (e.g. a dummy object with the three fields) and then some way of updating the target variable with the intermediate value. With this approach we just update the target variable using the view (by means of the read operation).

I suppose the advantages of the proxy-variable approach are:

- The fact that the view is part of the model is encapsulated.
- The proxy-variable holds a value that can be used by other methods.

Of course, both of these approaches create a scenario where a model containing binding variables cannot be evaluated until binding is completed, which can be viewed as a disadvantage. And I'm not sure we really have a use-scenario yet which justifies either of these approaches.

P.S. Sorry—didn't mean to close the issue; just clicked the wrong button.

thejohnfreeman commented 12 years ago

One quick note before my laptop dies: I keep seeing talk of "temporary object" to overwrite a method output. One thing I plan to support is direct manipulation of outputs in methods instead of returning replacements. This goes with the desire for rich mutations beyond assignment.

jaakkojarvi commented 12 years ago

With direct manipulation of outputs in methods, do you mean that a method is a procedure that reads the old value of its output(s) and can modify it (e.g., by calling a method on it instead of assigning to it)?

E.g.: Assume two vars A, B and a method m from A to B. The "old" model is m: A -> B; would the "new" model be m: (A, B&) -> ()?
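In JavaScript terms, the two shapes might look roughly like this. This is only an illustration of the question; `mOld`, `mNew`, and the plain-object output are hypothetical, and nothing here reflects HotDrink's actual method signature.

```javascript
// "Old" model, m: A -> B. The method computes a replacement value for B.
function mOld(a) {
  return { foo: a };
}

// "New" model, m: (A, B&) -> (). The method receives the current output
// and mutates it in place; it returns nothing.
function mNew(a, b) {
  b.foo = a;
}

var replaced = mOld(1);                  // a fresh object on every call
var existing = { foo: 0, bar: "kept" };
var result = mNew(1, existing);          // same object, updated in place
```

In the second shape, fields the method does not touch (here `bar`) survive without any copying.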

thejohnfreeman commented 12 years ago

Going through the report:

Validation Function

I don't think it's necessary to support "an arbitrary sequence of either-value-or-error/maybe-error functions". The programmer could perform the composition manually (and will have to anyway for more complicated combinations of converters/validators).

Error Reporting

I am not a fan of the "implicit error as a property of the target variable" default behavior. In other cases where we provide defaults, it is generally the most commonly seen explicit specification. In this case, however, it would never be seen as an explicit specification since programmers generally should not set properties on proxies (for fear of overwriting something needed by HotDrink). I think we should require explicit specification for every case that needs to bind to an error.

Effects on Space Complexity

I like the approach in Gabe's comment above where the variable's value is actually the view. It mimics closely our existing approach.

However, the trouble with evaluation-before-binding is a problem that needs to be solved before we can implement internal validation. Perhaps wrap the binding variable creation in a constructor that gets passed to hd.bind (building off what is proposed in the Modular Design section)?

Partial-Value Binding

Given the example, I think this should be called "Property Binding".

If the view wants to manipulate a single property without changing the whole object, how can observers discover that the object has changed? Either the property needs to be an hd.variable, or we need a new proxy, e.g., hd.obj, that supports the rich mutation "property assignment".

In-Model Validation

I don't think this should be a goal for internal validation. A different binding variable would have to be created for every method that outputs the variable, and it could create a cycle in the case of self-loops. Besides, validating view bindings is enough work. :)

thejohnfreeman commented 12 years ago

Direct mutation (I should not have said "manipulation") of outputs is just a syntactic change that enables some optimization by better reflecting the semantic distinctions among mutations.

Taking your example of a method m with input A and output B, consider the case where m assigns the property foo on B to the value A. One might first imagine the most straightforward, simplest implementation:

.method(B, function () {
  var b = B();
  b.foo = A();
  return b;
});

However, since this method returns an "identical" value for its output[^1], the evaluator will assume it is unchanged, and the method's mutation will go unobserved by every other method and subscriber.

[^1]: Remember that equality comparison for objects in JavaScript compares references, not contents.
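The footnote can be checked directly. The sketch below assumes the evaluator decides whether an output changed with a `===` comparison, which is an assumption about the evaluator rather than something stated above; `hasChanged` is a hypothetical helper.

```javascript
// Assumed change check: strict (reference) equality between old and new output.
function hasChanged(oldValue, newValue) {
  return oldValue !== newValue;
}

var before = { foo: 1 };

// Mutating and returning the same object: the reference is unchanged.
var mutated = before;
mutated.foo = 2;

// Returning a copy: a different reference, even with identical fields.
var copy = { foo: 2 };

var sawMutation = hasChanged(before, mutated);  // false
var sawCopy = hasChanged(before, copy);         // true
```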

To make the mutation observable, we must choose an unintuitive, inefficient method implementation that constructs and returns a copy of B:

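.method(B, function () {
  // Copy the current value of B (not the proxy itself) into a fresh object.
  // Object.extend is assumed to copy properties, as in Prototype.js.
  var copy = {};
  Object.extend(copy, B());
  copy.foo = A();
  return copy;
});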

Direct mutation is intended to change this. Assuming B is using the new hd.obj proxy I mentioned above, we can use an implementation where the mutation is both efficient and observable:

.method(B, function () {
  B.prop("foo", A());
});
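One way such a proxy could work, as a rough sketch: `makeObj` below is a hypothetical stand-in for the proposed `hd.obj`, and its `prop` and `subscribe` functions are assumptions made for illustration, not an existing HotDrink API.

```javascript
// Hypothetical stand-in for the proposed hd.obj proxy.
function makeObj(initial) {
  var value = initial;
  var subscribers = [];

  // Calling the proxy with no arguments returns the wrapped object,
  // mirroring the B() style used above.
  function proxy() {
    return value;
  }

  // Rich mutation "property assignment": mutate in place, then notify,
  // so no copy is needed and the change is still observed.
  proxy.prop = function (name, newValue) {
    if (value[name] === newValue) { return; }  // nothing changed
    value[name] = newValue;
    subscribers.forEach(function (fn) { fn(name, newValue); });
  };

  proxy.subscribe = function (fn) {
    subscribers.push(fn);
  };

  return proxy;
}

var B = makeObj({ foo: 0 });
var original = B();
var seen = [];
B.subscribe(function (name, v) { seen.push(name + "=" + v); });

B.prop("foo", 42);  // observed, and B() is still the same object
B.prop("foo", 42);  // no change, so no second notification
```

Because the proxy itself performs the assignment, it knows exactly which property changed and does not have to rely on comparing the old and new objects.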

gfoust commented 12 years ago

Thanks for the feedback! Here are my thoughts:

Validation Function

> I don't think it's necessary to support "an arbitrary sequence of either-value-or-error/maybe-error functions".

I agree it's not necessary; here's a couple of ideas as to why it might be helpful.

- Allows modular composition of models. You could take a pre-written component which already had a validation function and add a second validation function. Or you could take two components, each of which had their own validation function, and unify them.
- Provides straightforward implementation of knockout-style validation construction. So you could write something like this:

      this.var.bind().required().min(1).max(100);

  Each of these constructors just creates a new function and adds it to the list of validation functions.
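A rough sketch of how such a chain could accumulate validation functions. The builder below is hypothetical: `makeBinding`, and the way `required`, `min`, and `max` report errors (returning a message string, or `null` for success), are assumptions for illustration and not the API of Knockout or HotDrink.

```javascript
// Hypothetical builder; each call appends one "maybe-error" function.
function makeBinding() {
  var validators = [];

  var binding = {
    required: function () {
      validators.push(function (v) {
        return (v === null || v === undefined || v === "")
          ? "value is required" : null;
      });
      return binding;  // return the builder so calls can be chained
    },
    min: function (lo) {
      validators.push(function (v) {
        return v < lo ? "must be at least " + lo : null;
      });
      return binding;
    },
    max: function (hi) {
      validators.push(function (v) {
        return v > hi ? "must be at most " + hi : null;
      });
      return binding;
    },
    // Run the validators in order and stop at the first error.
    validate: function (v) {
      for (var i = 0; i < validators.length; i++) {
        var error = validators[i](v);
        if (error !== null) { return error; }
      }
      return null;
    }
  };
  return binding;
}

var nights = makeBinding().required().min(1).max(100);
var ok = nights.validate(3);
var tooBig = nights.validate(250);
var missing = nights.validate("");
```

Since the validators live in an ordinary list, a second component could append its own functions to the same list, which is the modular-composition case from the first point.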

Error Reporting

> In this case, however, it would never be seen as an explicit specification since programmers generally should not set properties on proxies (for fear of overwriting something needed by HotDrink).

Yeah, that's a good point. I still think, though, that having to create/specify every error variable manually is going to get old really quickly.

Maybe we need to spend some time thinking about namespaces. If we're going to support modular composition of models then we're going to need some way (even if it's just a convention) of organizing and referring to those different variables. If we had a scheme for that, then maybe we could apply it to creating good names for automatically-created error variables.

Effects on Space Complexity

> However, the trouble with evaluation-before-binding is a problem that needs to be solved before we can implement internal validation.

Here's a thought: A binding constraint (created when you create a binding variable) would consist of special methods which have properties: a read (or write) function and possibly some validation functions. The body of the method would take the value of the binding variable, use its read function on it to get a value, use the validate function to test the value, then return that. So by default, the read function could just be the identity function, meaning the value of the target variable is stored directly in the binding variable. Later on when binding occurs we can replace the value of the binding variable with the view, and replace the read function with the actual read operation provided by the binder. We can also use the previous value of the binding variable to initialize the view (via the write operation).

Partial-Value Binding

> If the view wants to manipulate a single property without changing the whole object, how can observers discover that the object has changed?

This seems very closely related to the other comment thread going on in this issue related to direct mutation of output variables. So it might be worth noting here that any solution we come up with for methods being able to perform mutations on the output will also work with internal binding, since internal binding is just a method.

In-Model Validation

> I don't think this should be a goal for internal validation.

My motivation for this comes from my comment in #24, where we want to validate number of nights, but we still want to use a constraint to calculate number of nights. I think the best way to support that is with in-model validation. The other alternatives are 1) don't use the constraint, 2) have the validator duplicate the work of the constraint, 3) use a pre-condition instead of a validator, in which case the bad value gets through.

thejohnfreeman commented 12 years ago

> I agree it's not necessary; here's a couple of ideas as to why it might be helpful.

Okay, those are good motivations.

> I still think, though, that having to create/specify every error variable manually is going to get old really quickly.

I think we should wait for common uses to arise before attempting to provide good conveniences. An obviously good design may emerge once we have something to look at.

> Maybe we need to spend some time thinking about namespaces.

Do we have this already by packaging models into constructors?

> This seems very closely related to the other comment thread going on in this issue related to direct mutation of output variables.

Correct.

> Here's a thought: A binding constraint ...

I don't think I completely understand this. It might be a discussion best done in person with a whiteboard.

> My motivation for this comes from my comment in #24, where we want to validate number of nights, but we still want to use a constraint to calculate number of nights.

I understand, and I still don't like the approach. I want to avoid creating new variables that get injected into the graph the programmer built (read: change their method's inputs or outputs). Right now, we are only discussing new variables that wrap/extend/sit outside of the graph the programmer built.

I left a comment in #24 as well, giving my preferred solution (with reasons): guard the method. Execute the method only when the guards pass. With the direct mutation of outputs feature, when the method fails to execute, the evaluator will know the outputs were unchanged and won't execute the downstream methods or notify subscribers.
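A small sketch of the guard idea, under assumptions: `guarded` is a hypothetical wrapper, and the `ran` flag it returns is just one way an evaluator could learn that the outputs were left untouched. The number-of-nights calculation is illustrative only.

```javascript
// Hypothetical wrapper: run `method` only if every guard passes.
function guarded(guards, method) {
  return function (inputs, output) {
    for (var i = 0; i < guards.length; i++) {
      if (!guards[i](inputs)) {
        return { ran: false };  // output untouched; nothing to propagate
      }
    }
    method(inputs, output);     // direct mutation of the output
    return { ran: true };
  };
}

var MS_PER_DAY = 24 * 60 * 60 * 1000;

// Method that computes the number of nights from two dates, guarded so
// that it only runs when check-out is after check-in.
var computeNights = guarded(
  [function (inp) { return inp.checkOut > inp.checkIn; }],
  function (inp, out) {
    out.nights = Math.round((inp.checkOut - inp.checkIn) / MS_PER_DAY);
  }
);

var stay = { nights: 2 };

var bad = computeNights(
  { checkIn: new Date(2013, 5, 10), checkOut: new Date(2013, 5, 8) }, stay);
var nightsAfterBad = stay.nights;  // the previous value is kept

var good = computeNights(
  { checkIn: new Date(2013, 5, 10), checkOut: new Date(2013, 5, 13) }, stay);
```

When the guard fails, the previous number of nights stays in place, so the bad value never enters the model and downstream methods have nothing new to react to.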

thejohnfreeman commented 12 years ago

Just to clarify Gabe's idea, which I think is very clever:

> However, the trouble with evaluation-before-binding is a problem that needs to be solved before we can implement internal validation.

A binding variable is created at the same time as the target variable. Since this occurs before binding, it starts with a non-view value (presumably the current value of the target variable). Also created is a binding constraint of two methods connecting the binding variable to the target variable.

In the direction of binding variable to target variable, the method has a few properties:

read :: View -> ViewValue
convert :: ViewValue -> Either Error ModelValue
validate :: ModelValue -> Maybe Error

These properties are used by the method each time it executes. If the system evaluates before binding occurs, then the View type will actually be ViewValue and we can just use the identity function for read. Successful execution of the method changes the target variable.

The method for the opposite direction, from target variable to binding variable, is similar, e.g., with write instead of read.

Errors returned by either method are stored on the binding variable to be read by error binders.

When binding occurs, we use the value of the binding variable to initialize the view, then we replace the binding variable's value with the view. That means the View type has changed to some abstract type, and we must also replace read with whatever is appropriate for the view.
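The lifecycle described above can be sketched as follows. Everything here is hypothetical: `makeBindingVariable`, the `{ error: ... }` / `{ value: ... }` encoding of Either, the `null`-means-no-error encoding of Maybe, and the plain object used as a view are assumptions for illustration, not HotDrink code.

```javascript
// Identity read, used while the binding variable still holds a plain
// ViewValue instead of a view.
function identity(x) { return x; }

function makeBindingVariable(initial, convert, validate) {
  var bv = {
    value: initial,  // a ViewValue before binding, the view afterwards
    error: null,     // errors are stored here for error binders
    read: identity,

    // Method for the direction binding variable -> target variable.
    // Returns the new target value, or `current` when an error occurs.
    toTarget: function (current) {
      var converted = convert(bv.read(bv.value));  // Either Error ModelValue
      if ("error" in converted) {
        bv.error = converted.error;
        return current;
      }
      var problem = validate(converted.value);     // Maybe Error
      bv.error = problem;
      return problem === null ? converted.value : current;
    },

    // Binding: initialize the view from the old value, then swap the
    // view in as the value and install the binder's read operation.
    bind: function (view, binder) {
      binder.write(view, bv.value);
      bv.value = view;
      bv.read = binder.read;
    }
  };
  return bv;
}

// Illustrative converter and validator.
function toNumber(s) {
  var n = Number(s);
  return (s === "" || isNaN(n)) ? { error: "not a number" } : { value: n };
}
function positive(n) { return n > 0 ? null : "must be positive"; }

var bv = makeBindingVariable("5", toNumber, positive);
var beforeBinding = bv.toTarget(0);  // evaluates fine before binding

var view = { text: "" };
bv.bind(view, {
  read: function (v) { return v.text; },
  write: function (v, x) { v.text = String(x); }
});
var initialText = view.text;         // view initialized from the old value

view.text = "abc";                   // simulated user edit
var afterBadEdit = bv.toTarget(beforeBinding);
```

The method body never changes across binding; only the variable's value and the `read` property are swapped, which is what lets the model evaluate before any view exists.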
