KitApps / schema-refined

Clojure library to keep you away from bugs with precise schemas (refined types with runtime checks)
MIT License
67 stars 2 forks source link

[discussion] Define and prove invariants of the function #20

Open serzh opened 6 years ago

serzh commented 6 years ago

Basics

This is proposal and discussion on how to specify behavior of a function using refined types.

At first, let's consider an examples of functions and their behaviors you may want to specify:

  1. filter-by-type :: [TimelineItem] ItemType -> [TimelineItem], this function takes list of timeline items and filters them out by give type. The behavior you want to specify:
    • (B1) output is always a subset of the input list
    • (B2) all items in output must be only of given type
  2. create-profile: Email Name -> Profile, where Profile is {:email Email, :name Name, :id UUID}. This function creates a profile with given email and name. The behaviour to specify:
    • (B3) Profiles email and name should be equal to the provided one

It is clear that this behaviors are a key to the valid function specification. Without them type information doesn't capture the semantics of functions. You can write a function that will satisfy input and output types but won't behave correctly. The specifications of behaviors restrict possible implementation to more desirable.

If you try to formalize this behaviors it wil become clear that they are always functios of the input values:

But it's even more specific: given the output type ([TimelineItem] or Profile), behavior are function from the inputs to the refinements of the output type:

(github doesn't have support for LaTEX, putting an image here :( )

screen shot 2018-06-30 at 22 05 48

Relying on the statement above, we can model behavior as a functions from inputs to predicates (refinements) that will later be applied to the Output type:

;; B1
(defn is-subset [{:keys [items]}]
 #(set/subset? (set items) %)

;; B2
(defn has-given-type [{:keys [type]}]
 (r/ForAll #(= (:type %) type))

;; B3
(defn has-given-email [{:keys [email name]}]
 (r/And (r/On :email (s/eq email))
        (r/On :name (s/eq name))))

The nice property of behaviors being refinements is that they are perfectly composable. Say you want B1 and B2 to be described as single behavior:

(defn fileter-by-type-behavior [args]
 (s/And (is-subset args)
        (has-given-type args)))

How to do it in schema-refined

We need to create API for specifying behavior for a functions. We should definitely stick with plumatic/schema way of defining typed functions (like here), but add optinal behaviors to that syntax:

(r/defn filter-by-type :- [TimelineItem] | is-subset has-given-type
        [items :- [TimelineItem]
         type  :- (s/enum "post" "photo")]
 ...)

Let's discuss it

TODO

kachayev commented 6 years ago

@serzh "Specifying behavior" is misleading because you do not specify behavior, you're just describing some kind of invariants those must be held after execution is complete.

serzh commented 6 years ago

Yeah, this is why a have doubts about this term

kachayev commented 6 years ago

@serzh Just renamed. Sounds better?

serzh commented 6 years ago

@kachayev Looks nice

kachayev commented 6 years ago

What if we define Output type as the function on Input right away? E.g. we have :columns param in our Scroll API, and I want to make sure that each item in the response consists of only those columns... To define output schema I need to say

(defn Table [dt]
   {:totalCount s/Int
    :items [(SubsetOfKeys dt *)]})

where * can be computed only from the request. Doing this just by applying predicates afterward we are losing readability 😞as you're forced to say (r/NonStrictMap dt). More simple example might be

(defn left-pad [str len ch])

function, that should return (r/refined s/Str (r/BoundedSize len len)) which is "unknown" in advance. To ensure readability and serializability (so you can send them over the wire or render into documentation) of those types, we can use named vars (symbols) that will automatically define the invariant on their equality

(r/defn left-pad :- (r/refined s/Str (r/BoundedSize 'L))
  [str :- s/Str
   len :- s/Num 'L
   ch  :- s/Char])

A few pretty common examples on invariants check:

serzh commented 6 years ago

Good point. It looks like we can omit defining "base" for function return and instead we can return entirely refined type, I haven't found any point against it. And we can use same compositionality as I described above: just extract groups of refinement predicates to the functions.

We can use variables from the function argument list in the Output type expression:

(r/defn left-pad 
  :- (r/refined s/Str
       (r/And (r/StartsWithStr ch)
              (r/Bounded len)
              (r/EndsWithStr str)))
  [str :- s/Str
   len :- s/Num
   ch  :- s/Char])

And if it become too big, just move it to the function:

(defn LeftPadInv [str len ch]
 (r/refined s/Str
   (r/And (r/StartsWithStr ch)
          (r/Bounded len)
          (r/EndsWithStr str))))

(r/defn left-pad :- (LeftPadInv str len ch)
  [str :- s/Str
   len :- s/Num
   ch  :- s/Char])
kachayev commented 6 years ago

Yeah, that's what I meant. As for me, it looks way better.

kachayev commented 6 years ago

@serzh I think LeftPadInv is a kinda misleading name in your last example. What you defined is an actual type of the result, not the invariant.

kachayev commented 6 years ago

He's one another thing:

(r/defn reverse :- (r/refined [T] (r/BoundedSize (count v)))
  [v :- [T]])

^^^ we need to be able to capture var symbols both on parameters and their type signatures.

serzh commented 6 years ago

https://github.com/KitApps/schema-refined/issues/20#issuecomment-401632416 Where do you draw a line between type and invariant? For me they are pretty much the same: types is only one kind of invariants you could have.

serzh commented 6 years ago

https://github.com/KitApps/schema-refined/issues/20#issuecomment-401632779 Approach with capturing type variable will require us to create a "destructuring mechanism" for every predicate and type to capture those variables:

(r/defn conj :- (r/refined [T] (r/BoundedSize (inc N)))
  [xs :- (r/refined [T] (r/BoundedSize N))
    x :- s/Any])))

But I'm not sure it's a good idea to make it so flexible. I tried to think about only"domain specific" APIs, where you rarely have such generic functions. But we can try to implement it in a such way

kachayev commented 6 years ago

@serzh I think this is right the situation where solving more generic solution is actually simpler. Talking about invariants VS types... By the definition

Invariant is a property, held by a class of mathematical objects, which remains unchanged when transformations of a certain type are applied to the objects

when type is a set of values. Meaning (super simplified) when you have 2 types A and B and the transformation g : A -> B you can define something to be an invariant f for the transformation if (and only if)

∀ a ∈ A: f(a) = f(g(a))

Obviously, A -> B is a type by itself, but that doesn't help a lot with invariants.

gsnewmark commented 6 years ago

I have a stylistic question: why specify result type before the arguments? To be similar to schema and not confuse people experienced in Clojure, where everything after the params list is part of body? It looks a bit artificial to me as natural flow is from params to the result, and not vice versa.

serzh commented 6 years ago

@gsnewmark I've partially agree with you. I just don't want people get confused of where to place the parameter given schema and schema-refined.

serzh commented 6 years ago

@kachayev Got it. What I've really written is not an invariant but a dependent type.

Also, I've found that this (r/EndsWithStr str) part of LeftPadInv is actually invariant w.r.t the transformation.

So we need a way to destructure type variables in the function definition. I need some time to wrap my head around this.

kachayev commented 6 years ago

@serzh Talking about capturing type params/destructuring, we're returning back (once again) to the idea of generics declaration and probably we do not need to this right now. left-pad is a super simple example on that matter, as far as all of the type params for the result can be derived from function call params. conj from your examples can be also expressed as:

(r/defn conj :- (r/refined [T] (r/BoundedSize (inc (count xs))))
  [xs :- [T]
    x :- T])))

What we need to here, is to ensure T remains the same for xs and x... and that might be a problem 😞 To deal safely with that, we need either

a) super smart approach how to determine T in runtime for all possible situations (full type inference that still with fallback to Any)

b) the idea/notion of conj[T] as a function generic, which also seems a bit clumsy

Probably we need to "relax" here and cover situations like

(defn add-ticket :- (r/refined Settings
                               (r/On :tickets
                                     (r/BoundedSize (inc (count (:tickets settings))))))
  [settings :- Settings ticket :- Ticket])

and call it a day.

kachayev commented 6 years ago

@serzh BTW, my last example would look better with invariant function num-tickets-inc instead of this On + BoundedSize 😐

serzh commented 6 years ago

@kachayev

conj from your examples can be also expressed as:

conj clearly has to operate on the bounded list, in the other way it doesn't make sense. But there is no way to know if a collection is bounded in Clojure.

Probably we need to "relax" here and cover situations like

You mean skip type variables for now?

serzh commented 6 years ago

would look better with invariant function

is it an invariant? it is just a function that return type given a value.