NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1
12.44k stars 1.5k forks source link

Runtime type checking support #8211

Open roberth opened 1 year ago

roberth commented 1 year ago

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Introductory example

I don't care much about the syntax specifics, but involving @ lets us not steal used syntax. Don't let that distract you.

# single parameter lambda with strict check
# this checks `a` before evaluating the body
a @! builtins.isString:
  "hello " + a;

This is equivalent to

f' = a:
  assert builtins.isString a; "hello " + a;

Not much gained, I would agree. The real value is in lazy checks.

Lazy checks, how and why

# single parameter lambda with lazy check
# this function returns `{ sayHi = <thunk>; }` before `a` is checked
g = a @? builtins.isString :
    { sayHi = "hello " + a; };

The equivalent would be

g = a:
  let a_ = assert isString a; a;
  in let a = a_; in
  { sayHi = "hello " + a; };

While arguably that expression could be simplified, it illustrates the two points of friction:

The benefit of lazy checks comes from the improvement to error messages. While strict checks come with a pretty good location, having the root cause close to one trace item earlier, a lack of checks, which is effectively required for lazy check today, culminate deep in some library code, far away from the root cause.

A less obvious benefit also comes from this depth. Usually an argument is propagated a few times before it is used, which means that potentially many functions exist in between, which all have a reason to add a type check. Implemented naively, similar to the code above, this creates a chain of thunks that adds basically no value. This can be prevented by introducing a new type of thunk, tLazyCheck that references the thunk to check, the checking function, and perhaps the location of the check for error reporting (fits after #8209). However, before constructing a tLazyCheck, we can perform a pointer comparison on the checking function and avoid allocating a thunk if it's already going to be checked by that function. (Doesn't work for alternating check functions, but that seems like it would be rare.) This almost completely removes the memory overhead of lazy checks.

A useful, coherent syntax

Now that we have a syntax for strictness and laziness, we can generalize it to lambdas with formals. This seems like a natural extension based on the preceding.

# this function returns `{ sayHi = <thunk>; }` before `a` is checked
{ a @? builtins.isString, ... }:
  { sayHi = "hello " + a; };

Now that we have a symbol for laziness, we might even extend the language a bit to support lazy formals.

# Return { result = <thunk> } before checking isAttrs a
{ ... }?:
{
  result = a;
}

Module system style checks

The module system performs a non-obvious kind of runtime type checking: it checks everything, but only one Value at a time. (Usually and ideally)

This means that it is non-strict in the sense of evaluation, but strict in the sense of checking everything; almost a paradox when presented this way.

The module system is good at type checking, so this behavior has surprised people when they try to use it as a strict (in the sense of evaluation) type checker.

The @? type checking operator does not support this. Instead, or in addition, we could support this style of type checking. It relies on returning a new value that represents the checked value. @= might be a good symbol, as it effectively creates a new binding. With it, structures can be checked deeply without loss of laziness, and lazy checks can be combined. This format also allows parameters to be altered. A non-idempotent check may undo referential transparency if don't remove the optimization of repeated checks.

{ a @= checks.attrsOf checks.str }:
{
  b = a.b;
}

We can guarantee that a.b (and therefore b) is a string or exception. This is also the case if we used a.b through some other functions, but best of all, the exception will originate from the formal above, instead of whatever code triggered the check.

We might remove @? in favor of @= to simplify things.

Syntax details

As mentioned, the syntax I've used can perhaps be improved.

? already has a meaning: default value. Use a different symbol like ~, perhaps? I don't think that steals from path syntax. ~ and ! have an established, compatible meaning in Haskell.

This proposal only extends the function syntax. Operator priorities shouldn't be an issue afaict. I don't think I've proposed syntax that's already part of a possible valid expression.

We might also consider reusing a keyword. Perhaps in instead of @?, let instead of @=, !in instead of @!. This gets a bit ambiguous because of negation though. It also doesn't help much towards a syntax for lazy formals.

Describe alternatives you've considered

Not an alternative but an addition: a half-strict check, that checks eagerly when the value isn't still a thunk. This adds too little value in my opinion, and does not support check deduplication on those non-thunk values, which could lead to surprising performance problems if the check is expensive.

Remove @? in favor of @=.

Additional context

@zimbatm mentioned such an idea to me quite some time ago, that we might store a type with thunks. I think this elaboration shows the feasibility and merit of it.

@hsjobeki has done some research into syntax for type signatures, motivated by hoogle.dev's need for standardization and side benefits for other tooling. He does not propose an addition or change to the language syntax, as that would be mostly counterproductive. Runtime type checking is inherently part of the language though, so improvements to it are library improvements or changes to the syntax.

Priorities

Add :+1: to issues you find important.

L-as commented 1 year ago

Two thoughts: