fulcrologic / guardrails

Efficient, hassle-free function call validation with a concise inline syntax for clojure.spec and Malli
Eclipse Public License 2.0
240 stars 26 forks source link

:sectanchors: ifdef::env-github,env-cljdoc[] :tip-caption: :bulb: :note-caption: :information_source: :important-caption: :heavy_exclamation_mark: :caution-caption: :fire: :warning-caption: :warning: endif::[]

= Guardrails

image:https://img.shields.io/clojars/v/com.fulcrologic/guardrails.svg[link=https://clojars.org/com.fulcrologic/guardrails] image:https://circleci.com/gh/fulcrologic/guardrails/tree/main.svg?style=svg["CircleCI", link="https://circleci.com/gh/fulcrologic/guardrails/tree/main"]

Efficient, hassle-free function call validation with a concise inline syntax for clojure.spec and Malli:

[source, clojure]

(>defn ranged-rand [start end] [int? int? | #(< start end) ; | = such that => int? | #(>= % start) #(< % end)] (+ start (long (rand (- end start)))))

Guardrails is intended to make it possible (and easy) to use specs/schemas as a loose but informative (even advisory) type system during development so you can better see where you are making mistakes as you make them, with zero impact on production build size or performance, and minimal impact during development.

It's the evolution of https://github.com/gnl/ghostwheel[Ghostwheel's] runtime spec validation functionality, which it streamlines, optimizes and extends, while keeping the original inline syntax. See the <<Why?, "Why?">> section for more details on Guardrails' rationale and origin story.

== NEW in 1.2

Version 1.2 adds some exciting new features and performance improvements (partially funded by https://www.dataico.com[Dataico]):

== Quick Start

. Add this library to your dependencies . Create a guardrails.edn with {} in it in your project root . When you run a REPL or CLJS compiler, include the JVM option -Dguardrails.enabled ** Optionally: If you're using CLJS, set your compiler options to include {:external-config {:guardrails {}}}

And code as follows:

[source, clojure]

(ns com.domain.app-ns (:require [com.fulcrologic.guardrails.core :refer [>defn >def | ? =>]]))

;; >def (and >fdef) can be used to remove specs from production builds. Use them to define ;; specs that you only need in development. See the docstring of ;; com.fulcrologic.guardrails.noop for details. (>def ::thing (s/or :i int? :s string?))

;; When guardrails is disabled this will just be a normal defn, and no fspec overhead will ;; appear in cljs builds. When enabled it will check the inputs/outputs and always log ;; an error using expound, and then optionally throw an exception, (>defn f [i] [::thing => int?] (if (string? i) 0 (inc i)))

When the function is misused you'll get an error:

[source, bash]

user=> (f 3.2) ERROR /Users/user/project/src/com/domain/app_ns.clj:12 f's argument list -- Spec failed --------------------

[3.2] ^^^

should satisfy

int?

or

string?

-- Relevant specs -------

:user/thing: (clojure.spec.alpha/or :i clojure.core/int? :s clojure.core/string?)

You can control if spec failures are advisory or fatal by editing guardrails.edn and setting the :throw? option. See <> for more details.

Make sure to set your editor or IDE to resolve Guardrails' >defn and >fdef as Clojure's defn, and >defn- as defn- respectively – this way you get proper highlighting, formatting, error handling, structural navigation, symbol resolution, and refactoring support. A linting configuration for clj-kondo is included and should work out of the box.

=== Output Options

The configuration can be changed to help the signal to noise ratio on failures. If you have heavily instrumented your code with Guardrails, then turning on these two config options will likely give you a clearer picture on failures:

Try the above settings as shown. You can always see the full stack by calling last-failure (which is printed with any failures for convenience). The result is way less noisy, and often sufficient to find the problem. When it's not, the full stack trace is just a copy/paste away.

== Clojurescript Considerations

I use shadow-cljs as the build tool for all of my projects, and highly recommend it. Version 0.0.11 of Guardrails checks the compiler optimizations and refuses to output guardrails checks except in development mode (no optimizations). This prevents you from accidentally releasing a CLJS project with big runtime performance penalties due to spec checking at every function call.

The recommended approach for using guardrails in your project is to make a separate :dev and :release section of your shadow-cljs config, like so:

[source, clojure]

{:builds {:main {:target :browser ... :dev {:compiler-options {:closure-defines {'goog.DEBUG true} :external-config {:guardrails {}}}} :release {}}} ...}

Doing so will prevent you from accidentally generating a release build with guardrails enabled in case you had a shadow-cljs server running in dev mode (which would cache that guardrails was enabled) and built a release target:

[source, bash]

in one terminal:

$ shadow-cljs server

later, in a different terminal

$ shadow-cljs release main

In this scenario Guardrails will detect that you have accidentally enabled it on a production build and will throw an exception. The only way to get guardrails to build into a CLJS release build is to explicitly set the JVM property "guardrails.enabled" to "production" (NOTE: any truthy value will enable it in CLJ).

You can set JVM options in shadow-cljs using the :jvm-opts key:

[source, clojure]

:jvm-opts ["-Dguardrails.enabled=production"]

but this is highly discouraged.

=== Dead Code Elimination

There is a a noop namespace that can be used in your build settings to attempt to eliminate all traces of guardrails and dependent code. This will not remove spec dependencies unless you only use spec for guardrails, so do similar tricks for your inclusions of spec namespaces.

See https://github.com/fulcrologic/guardrails/blob/develop/src/main/com/fulcrologic/guardrails/noop.cljc[noop.cljc].

[[gspec-syntax]] == The Gspec Syntax

[arg-specs* (| arg-preds+)? \=> ret-spec (| ret-preds+)? (\<- generator-fn)?]

| : such that

The number of arg-specs must match the number of function arguments, including a possible variadic argument – Guardrails will shout at you if it doesn't.

=== Single/Multiple Arities

Write the function as normal, and put a gspec after the argument list:

[source, clojure]

(>defn myf ([x] [int? => number?] ...) ([x y] [int? int? => int?] ...))

=== Variadic Argument Lists

arg-specs for variadic arguments are defined as one would expect from standard fspec:

[source, clojure]

(>fdef clojure.core/max [x & more] [number? (s/* number?) => number?])

[NOTE]

The arg-preds, if defined, are s/and-wrapped together with the arg-specs when desugared.

The ret-preds are equivalent to (and desugar to) spec's :fn predicates, except that the anonymous function parameter is the ret, and the args are referenced using their symbols. That's because in the gspec syntax spec's :fn is simply considered a 'such that' clause on the ret.

=== Such That

To add an additional condition add | after either the argument specs (just before \=>) or return value spec and supply a lambda that uses the symbol names from the argument list (and % for return value).

[source, clojure]

(>defn f [i] [int? | #(< 0 i 10) => int? | #(pos-int? %)] ...)

=== Nilable

The ? macro can be used as a shorthand for s/nilable:

[source, clojure]

(>fdef clojure.core/empty? [coll] [(? seqable?) => boolean?])

=== Nested Specs

Nested gspecs are defined using the exact same syntax:

[source, clojure]

(>fdef clojure.core/map-indexed ([f] [[nat-int? any? => any?] => fn?]) ([f coll] [[nat-int? any? => any?] (? seqable?) => seq?]))

In the rare cases when a nilable gspec is needed ? is put in a vector rather than a list:

[source, clojure]

(>fdef clojure.core/set-validator! [a f] [atom? [? [any? => any?]] => any?])

TIP: For nested gspecs there's no way to reference the args in the arg-preds or ret-preds by symbol. The recommended approach here is to register the required gspec separately by using >fdef with a keyword. //You can do it with #(\-> % :arg1) in the arg-preds, but that won't work in the ret-preds and it's quite messy anyway. You could theoretically use a nested (s/fspec ...) instead of a gspec, but that gets unwieldy quick.

NOTE: Nested gspecs with one or more any? argspecs desugar to ifn?, so as not to mess up generative testing. This can be overridden by passing a generator – even an empty one, that is simply adding \<- or :gen to the gspec – in which case the gspec will desugar exactly as specified. {zwsp} The assumption here is that any? does not imply that the function can in fact handle any type of argument. {zwsp} You should still write out nested gspecs, even if they are as simple as [any? \=> any?] – this is useful as succinct documentation that this particular function receives exactly one argument.

=== Gspec Advantages

Credit: The above documentation was largely taken from https://github.com/gnl/ghostwheel#the-gspec-syntax[Ghostwheel's documentation].

[#malli-support] == Malli Support

Version 1.2.0 includes full support for Malli. If you use the latter for data validation, you no longer need to maintain a separate spec-based set of schema for Guardrails – it is, after all, the same data you use in your functions!

All you have to do to use it instead of spec is change your require statement. In fact, you can alias BOTH spec-based and malli-based Guardrails in the same namespace – just make sure you use the right kind of schema with the corresponding function!

The special operators \=>, |, and ? can come from either implementation, as they are purely symbolic.

[source, clojure]

(ns foo.bar (:require [clojure.spec.alpha :as s] [com.fulcrologic.guardrails.core :as gr.spec] [com.fulcrologic.guardrails.malli.core :as gr.malli :refer [=> | ?]))

(gr.spec/>defn f [x] [(s/keys :req [:thing/x]) => int?] ...)

(gr.malli/>defn f [x] [[:map :thing/x] => :int] ...)

All configuration options apply to both variants (max checks per second, throw configuration, etc.). Other than the items used within the gspec, they are identical.

=== The Guardrails Malli Registry

Clojure Spec forces you to use a shared global registry, and carefully ensure that your keywords are qualified and do not collide with others.

Malli does have a default registry, but it is not mutable. This lets you to pick registries at will, and allows for more lenient use of "poor" naming because the threat of collision is reduced; however, it makes the writing of function schemas a lot more tedious:

[source, clojure]

(>defn f [x] [[:map {:registry my-reg} ...

Fortunately, Malli supports mutable registries, so we can provide the convenience of a global registry and dramatically reduce the boilerplate.

The mutable Guardrails registry is initiated with the exact content of the default Malli registry, and is held in the com.fulcrologic.guardrails.malli.registry namespace, which also includes functions that you can use to directly add your own schema to it. You'll need to do this for any qualified keywords you want to use in >defns that leverage Malli. For example you can merge in some other schema maps with:

[source, clojure]

(gr.reg/merge-schemas! my-custom-stuff my-other-stuff)

The com.fulcrologic.guardrails.malli.core namespace also has a convenient >def that is like the Clojure Spec def, in that it will register a schema under a qualified keyword for you:

[source, clojure]

(>def :member/name :string)

== Configuration

=== Enabling

Guardrails is disabled by default, emitting exactly what a plain defn would until you explicitly turn it on, which is done via a JVM option. We chose this path because it is highly effective at preventing its accidental enabling in production, which could cause huge performance impacts.

The JVM option -Dguardrails.enabled=true should be used to turn on guardrails. When not defined >defn will emit exactly what defn would.

You may also enable it in cljs in your shadow-cljs config (see Configuration...adding even an empty config map will enable it).

=== The Configuration File

The default config goes in the root of the project as guardrails.edn:

[source, clojure]

{ ; what to emit instead of defn, if you have another defn macro :defn-macro nil

;; Nilable map of Expound configuration options. :expound {:show-valid-values? true :print-specs? true}

;; GLOBALLY enable non-exhaustive checking (this is NOT recommended, you'd ;; usually want to set it in a more granular fashion on the function or ;; namespace level using metadata.) ;; Limits function call checks per second to this maximum number. ;; The intermittent checks are spread out evenly to ensure sufficient coverage, ;; so if MCPS is set to 50 and you have 1000 calls per second, roughly every 20th ;; call will be checked. :guardrails/mcps 100

;; Low-level stack trace output ;; default :full :guardrails/stack-trace :prune ; or :full or :none

;; nREPL hates using stderr, but in other REPLs seeing your problems in red is nice. ;; default false :guardrails/use-stderr? true

;; Optional (default false). Compress the explanation of the problem. :guardrails/compact? true

;; Keep track of the active GR-instrumented calls as a stack, and show that on errors. ;; default false :guardrails/trace? true

;; should a spec failure on args or ret throw an exception? ;; (always logs an informative message) :throw? false

;; should a failure be forwarded through tap> ? :tap>? false}

You can override the config file name using JVM option -Dguardrails.config=filename. In your shadow-cljs config file you can override settings via the [:compiler-options :external-config :guardrails] config path of a build:

[source, clojure]

... :app {:target :browser :dev {:compiler-options {:external-config {:guardrails {:throw? false}} :closure-defines {'goog.DEBUG true}}}} ...

== Performance

Guardrails adds an overhead that is roughly equivalent to the cost of running a Clojure spec or Malli validation. On an Apple M1 Max, the average check on a generic code base (tested against https://github.com/fulcrologic/statecharts[our statecharts library]) takes around 11 microseconds. This actually tested out to roughly the same for Malli AND Spec, though we did find cases where Malli was roughly 2x faster. We did not do further deep analysis.

[source]

nCalls        Max       Mean   MAD      Clock  Total

174,586 14.72ms 11.44μs ±78% 2.00s 91%

(measured using https://github.com/taoensso/tufte[Tufte])

As you can see, if you instrument a lot of your functions, the number of calls can add up quickly (this result was from running 8 tests). So, even though the runtime checks are only taking microseconds, the overall effect can be dramatic.

Here's how fast those tests are when we turn off Guardrails altogether (one call, because we measured the entire test suite runtime instead of the overhead of non-existent runtime checks):

[source]

 nCalls        Max       Mean   MAD      Clock  Total
      1    72.38ms    72.38ms   ±0%    72.38ms   100%

As you can see, the performance can be a significant drag on development, often leading people to strip out their checks, a thing that I've had to do in my own libraries in the past because it hurt downstream users. No more! We now have various ways of improving the situation.

[#check-throttling] === Limiting Max Checks Per Second

In version 1.2.0 and above you can tune Guardrails to limit the number of times a function is checked per second. This can have a huge performance benefit for functions that are called in loops and possibly involve complex and expensive checks.

The limit can be applied globally, to a namespace, or even to a function (recommended), by setting :guardrails/mcps to an integer. Like most other options, you can place it in the global guardrails.edn, the compiler config, the metadata of a namespace, or in the attribute map of a >defn. For example:

[source, clojure]

(>defn f {:guardrails/mcps 100} [x] [int? => int?] ...)

The performance boost from this setting can be dramatic. The Fulcrologic Statecharts library uses Guardrails extensively for internal function checks, and without an MCPS limit some state changes can take human-perceptible amounts of time (like seconds). With it applied the performance impact returns to a virtually unnoticeable level. Measurements on this particular library indicate that each check takes around 20 microseconds, but the overhead of the max-calls-per-second is only 20 or so nanoseconds (on an M1 Max Mac Studio). So, when a calculation ends up causing 100k+ checks (remember there is a check for each arg, and one for the return value) enabling MCPS makes things run literally 1000x faster.

Here's that same set of 8 tests we showed earlier, but with MCPS set to 100 the Guardrails overhead is reduced to only a few milliseconds! In other words, the non-exhaustive checking makes it appear as if guardrails isn't even there. Since it is very common to have just a handful of heavily called functions, dropping each of their check counts to 100 means that you're more likely to only run a few thousand checks in total.

Of course, the downside is that you are no longer getting rigorous data flow checking, but for functions that are called heavily this is an acceptable trade-off, since the probability of detecting some kind of problem can be tuned as you see fit.

The throttling always checks the "leading edge" first; from there it tracks a counter, and uses the high resolution timer to calculate the current number of checks per second that have been done. If that exceeds the set limit, the check is skipped (and the time will change, but not the check count), so after enough time has elapsed, more checks will happen. This has the tendency to "spread out" the checks over time, but of course even high resolution timers are going to give you a lot of jitter at a high call frequency.

[#dynamic-exclusions] === Dynamic Exclusions

Version 1.2.0 also includes the ability to turn checks completely on or off, including at runtime, on a wide or granular level, such as for a namespace or even a function, both in CLJ and CLJS. The functions for controlling this are in com.fulcrologic.guardrails.config:

[#static-exclusions] === Static Exclusions (Special Attention Library Authors)

Most libraries have a main surface API, and then a bunch of internal functions. It is useful to instrument all of these with Guardrails in order to get the benefits of documentation, validation during development, and verification while testing.

Unfortunately, this can have a huge performance impact on downstream consumers of your library that also use Guardrails. It makes sense that a library author should indicate which functions comprise the public API (and should be checked by downstream users), and which are considered more internal and should only be checked when the author is working on the library itself.

Library authors (and application authors as well) can include a file at the top level of their classpath (e.g. src or resources folder, usually) with the special name guardrails-export.edn which contains a config map that can exclude a set of namespaces from ALL checks. To get the checking config at runtime, the >defn functions in those namespaces will only run a simple check on a volatile, so setting these exclusions returns things to pretty much full performance (~= no Guardrails used at all).

For example, src/main/guardrails-export.edn in the https://github.com/fulcrologic/statecharts[Fulcrologic statecharts library] looks something like this:

[source, clojure]

{:exclude #{com.fulcrologic.statecharts.algorithms.v20150901-impl}}

Remember, this goes in a specially-named file, not in the primary guardrails configuration file, since these are meant to be seen by downstream consumers (like data_readers.clj).

A quick implementation note: In order to make this work in Clojurescript a macro must run that can read the JVM classpath, and compile all the exclusions found in on-disk (and in JAR files) into a runtime set. The same happens in Clojure (though in CLJ you can read the fs again at any time).

The set of exclusions found in export files at load time is what reset-exclusions! will restore if you have dynamically changed the exclusions at runtime. Basically this load-time set is kept in a var for exactly this reason since CLJS cannot re-trigger a classpath scan.

NOTE: As a library author these exclusions will end up applying to your code as well, since it is difficult to tell which export file belongs to which project on the classpath. Thus the beginning of your test namespaces (and possibly your non-published user ns) should all start with a call to config/clear-exclusions! if you want to include your implementation checks while running your tests and working on your library code.

== Why?

Clojure spec's instrument (and Orchestra's outstrument) have a number of disadvantages when trying to use them for this purpose. Specifically, they are side-effecting after-calls that do not play particularly well with hot code reload, and always throw when there is a failed spec. Furthermore, management of the accidental inclusion of specs in your cljs builds (which increase build size) is a constant pain when writing separate specs for functions (the specs end up in a whole other file, inclusion needs to be via a development ns, and things easily get out of date).

This library is a middle ground between the features of raw Clojure spec and George Lipov's Ghostwheel. Much of the source code in this library is directly from https://github.com/gnl/ghostwheel[Ghostwheel].

This library's goals are:

without the extra overhead of Ghostwheel's support for:

== Copyright and License

The code and documentation taken from Ghostwheel is by George Lipov and follows the ownership/copyright of that library. The modifications in this library are copyrighted by Fulcrologic, LLC.

This library follows Ghostwheel's original license: Eclipse public license version 2.0.