codex-semantics-library / patricia-tree

Patricia Tree data structure in OCaml for maps and sets, supports generic (GADT) keys
https://codex.top/api/patricia-tree/
GNU Lesser General Public License v2.1
10 stars 0 forks source link
ocaml ocaml-library patricia-tree

Patricia Tree

Latest version OCaml Version GitHub License GitHub Actions Workflow Status Documentation

This is an OCaml library that implements sets and maps as Patricia Trees, as described in Okasaki and Gill's 1998 paper Fast mergeable integer maps. It is a space-efficient prefix trie over the big-endian representation of the key's integer identifier.

The documentation for this library can be found online at codex.top/patricia-tree/.

This library was written by Matthieu Lemerre then further improved by Dorian Lesbre, as part of the Codex semantics library, developed at CEA List.

Table of Contents:

Installation

This library can be installed with opam:

opam install patricia-tree

Alternatively, you can clone the source repository and install with dune:

git clone git@github.com:codex-semantics-library/patricia-tree.git
cd patricia-tree
opan install . --deps-only
dune build -p patricia-tree
dune install
# To build documentation
opam install . --deps-only --with-doc
dune build @doc

See the examples to jump right into using this library.

Features

Quick overview

Functors

This library contains a single module, PatriciaTree. The main functors used to build our maps and sets are the following:

(** {2 Homogeneous maps and sets} *)

module MakeMap(Key: KEY) : MAP with type key = Key.t
module MakeSet(Key: KEY) : SET with type elt = Key.t

(** {2 Heterogeneous maps and sets} *)

module MakeHeterogeneousSet(Key: HETEROGENEOUS_KEY) : HETEROGENEOUS_SET
  with type 'a elt = 'a Key.t
module MakeHeterogeneousMap(Key: HETEROGENEOUS_KEY)(Value: HETEROGENEOUS_VALUE) :
  HETEROGENEOUS_MAP
  with type 'a key = 'a Key.t
   and type ('k,'m) value = ('k,'m) Value.t

There are also hash-consed versions of these four functors: MakeHashconsedMap, MakeHashconsedSet, MakeHashconsedHeterogeneousMap and MakeHashconsedHeterogeneousSet. These uniquely number their nodes, which means:

Interfaces

Here is a brief overview of the various module types of our library:

Examples

To use this library, install it and add the following to your dune files:

(executable ; or library
  ...
  (libraries patricia-tree ...)
)

Homogeneous map

Here is a small example of a non-generic map:

(** Create a key struct *)
module Int (*: PatriciaTree.KEY*) = struct
  type t = int
  let to_int x = x
end

(** Call the map and/or set functors *)
module IMap = PatriciaTree.MakeMap(Int)
module ISet = PatriciaTree.MakeSet(Int)

(** Use all the usual map operations *)
let map =
  IMap.empty |>
  IMap.add 1 "hello" |>
  IMap.add 2 "world" |>
  IMap.add 3 "how do you do?"
  (* Also has an [of_list] and [of_seq] operation for initialization *)

let _ = IMap.find 1 map (* "hello" *)
let _ = IMap.cardinal map (* 3 *)

(** The strength of Patricia Tree is the speedup of operation on multiple maps
    with common subtrees. *)
let map2 =
  IMap.idempotent_inter_filter (fun _key _l _r -> None)
  (IMap.add 4 "something" map) (IMap.add 5 "something else" map)
let _ = map == map2 (* true *)
(* physical equality is preserved as much as possible, although some intersections
   may need to build new nodes and won't be fully physically equal, they will
   still share subtrees if possible. *)

(** Many operations preserve physical equality whenever possible *)
let _ = (IMap.add 1 "hello" map) == map (* true: already present *)

(** Example of cross map/set operation: only keep the bindings of [map]
    whose keys are in a given set *)
let set = ISet.of_list [1; 3]
module CrossOperations = IMap.WithForeign(ISet.BaseMap)
let restricted_map = CrossOperations.nonidempotent_inter
  { f = fun _key value () -> value } map set

Heterogeneous map

(** Very small typed expression language *)
type 'a expr =
  | G_Const_Int : int -> int expr
  | G_Const_Bool : bool -> bool expr
  | G_Addition : int expr * int expr -> int expr
  | G_Equal : 'a expr * 'a expr -> bool expr

module Expr : PatriciaTree.HETEROGENEOUS_KEY with type 'a t = 'a expr = struct
  type 'a t = 'a expr

  (** Injective, so long as expression are small enough
      (encodes the constructor discriminant in two lowest bits).
      Ideally, use a hash-consed type, to_int needs to be fast *)
  let rec to_int : type a. a expr -> int = function
    | G_Const_Int i ->   0 + 4*i
    | G_Const_Bool b ->  1 + 4*(if b then 1 else 0)
    | G_Addition(l,r) -> 2 + 4*(to_int l mod 10000 + 10000*(to_int r))
    | G_Equal(l,r) ->    3 + 4*(to_int l mod 10000 + 10000*(to_int r))

  (** Full polymorphic equality *)
  let rec polyeq : type a b. a expr -> b expr -> (a, b) PatriciaTree.cmp =
    fun l r -> match l, r with
    | G_Const_Int l, G_Const_Int r -> if l = r then Eq else Diff
    | G_Const_Bool l, G_Const_Bool r -> if l = r then Eq else Diff
    | G_Addition(ll, lr), G_Addition(rl, rr) -> (
        match polyeq ll rl with
        | Eq -> polyeq lr rr
        | Diff -> Diff)
    | G_Equal(ll, lr), G_Equal(rl, rr) ->    (
        match polyeq ll rl with
        | Eq -> (match polyeq lr rr with Eq -> Eq | Diff -> Diff) (* Match required by typechecker *)
        | Diff -> Diff)
    | _ -> Diff
end

(** Map from expression to their values: here the value only depends on the type
    of the key, not that of the map *)
module EMap = PatriciaTree.MakeHeterogeneousMap(Expr)(struct type ('a, _) t = 'a end)

(** You can use all the usual map operations *)
let map : unit EMap.t =
  EMap.empty |>
  EMap.add (G_Const_Bool false) false |>
  EMap.add (G_Const_Int 5) 5 |>
  EMap.add (G_Addition (G_Const_Int 3, G_Const_Int 6)) 9 |>
  EMap.add (G_Equal (G_Const_Bool false, G_Equal (G_Const_Int 5, G_Const_Int 7))) true

let _ = EMap.find (G_Const_Bool false) map (* false *)
let _ = EMap.cardinal map (* 4 *)

(** Fast operations on multiple maps with common subtrees. *)
let map2 =
  EMap.idempotent_inter_filter
    { f = fun _key _l _r -> None } (* polymorphic 1rst order functions are wrapped in records *)
    (EMap.add (G_Const_Int 0) 8 map)
    (EMap.add (G_Const_Int 0) 9 map)

Release status

This should be close to a stable release. It is already being used as part of a larger project successfully, and this usage as helped us mature the interface. As is, we believe the project is usable, and we don't anticipate any major change before 1.0.0. We didn't commit to a stable release straight away as we would like a bit more time using this library before doing so.

Known issues

There is a bug in the OCaml typechecker which prevents us from directly defining non-generic maps as instances of generic maps. To avoid this, non-generic maps use a separate value type (instead of just using 'b)

type (_, 'b) snd = Snd of 'b [@@unboxed]

It should not incur any extra performance cost as it is unboxed, but can appear when manipulating non-generic maps.

For more details about this issue, see the OCaml discourse discussion.

Comparison to other OCaml libraries

ptmap and ptset

There are other implementations of Patricia Tree in OCaml, namely ptmap and ptset, both by J.C. Filliatre. These are smaller and closer to OCaml's built-in Map and Set, however:

dmap

Additionally, there is a dependent map library: dmap, which gave us the idea of making our PatriciaTree dependent. It allows creating type safe dependent maps similar to our heterogeneous maps. However, its maps aren't Patricia trees. They are binary trees build using a (polymorphic) comparison function, similarly to the maps of the standard library.

Another difference is that the type of values in the map is independent of the type of the keys, allowing keys to be associated with different values in different maps. i.e. we map 'a key to any ('a, 'b) value type, whereas dmap only maps 'a key to 'a or 'a value.

dmap also works with OCaml >= 4.12, whereas we require OCaml >= 4.14.

Contributions and bug reports

Any contributions are welcome!

You can report any bug, issues, or desired features using the Github issue tracker. Please include OCaml, dune, and library version information in you bug reports.

If you want to contribute code, feel free to fork the repository on Github and open a pull request. By doing so you agree to release your code under this project's license (LGPL-2.1).

There is no imposed coding style for this repository, here are just a few guidelines and conventions: