amakelov / mandala

A simple & elegant experiment tracking framework that integrates persistence logic & best practices directly into Python
Apache License 2.0

Install | Open In Colab | Tutorials | Docs | Blogs | FAQs

Automatically save, query & version Python computations

mandala eliminates the effort and code overhead of ML experiment tracking (and beyond) with two generic tools:

  1. The @op decorator:
    • captures the inputs, outputs and code (+ dependencies) of Python function calls
    • automatically reuses past results & never computes the same call twice
    • is designed to be composed into end-to-end persisted programs, enabling efficient iterative development in plain Python, without thinking about the storage backend
  2. The ComputationFrame data structure:
    • automatically organizes executions of imperative code into a high-level computation graph of variables and operations, detecting patterns like feedback loops, branching/merging and aggregation/indexing
    • queries relationships between variables by extracting a dataframe whose columns are the variables and operations in the graph, and whose rows contain the values/calls of a (possibly partial) execution of the graph
    • automates exploration and high-level operations over heterogeneous "webs" of @op calls
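To make the memoization idea behind @op concrete, here is a minimal plain-Python sketch. This is not mandala's actual API or storage format — the decorator name and cache layout are assumptions for illustration only; the real library also persists results and tracks code versions.

```python
import functools
import hashlib
import pickle

# Toy sketch (NOT mandala's implementation) of the core idea behind @op:
# key each call by the function's name plus a hash of its arguments, and
# return the stored result instead of recomputing on repeated calls.
_store = {}  # maps (name, argument hash) -> saved output

def op(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = (
            func.__name__,
            hashlib.sha256(pickle.dumps((args, sorted(kwargs.items())))).hexdigest(),
        )
        if key not in _store:               # compute only on a cache miss
            _store[key] = func(*args, **kwargs)
        return _store[key]
    return wrapper

@op
def train(n_epochs):
    # count how many times the underlying function actually runs
    train.calls = getattr(train, "calls", 0) + 1
    return n_epochs * 0.1  # stand-in for a trained model's score

train(10)
train(10)  # second call with the same inputs: the body does not run again
assert train.calls == 1
```

Because memoized functions compose, chaining several such @op calls gives the "end-to-end persisted program" described above: rerunning the whole script skips every step whose inputs are unchanged.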
Description

Video demo

A quick demo of running computations in mandala and simultaneously updating a view of the corresponding ComputationFrame and the dataframe extracted from it (code can be found here):
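The demo's two-step flow — imperative code becomes a computation graph, which flattens into a dataframe — can be sketched in plain Python. The `record` helper and the dict-based "graph" below are illustrative assumptions, not mandala's internals, which build the graph automatically from @op calls.

```python
# Plain-Python sketch (not mandala's internals) of the demo's idea:
# record each call as a node linking input variables to an output value,
# then flatten the recorded graph into a dataframe-like row whose columns
# are the operations/variables of one end-to-end execution.
calls = []  # the "computation graph" as an ordered list of recorded calls

def record(op_name, inputs, output):
    calls.append({"op": op_name, "inputs": inputs, "output": output})
    return output

# run an imperative program while recording it
x = record("load", {}, 42)
y = record("double", {"x": x}, x * 2)
z = record("add_one", {"y": y}, y + 1)

# extract a flat view: one column per operation, one row per execution
row = {c["op"]: c["output"] for c in calls}
# row == {"load": 42, "double": 84, "add_one": 85}
```

With many recorded executions there would be one such row per (possibly partial) run of the graph, which is exactly the dataframe shape the demo extracts from the ComputationFrame.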

https://github.com/amakelov/mandala/assets/1467702/85185599-10fb-479e-bf02-442873732906

Install

pip install git+https://github.com/amakelov/mandala

Tutorials

Blogs & papers

FAQs

How is this different from other experiment tracking frameworks?

Compared to popular tools like W&B, MLflow or Comet, mandala:

How is the @op cache invalidated?

Can I change the code of @ops, and what happens if I do?

Is it production-ready?

How self-contained is it?

Limitations

Roadmap for future features

Overall

Memoization

Computation frames

Versioning

Performance

Galaxybrained vision

Aspirationally, mandala is about much more than ML experiment tracking. The main goal is to make persistence logic & best practices a natural extension of Python. Once this is achieved, the purely "computational" code you must write anyway doubles as a storage interface. It's hard to think of a simpler and more reliable way to manage computational artifacts.

A first-principles approach to managing computational artifacts

What we want from our storage are ways to

The key observation is that execution traces can already answer ~all of these questions.
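For instance, a trace of (operation, inputs, output) records already supports provenance queries — "which calls produced this value?" — with a simple backward walk and no extra bookkeeping. The trace layout and function below are a hedged illustration, not mandala's storage format.

```python
# Sketch: answer a provenance question directly from an execution trace.
def provenance(value, trace):
    """Return all trace entries that directly or transitively produced `value`."""
    direct = [t for t in trace if t["output"] == value]
    result = list(direct)
    for t in direct:
        for v in t["inputs"].values():
            result += provenance(v, trace)  # walk backward through inputs
    return result

# a tiny recorded trace of three chained calls (hypothetical values)
trace = [
    {"op": "load",  "inputs": {},                        "output": "raw"},
    {"op": "clean", "inputs": {"x": "raw"},              "output": "data"},
    {"op": "train", "inputs": {"d": "data", "lr": 0.1},  "output": "model"},
]

ops = [t["op"] for t in provenance("model", trace)]
# ops == ["train", "clean", "load"]
```

Questions about reuse ("has this call already run?"), lineage, and cleanup ("which stored values are unreachable from any result I care about?") reduce to similar walks over the same trace.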

Related work

mandala combines ideas from, and shares similarities with, many technologies. Here are some useful points of comparison: