Mushin is a Japanese term used in martial arts that refers to the state of mind obtained by practice. At this point, a person relies not on what they think should be the next move, but what is their trained natural reaction (or instinct).
Mushin is a pure Rust
, no-unsafe library for computing gradients on dynamic computational graphs using reverse automatic differentiation. In other words, what PyTorch
is to Python
is what Mushin
is to Rust
.
Internally it uses the arrayfire crate to provide parallel computations on specialized hardware, such as Nvidia CUDA GPUs, Intel MKL CPUs... For details on what devices are available and installation instructions for your OS, please take a look at the arrayfire
crate documentation. The installation of the arrayfire
binaries is required for Mushin
to work.
One clear benefit of this crate versus PyTorch
is Rust
's strong type system. All operations performed on tensors during the graph build are checked at compile time for mathematical soundness, which means no runtime error after an hour of model training. If it compiles, it works. If at some point you make a mistake while building your made in hell nested computational graph, like for example on the shape of a tensor, you'll be stopped even before you can start feeling stupid.
Moreover, because constant and variable tensors are actually different types, the developer continuously has an overview of which resulting tensors contribute to the gradients and which not. On top of that, the compiler will stop you from trying to compute the gradient of or with respect to a constant!
Another benefit when compared to other similar libraries is that the computation graph is eagerly evaluated, which means that the graph is trully dynamic. In other words, your next operations can be conditioned to the results of previous ones, and so you can have conditional branching while building your graph.
First, install the arrayfire binaries as indicated by the arrayfire crate.
Then, add mushin
as one of your dependencies:
[dependencies]
mushin = "0.5"
The following is quite a self-explanatory example of the basic usage of Mushin to build computation graphs and get the derivatives back:
use mushin as mu;
fn main() {
let x = mu::eye::<1, 1, 2, 3>(3.0).freeze();
let w = mu::randn::<1, 1, 3, 2>();
let b = mu::fill::<1, 1, 3, 3>(0.0);
let z = mu::add(&mu::mm(&w, &x), &b);
z.backward();
let dz_dw = w.grad()
let dz_db = b.grad()
}
By default, this library enables the nn
feature that gives access to the nn
module, which builds upon the auto-grad foundation of Mushin
to deliver a set of Deep Learning utilities, such as activation functions, layers, losses and optimizers. If you don't really need that part and you are only insterested in the pure auto-grad functionality of this library, the nn
module can be disabled with default-features = false
. Here follows a brief example on how it works:
use mushin as mu;
use mu::nn::{layers::Linear, activations::relu, losses::mse, optimizers::SGD};
let x = mu::eye::<16, 1, 1, 3>(1.0).freeze();
let y = mu::eye::<16, 1, 1, 5>(3.0).freeze();
let linear = Linear::<16, 3, 5>::new();
let optim = SGD::new(&[linear.parameters()], 0.01);
for _ in 0..5 {
let z = relu(&linear.forward(&x));
let loss = mse(&z, &y);
loss.backward();
optim.step();
loss.reset();
}
Many thanks!
Mushin is distributed under the terms of both the MIT license and the Apache License (Version 2.0).
See LICENSE-APACHE and LICENSE-MIT, and COPYRIGHT for details.