BooBSD / Tsetlin.jl

The Tsetlin Machine library with zero external dependencies performs quite well.
MIT License

Tsetlin Machine

“Speed is the most important feature.”

Fred Wilson

A Tsetlin Machine library with zero external dependencies. It achieves over 50 million MNIST predictions per second on a desktop CPU.

[Image: Tsetlin Machine benchmark]

Key features

Introduction

Here is a quick "Hello, World!" example of a typical use case.

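Importing the necessary functions and the MNIST dataset:

using MLDatasets: MNIST
# `.Tsetlin` is a relative module path, so the Tsetlin module must already be
# defined in the current scope (for example, loaded with include) before this line.
using .Tsetlin: TMInput, TMClassifier, train!, predict, accuracy, save, load, unzip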

x_train, y_train = unzip([MNIST(:train)...])
x_test, y_test = unzip([MNIST(:test)...])

Booleanizing input data:

# Encode each pixel as two Boolean features (x > 0 and x > 0.5),
# then flatten them into a single input vector per image.
x_train = [TMInput(vec([
    [x > 0 for x in i];
    [x > 0.5 for x in i];
])) for i in x_train]
x_test = [TMInput(vec([
    [x > 0 for x in i];
    [x > 0.5 for x in i];
])) for i in x_test]
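
To see what this encoding produces, here is a minimal standalone sketch that applies the same two thresholds to a tiny 2×2 "image". It uses only Base Julia and does not call TMInput; the name booleanize is a hypothetical helper introduced for illustration and is not part of the library.

```julia
# Hypothetical helper (not part of Tsetlin.jl): threshold an image at 0 and 0.5,
# mirroring the booleanization step above.
function booleanize(img::AbstractMatrix{<:Real})
    low = [x > 0 for x in img]      # true where the pixel is non-zero
    high = [x > 0.5 for x in img]   # true where the pixel is brighter than 0.5
    return vec([low; high])         # stack vertically, then flatten column-major
end

img = [0.0 0.3; 0.7 1.0]
bits = booleanize(img)
println(length(bits))  # 8: two Boolean features per pixel
println(bits)
```

With this encoding a 28×28 MNIST image becomes a vector of 2 × 784 = 1568 Boolean features.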

Some hyperparameters differ from those of the vanilla Tsetlin Machine. The hyperparameter R is a float in the range 0.0 to 1.0; to obtain R from the vanilla S parameter, use R = S / (S + 1). The hyperparameter L limits the number of literals included in a clause. best_tms_size is the number of best-performing TM models collected during training. After training, you can save this ensemble of models to disk or increase accuracy by merging them with the combine() function (Binomial Combinatorial Merge).
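
As a quick worked example of the R = S / (S + 1) conversion, the sketch below converts in both directions. The inverse, S = R / (1 - R), follows from rearranging the formula. The names s_to_r and r_to_s are illustrative helpers, not functions provided by the library.

```julia
# Illustrative helpers (not part of Tsetlin.jl) for converting between the
# vanilla S parameter and the R parameter described above.
s_to_r(s) = s / (s + 1)
r_to_s(r) = r / (1 - r)   # inverse of s_to_r, valid for 0 <= r < 1

println(s_to_r(10.0))   # 10/11, roughly 0.909
println(r_to_s(0.94))   # roughly 15.67, the vanilla S corresponding to R = 0.94
```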

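const EPOCHS = 1000          # number of training epochs
const CLAUSES = 2048         # number of clauses
const T = 32                 # threshold T
const R = 0.94               # R = S / (S + 1), in the range 0.0 to 1.0
const L = 12                 # maximum number of included literals per clause
const best_tms_size = 500    # number of best TM models to collect during training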

Training the Tsetlin Machine over 1000 epochs and saving the best TM model to disk:

tm = TMClassifier(CLAUSES, T, R, L=L, states_num=256, include_limit=128)
tm_best, tms = train!(
    tm, x_train, y_train, x_test, y_test, EPOCHS,
    best_tms_size=best_tms_size, best_tms_compile=true, shuffle=true, batch=true,
)
save(tm_best, "/tmp/tm_best.tm")

Loading the best Tsetlin Machine model and calculating its test accuracy:

tm = load("/tmp/tm_best.tm")
println(accuracy(predict(tm, x_test), y_test))
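
For intuition about the number printed above, here is a minimal sketch of a classification accuracy metric. It assumes the usual definition, the fraction of predictions that match the true labels; whether the library's accuracy function is implemented this way is not confirmed here, and simple_accuracy is a hypothetical stand-in.

```julia
# Illustrative stand-in, not the library's implementation:
# fraction of predicted labels that equal the true labels.
simple_accuracy(predicted, actual) = count(predicted .== actual) / length(actual)

println(simple_accuracy([7, 2, 1, 0], [7, 2, 1, 4]))  # 0.75 (3 of 4 correct)
```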

How to run examples

  1. Make sure that you have installed the latest version of the Julia language.
  2. Go to the examples directory: cd ./examples
  3. Run julia --project=. -O3 -t 32 --gcthreads=32,1 mnist_simple.jl where 32 is the number of your logical CPU cores.
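
If you are unsure which value to pass to -t, the following short check may help. It uses only Base Julia: Sys.CPU_THREADS reports the number of logical CPU cores that Julia detects, and Threads.nthreads() reports how many threads the current session was started with.

```julia
# Print the logical core count and the thread count of the current session.
println("Logical CPU cores: ", Sys.CPU_THREADS)
println("Julia threads:     ", Threads.nthreads())
```

When Julia is started with -t set to the first number, both lines should normally report the same value.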

Benchmark

The maximum MNIST inference speed achieved is 52 million predictions per second in batch mode on an AMD Ryzen 9 7950X3D desktop CPU using 32 threads.

Trained and optimized models can be found in ./examples/models/.

How to run MNIST inference benchmark:

  1. Please close all other programs such as web browsers, antivirus software, torrent clients, music players, etc.
  2. Go to the examples directory: cd ./examples
  3. Run julia --project=. -O3 -t 32 mnist_benchmark_inference.jl where 32 is the number of your logical CPU cores.
