Closed Blimpyway closed 1 week ago
I found out when this happens: the number of epochs was smaller than `best_tms_size` (= 500), and also lower than the hard-coded 500 in `_, tm = combine(tms[1:500], 2, x_test, y_test, batch=true)`.
By the way, shouldn't that be `best_tms_size` instead of a literal 500?
So it assumes the number of epochs should be high. I cannot afford that much time to see results; it took 5 minutes for the first 6 epochs, even with a reduced number of clauses.
A piece of biased advice: for a first demo I recommend code that can finish in ~2 minutes on ordinary hardware instead of hours, because that is more inviting for would-be followers to keep ... following. Expedience often trumps top accuracy.
Otherwise, what you do in this repository seems to have great potential.
Thank you for the feedback.
As far as I know, Tsetlin.jl is one of the fastest Tsetlin Machine implementations available. You can try the following hyperparameters to achieve satisfactory MNIST results with a small number of clauses:
```julia
const EPOCHS = 300
const CLAUSES = 128
const T = 8
const R = 0.89
const L = 16
```
P.S.: Note that each successive epoch runs faster than the previous one.
P.P.S.: You can also reduce booleanization to 2 bits instead of 3 to increase training speed:
```julia
# 2-bit booleanization
x_train = [TMInput(vec([
    [x > 0 for x in i];
    [x > 0.5 for x in i];
])) for i in x_train]
x_test = [TMInput(vec([
    [x > 0 for x in i];
    [x > 0.5 for x in i];
])) for i in x_test]
```
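To see what the 2-bit scheme produces, here is the same thresholding applied to a toy pixel vector (plain Julia; the `TMInput` wrapper from the thread is omitted, and the values are made up for illustration):

```julia
# Two thresholds per pixel (> 0 and > 0.5), concatenated into one Bool vector.
pixels = [0.0, 0.2, 0.7, 1.0]
bits = vec([[x > 0 for x in pixels]; [x > 0.5 for x in pixels]])
# bits == Bool[0, 1, 1, 1, 0, 0, 1, 1] — 2 bits of information per pixel
```

Each pixel thus contributes one coarse on/off bit and one "bright" bit, which is what makes 2-bit booleanization cheaper than 3-bit while keeping some intensity information.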
Using just one bit for booleanization is not a good idea.
@Blimpyway How many predictions did you obtain on mnist_benchmark_inference.jl using your CPU? Run

```
julia --project=. -O3 -t 32,1 mnist_benchmark_inference.jl
```

where 32 is the number of your logical CPU cores.
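If you're unsure of your core count, standard Julia can report it (this is Base functionality, not specific to Tsetlin.jl):

```julia
# Number of logical CPU cores, i.e. the value to pass to the -t flag.
println(Sys.CPU_THREADS)
```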
I've posted it on reddit; I have a 4-core CPU. The benchmark is fine - what I found slow was the default 1000 epochs (?) and 2048 clauses in mnist_simple.jl; that training would take hours.
I don't deny it is fast among Tsetlin implementations, but for a casually curious visitor (as I was) a shorter training run that achieves a significant result (97-98%) in minutes would be more relevant/appealing. Maybe two demos: a "really simple first try" one and an "overnight great expectations" one.
Another issue I noticed: the optimized network is built using x_test and y_test, which casts doubt on whether the final best-performing combination is a test-data-aware, cherry-picked combination of the best networks - a no-no for paper reviewers. Not touching x_test until the final benchmark (basing all optimizations on x_train alone) does not significantly change the end result.
Yes, the combinatorial merge example is overfitted on test data and is not suitable for paper reviews. However, this approach is still useful for k-fold cross-validation or augmented validation datasets.
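The leakage-free protocol above can be sketched in plain Julia: carve a validation slice out of the training data and use it for the merge step, so x_test stays untouched until the final benchmark. The arrays below are toy stand-ins (not real MNIST data), and the commented `combine` call mirrors the one from the thread:

```julia
using Random

# Toy stand-ins for the real MNIST arrays; only the splitting logic matters here.
x_train = [rand(Bool, 784) for _ in 1:100]
y_train = rand(0:9, 100)

# Hold out a validation slice so x_test/y_test stay untouched until the end.
rng = MersenneTwister(42)
idx = shuffle(rng, 1:length(x_train))
n_val = 20
val_idx, tr_idx = idx[1:n_val], idx[n_val+1:end]
x_val, y_val = x_train[val_idx], y_train[val_idx]
x_tr, y_tr = x_train[tr_idx], y_train[tr_idx]

# The merge step would then use the validation split, e.g.:
# _, tm = combine(tms[1:best_tms_size], 2, x_val, y_val, batch=true)
```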
Regarding the dataset - I'm really happy with a single threshold, e.g. 0.33. MNIST is really on/off; turning a couple of bits on or off doesn't really change anything, and expanding the training set from 60k to 120k or 180k points might make picky reviewers suspicious of data augmentation (which it isn't). So I'm happy with the results (98.49% in under 8 minutes, 4 threads) with the following parameters:
```julia
x_train = [TMInput(vec([x > 0.33 for x in i])) for i in x_train]
x_test = [TMInput(vec([x > 0.33 for x in i])) for i in x_test]
```
```julia
const EPOCHS = 20
const CLAUSES = 512
const T = 32
const R = 0.94
const L = 16
```
JFYI: For MNIST with 512 clauses the optimal parameters are:

```julia
const CLAUSES = 512
const T = 16
const R = 0.92
const L = 12
```
Thanks. Regarding the parameters you recommended: as a quick demo ending in a couple of minutes it works great with 30 epochs instead of 300 - 97.65% on a tiny 20k-parameter model, before finishing a coffee. Very happy with that!
```julia
const EPOCHS = 30
const CLAUSES = 128
const T = 8
const R = 0.89
const L = 16
```
Since I'm Julia- (and Tsetlin-) illiterate and can't really understand the code, I hope you don't mind a few follow-up questions here.
One is about the README stating that L (the number of literals) is different from vanilla Tsetlin. Assuming a binarized MNIST digit input (x) within a 784-bit space, what mapping is assumed between a clause with only e.g. 16 literals and the 784 input bits? Does the "canonical" Tsetlin Machine assign 784 literals per clause?
Thanks.
Thank you, it was a typo in the README. The hyperparameter L limits the number of included literals in a clause. The main idea is described in this paper: https://arxiv.org/pdf/2301.08190.pdf, with additional performance improvements from my side.
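As a rough illustration of what that cap means for MNIST (plain arithmetic, not Tsetlin.jl API): in a standard Tsetlin Machine each input bit contributes two candidate literals, the bit itself and its negation, and L bounds how many of those a single clause may actually include.

```julia
# Candidate literal pool for a booleanized 28x28 MNIST image (1 bit per pixel).
n_bits = 28 * 28          # 784 input bits
n_literals = 2 * n_bits   # 1568: each bit plus its negation
L = 16                    # per-clause inclusion cap from the hyperparameters above
# A clause selects at most L of the n_literals candidates.
println((n_literals, L))
```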
You can add the verbose=2 parameter to the train!() function to get statistics on included literals during training:

```julia
train!(tm, x_train, y_train, x_test, y_test, EPOCHS, best_tms_size=best_tms_size, shuffle=true, batch=true, verbose=2)
```

Don't forget to do a git pull before using this functionality.
Hi,
Training runs fine and /tmp/tms.tm is saved, but it fails at the following step, compiling the model:

```julia
_, tm = combine(tms[1:500], 2, x_test, y_test, batch=true)
```

My Julia version is 1.10.2 (2024-03-01); the OS is Linux 6.5.0-26-generic #26~22.04.1-Ubuntu. Below is the full output.
Thanks.