BooBSD / Tsetlin.jl

The Tsetlin Machine library with zero external dependencies performs quite well.
MIT License
13 stars, 0 forks

Error running mnist_simple.jl #2

Closed: Blimpyway closed this 1 week ago

Blimpyway commented 1 week ago

Hi

Training runs fine and `/tmp/tms.tm` is saved, but it fails at the next step, combining the models:

`_, tm = combine(tms[1:500], 2, x_test, y_test, batch=true)`

My Julia version is 1.10.2 (2024-03-01); the OS is Linux 6.5.0-26-generic #26~22.04.1-Ubuntu.

Below is the full output.

Running in 4 threads. Accuracy over 1 epochs (Clauses: 2048, T: 32, R: 0.94, L: 12, states_num: 256, include_limit: 128):

1 Accuracy: 96.01% Best: 96.01% Training: 124.452s Testing: 0.247s

Done. 1 epochs (Clauses: 2048, T: 32, R: 0.94, L: 12, states_num: 256, include_limit: 128). Time elapsed: 00:02:05. Best accuracy was: 96.01%.

Saving model to /tmp/tms.tm... Done.

Loading model from /tmp/tms.tm... Done.

ERROR: LoadError: BoundsError: attempt to access 1-element Vector{Tuple{Float64, Main.Tsetlin.AbstractTMClassifier}} at index [1:500]
Stacktrace:
 [1] throw_boundserror(A::Vector{Tuple{Float64, Main.Tsetlin.AbstractTMClassifier}}, I::Tuple{UnitRange{Int64}})
   @ Base ./abstractarray.jl:737
 [2] checkbounds
   @ ./abstractarray.jl:702 [inlined]
 [3] getindex(A::Vector{Tuple{Float64, Main.Tsetlin.AbstractTMClassifier}}, I::UnitRange{Int64})
   @ Base ./array.jl:973
 [4] top-level scope
   @ ~/tmp/Tsetlin.jl/examples/mnist_simple.jl:46
in expression starting at /home/cezar/tmp/Tsetlin.jl/examples/mnist_simple.jl:46

Thanks.

Blimpyway commented 1 week ago

I found out when this happens: the number of epochs was smaller than best_tms_size (= 500), and therefore also smaller than the literal 500 in `_, tm = combine(tms[1:500], 2, x_test, y_test, batch=true)`.

By the way, shouldn't that be best_tms_size instead of a literal 500?
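Indeed, a defensive variant of that line could clamp the slice to however many models were actually trained; a sketch with a stand-in vector (`best_tms_size` follows the example script's naming):

```julia
# Stand-in for tms: only one epoch ran, so only one (accuracy, model) entry.
tms = [(0.9601, :model1)]
best_tms_size = 500

# Hypothetical guard: never slice past the number of trained models.
n = min(best_tms_size, length(tms))
best = tms[1:n]   # 1-element slice instead of a BoundsError
```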

So it assumes the number of epochs will be high. I can't afford that much time just to see results: it took 5 minutes for the first 6 epochs, even with a reduced clause count.

A piece of biased advice: for a first demo I'd recommend code that finishes in ~2 minutes on ordinary hardware instead of hours, because that is more inviting for would-be followers to keep ... following.

Expedience often trumps top accuracy.

Blimpyway commented 1 week ago

Otherwise, what you are doing in this repository seems to have great potential.

BooBSD commented 1 week ago

> So it assumes # of epochs should be high. I cannot afford that amount of time to see results. it took 5 minutes for first 6 epochs, with reduced clauses. A piece of biased advice, for a first demo I recommend a piece of code able to finish in ~2 minutes instead of hours on ordinary hardware, because it is more inviting for would-be followers to keep ... following.

Thank you for the feedback.

As far as I know, Tsetlin.jl is one of the fastest Tsetlin Machine implementations available. You can try the following hyperparameters to achieve satisfactory MNIST results with a small number of clauses:

const EPOCHS = 300
const CLAUSES = 128
const T = 8
const R = 0.89
const L = 16

P.S.: Note also that each successive epoch runs faster than the previous one.

P.P.S.: You can also reduce the booleanization from 3 bits to 2 to increase training speed:

# 2-bit booleanization
x_train = [TMInput(vec([
    [x > 0 for x in i];
    [x > 0.5 for x in i];
])) for i in x_train]
x_test = [TMInput(vec([
    [x > 0 for x in i];
    [x > 0.5 for x in i];
])) for i in x_test]

Using just one bit for booleanization is not a good idea.
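To illustrate what the 2-bit booleanization above does: each pixel intensity becomes two booleans (above 0 and above 0.5), so the input length doubles. A toy example on a 4-pixel "image":

```julia
# Toy example: booleanize a 4-pixel "image" with two thresholds.
pixels = [0.0, 0.2, 0.6, 1.0]
bits = vec([
    [x > 0 for x in pixels];      # bit 1: any ink at all
    [x > 0.5 for x in pixels];    # bit 2: strong ink
])
# bits == [false, true, true, true, false, false, true, true], length 8
```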

BooBSD commented 1 week ago

@Blimpyway

How many predictions did you obtain on mnist_benchmark_inference.jl using your CPU?

Run `julia --project=. -O3 -t 32,1 mnist_benchmark_inference.jl`, where 32 is the number of your logical CPU cores.

Blimpyway commented 1 week ago

I've posted it on Reddit; I have a 4-core CPU. The benchmark is fine. What I found slow was the default 1000 epochs (?) and 2048 clauses in mnist_simple.jl: that training would take hours.

I don't deny it is fast among Tsetlin implementations, but for the casually curious (as I was), a shorter training run that achieves a significant result (97-98%) in minutes would be more relevant and appealing. Maybe two demos: a "really simple first try" one and an "overnight great expectations" one.

Another issue I noticed: the optimised network is built using x_test and y_test, which casts doubt on whether the final best-performing combination is a test-data-aware, cherry-picked combination of the best networks; that is a no-no for paper reviewers. Not touching x_test until the final benchmark (basing all optimizations on x_train alone) does not significantly change the end result.
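One way to follow that suggestion is to carve a validation split out of the training set and base model selection on it, keeping x_test untouched until the end. A minimal sketch with stand-in arrays (the 10k/50k split size is an arbitrary choice, not something from Tsetlin.jl):

```julia
using Random

# Stand-ins for the booleanized MNIST training arrays from mnist_simple.jl.
x_train = collect(1:60_000)
y_train = collect(1:60_000)

# Hold out 10_000 training examples as a validation set for model selection,
# so x_test / y_test stay untouched until the final benchmark.
idx = shuffle(MersenneTwister(42), collect(1:length(x_train)))
val_idx, tr_idx = idx[1:10_000], idx[10_001:end]
x_val, y_val = x_train[val_idx], y_train[val_idx]
x_tr,  y_tr  = x_train[tr_idx],  y_train[tr_idx]
# Train on (x_tr, y_tr); pass (x_val, y_val) where the example passes x_test.
```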

BooBSD commented 1 week ago

Yes, the combinatorial merge example is overfitted on test data and is not suitable for paper reviews. However, this approach is still useful for k-fold cross-validation or augmented validation datasets.

Blimpyway commented 1 week ago

Regarding the dataset: I'm really happy with a single threshold, e.g. 0.33. MNIST is essentially on/off; turning a couple of bits on or off doesn't really change anything, and expanding the training set from 60k to 120k or 180k points might raise picky reviewers' suspicions of data augmentation (which it isn't). So I'm happy with the results (98.49% in under 8 minutes on 4 threads) with the following parameters:

x_train = [TMInput(vec([
    [x > 0.33 ? true : false for x in i];
])) for i in x_train]
x_test = [TMInput(vec([
    [x > 0.33 ? true : false for x in i];
])) for i in x_test]

const EPOCHS = 20
const CLAUSES = 512
const T = 32
const R = 0.94
const L = 16

BooBSD commented 1 week ago

JFYI: For MNIST with 512 clauses, the optimal parameters are:

const CLAUSES = 512
const T = 16
const R = 0.92
const L = 12

Blimpyway commented 1 week ago

Thanks. Regarding the parameters you recommended: as a quick demo ending in a couple of minutes, it works great with 30 epochs instead of 300. 97.65% on a tiny 20k-parameter model before finishing a coffee; very happy with that!

    const EPOCHS = 30
    const CLAUSES = 128
    const T = 8
    const R = 0.89
    const L = 16

Blimpyway commented 1 week ago

Since I'm too much of a Julia (and Tsetlin) illiterate to really understand the code, I hope you don't mind a few follow-up questions here.

One is about the README stating that L (the number of literals) differs from vanilla Tsetlin.

Assuming a MNIST binarized digit input (x) within a 784-bit space, what mapping is assumed between a clause with only, e.g., 16 literals and the 784 input bits?

Does the "canonical" Tsetlin Machine assign 784 literals per clause?

Thanks.

BooBSD commented 1 week ago

Thank you; it was a typo in the README. The hyperparameter L limits the number of included literals in a clause. The main idea is described in this paper: https://arxiv.org/pdf/2301.08190.pdf, with additional performance improvements on my side.
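For a rough sense of scale (my own arithmetic, not Tsetlin.jl internals): with the 2-bit booleanization each MNIST image yields 1568 boolean features, the candidate pool a clause draws from is each feature plus its negation, and L only caps how many of those a single clause may include:

```julia
# Back-of-the-envelope literal count for booleanized MNIST.
input_bits = 784 * 2         # 2-bit booleanization: 1568 boolean features
literals   = 2 * input_bits  # each feature and its negation: 3136 candidates
L = 16                       # a clause includes at most L of these literals
println(literals)            # 3136
```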

You can add the verbose=2 parameter to the train!() function to get statistics on included literals during training:

`train!(tm, x_train, y_train, x_test, y_test, EPOCHS, best_tms_size=best_tms_size, shuffle=true, batch=true, verbose=2)`

Don't forget to do a `git pull` before using this functionality.