cair / pyTsetlinMachine

Implements the Tsetlin Machine, Convolutional Tsetlin Machine, Regression Tsetlin Machine, Weighted Tsetlin Machine, and Embedding Tsetlin Machine, with support for continuous features, multigranularity, clause indexing, and literal budget
https://pypi.org/project/pyTsetlinMachine/
MIT License

How to do interactive/incremental regression fitting? #8

Closed DestyNova closed 3 years ago

DestyNova commented 3 years ago

Hello, I was playing around trying to use pyTsetlinMachine's RegressionTsetlinMachine as a drop-in replacement for a neural network in a naive Q-learning agent (I'm not sure if it's an appropriate tool for that, but that's why I'm trying it out). The way I tried to implement it, the predict and fit functions are called with one row of data at a time, rather than splitting a static dataset into training and test samples. However, I ran into a couple of problems. The first was:

    self.encoded_X = np.ascontiguousarray(np.empty(int(number_of_examples * self.number_of_patches * self.number_of_ta_chunks), dtype=np.uint32))
AttributeError: 'RegressionTsetlinMachine' object has no attribute 'number_of_patches'

This seems to be because I called predict before fit, which happens to initialise some important fields like number_of_patches. I'm not sure what the best solution there is, other than calling fit once after initialising the TM, even though that would teach it garbage -- maybe not so bad with epochs=1.

Then I tried the simplest possible way to learn the function f(a,b) = a+b, with just one example input:

from pyTsetlinMachine.tm import RegressionTsetlinMachine
import numpy as np

tm = RegressionTsetlinMachine(1000, 500*10, 2.75, weighted_clauses=True)

feats = np.array([[1,2]])
print(f'feats.shape: {feats.shape}')

target = np.array([[3]])
tm.fit(feats, target, incremental=True)

This crashes with a different error:

feats.shape: (1, 2)
/home/omacfhearai/.local/lib/python3.8/site-packages/pyTsetlinMachine/tm.py:743: RuntimeWarning: invalid value encountered in true_divide
  Ym = np.ascontiguousarray((Y - self.min_y)/(self.max_y - self.min_y)*self.T).astype(np.int32)
Traceback (most recent call last):
  File "tsetlin_test.py", line 10, in <module>
    tm.fit(feats, target, incremental=True)
  File "/home/omacfhearai/.local/lib/python3.8/site-packages/pyTsetlinMachine/tm.py", line 747, in fit
    _lib.tm_fit_regression(self.rtm, self.encoded_X, Ym, number_of_examples, epochs)
ctypes.ArgumentError: argument 3: <class 'TypeError'>: array must have 1 dimension(s)

If I understand correctly, this is due to selecting the minimum and maximum values from the target array in the fit function, which for a single example means min_y == max_y and we end up trying to evaluate 0/0 when calculating Ym.
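The warning (as opposed to the ArgumentError) can be reproduced with NumPy alone. This is a minimal sketch that copies the scaling expression from the traceback and leaves out the final cast to int32. `scale_targets` is a hypothetical helper name, and `T=5000` is simply the `500*10` from the example above.

```python
import numpy as np

def scale_targets(Y, min_y, max_y, T):
    # Same expression as in the traceback, minus the cast to int32.
    # errstate only silences the RuntimeWarning; the result is unchanged.
    with np.errstate(invalid="ignore", divide="ignore"):
        return (Y - min_y) / (max_y - min_y) * T

# A single target: min_y == max_y, so the expression evaluates 0/0.
single = np.array([3.0])
print(scale_targets(single, single.min(), single.max(), 5000))

# Two distinct targets: the range is non-zero and the scaling is well defined.
pair = np.array([3.0, 7.0])
print(scale_targets(pair, pair.min(), pair.max(), 5000))
```

The first call yields NaN, the second maps the smaller target to 0 and the larger to T.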

My understanding might be wrong, since I don't even understand what the parameters T and s refer to when constructing the TM. Is there a proper way to do this?

Blimpyway commented 3 years ago

I think the second error is due to the fact that target = np.array([[3]]) is two-dimensional. Try target = np.array([3]). Also, in order to learn anything it makes no sense to have a single data point; maybe the programmers assumed at least two and did no stability testing on a case that is nonsensical from a machine learning perspective.

Blimpyway commented 3 years ago

So this worked fine:

from pyTsetlinMachine.tm import RegressionTsetlinMachine
import numpy as np

tm = RegressionTsetlinMachine(1000, 500*10, 2.75, weighted_clauses=True)

feats = np.array([[1,2],[3,4]]) # !!!! two data points
print(f'feats.shape: {feats.shape}')

target = np.array([3,7]) # !!! notice shape (2,) is one dimensional
tm.fit(feats, target, incremental=True)

DestyNova commented 3 years ago

I think the second error is due to the fact that target = np.array([[3]]) is two-dimensional. Try target = np.array([3]).

Thanks, this helped, although I don't quite understand why the target should be one-dimensional when the input is multi-dimensional.
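For reference, a sketch of the shapes the thread converges on: one row per example in the features and one scalar per example in the targets. Flattening with `reshape(-1)` is plain NumPy. That `fit` expects exactly this layout is inferred from the error message and the working example above, not from the library's documentation.

```python
import numpy as np

feats = np.array([[1, 2], [3, 4]])   # shape (2, 2): two examples, two features each
target_2d = np.array([[3], [7]])     # shape (2, 1): a 2-D target, like the one that raised the error
target_1d = target_2d.reshape(-1)    # shape (2,): one target value per row of feats

print(feats.shape, target_2d.shape, target_1d.shape)
```

The first axis of the features and the only axis of the targets then have the same length, one entry per example.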

Also, in order to learn anything it makes no sense to have a single data point; maybe the programmers assumed at least two and did no stability testing on a case that is nonsensical from a machine learning perspective.

How is it a nonsensical case from a machine learning perspective, if you're interacting in an online environment? For example, if you're controlling a robot, you take one step at a time and update your model with a reward value after each step. This is the basic form of Q-learning, which is one of the most fundamental ideas in reinforcement learning.

I guess you're thinking of machine learning as having a fixed batch of training and test data, but reinforcement learning usually works on single observations.

Blimpyway commented 3 years ago

Well, they probably didn't have online learning in mind, but batch learning with several examples at a time. Just for context, I tried to use the Tsetlin machine classifier for cartpole balancing, and I noticed that calling predict() ten thousand times, one row at a time, is orders of magnitude slower than predicting 10000 rows at once. I know that is not nice even for online prediction.

So that's my assumption: they didn't think much about robots or interactive agents in this initial implementation.

Blimpyway commented 3 years ago

Probably the regressor is made to predict only one output value for each row of input data. Not all regressors are implemented to predict multiple columns.

Blimpyway commented 3 years ago

Also keep in mind that the core Tsetlin automaton (or whatever it is called) makes a 1-bit prediction. It is a binary classifier; it can only tell hotdog from not-hotdog. The higher-level refinements (classifier, regressor) need to gang up multiple "core engines" to produce more nuanced predictions.

But, since the primary engine is by design a binary classifier it makes no sense to "train" it with one single point of data.

DestyNova commented 3 years ago

Probably the regressor is made to predict only one output value for each row of input data.

It seems so -- the docs just say X and Y with no indication of dimensionality or type, so that'd be a good improvement.

I managed to get an extremely simplistic online vs batch regression of f(a,b) = a+b running, but as expected the online version produces bad results:

from pyTsetlinMachine.tm import RegressionTsetlinMachine
import numpy as np

tm = RegressionTsetlinMachine(2000, 5000, 4.75, weighted_clauses=True)

all_feats = []
all_targets = []
test_feats = [(1,1), (1,2), (3,4), (2,2)]

# online training
for i in range(5):
    for j in range(5):
        feats = np.array([[i,j]])
        target = np.array([i+j])
        tm.fit(feats, target, incremental=True, epochs=1)
        # save for batch training later
        all_feats += [[i,j]]
        all_targets += [i+j]

print(f"Training data: {list(zip(all_feats, all_targets))}\n")
print("* Predictions after online training")
for (i,j) in test_feats:
    print(f"{i}+{j} = {tm.predict(np.array([[i,j]]))}")

# batch training with a fresh TM
tm = RegressionTsetlinMachine(2000, 5000, 4.75, weighted_clauses=True)
tm.fit(np.array(all_feats), np.array(all_targets), incremental=True, epochs=100)

print("* Predictions after batch training")
for (i,j) in test_feats:
    print(f"{i}+{j} = {tm.predict(np.array([[i,j]]))}")

Outputs (ignoring the warning from true_divide):

Training data: [([0, 0], 0), ([0, 1], 1), ([0, 2], 2), ([0, 3], 3), ([0, 4], 4), ([1, 0], 1), ([1, 1], 2), ([1, 2], 3), ([1, 3], 4), ([1, 4], 5), ([2, 0], 2), ([2, 1], 3), ([2, 2], 4), ([2, 3], 5), ([2, 4], 6), ([3, 0], 3), ([3, 1], 4), ([3, 2], 5), ([3, 3], 6), ([3, 4], 7), ([4, 0], 4), ([4, 1], 5), ([4, 2], 6), ([4, 3], 7), ([4, 4], 8)]

* Predictions after online training
1+1 = [8.]
1+2 = [8.]
3+4 = [8.]
2+2 = [8.]
* Predictions after batch training
1+1 = [2.6336]
1+2 = [2.92]
3+4 = [4.8752]
2+2 = [4.8752]

I've tried different values for epochs and the various parameters to RegressionTsetlinMachine, although I couldn't find an explanation of most of them in the docs, but the results are similar in all cases so far.

@Blimpyway

But, since the primary engine is by design a binary classifier it makes no sense to "train" it with one single point of data.

I think you're saying that tm.fit() won't work when called multiple times with individual examples, but I don't quite understand the reasoning you provided. Can you elaborate more? I've pointed out the issue with min_y and max_y above, are you saying this is a fundamental design decision with Tsetlin machines that can't be worked around?

Blimpyway commented 3 years ago

No, what I meant was that the first fit() needs more than one example.

It is probably best to feed it a few more, so that the first fit() uniformly covers the whole range of outputs you expect it to predict.

Why I'm saying that: when using the classifier, for example, there is no way to tell it how many classes your data has. But after the first fit with e.g. MNIST data, it "knows" it has to predict one out of 10 classes.

Which means that during the first fit() it analyses X and y in order to initialize its inner state, and assumes that the first batch is representative of the data.

DestyNova commented 3 years ago

Thanks @Blimpyway, that makes sense. I guess dynamically adjusting to the output range is difficult to do on the fly and would require significant changes to how the regression machine works.

olegranmo commented 3 years ago

Hi @Blimpyway and @DestyNova ! Thanks for the great input for further development. I plan to update the code in conjunction with finishing the first chapters of the Tsetlin machine book I am writing. In the meantime, in the first fit you do for initialising the Tsetlin machine, you can set epochs=0. Then the necessary parameters will be extracted from the dataset, but no learning takes place. After that you can call fit on one example at a time.
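A minimal sketch of that workaround, assuming `fit` accepts the `epochs` and `incremental` keywords used earlier in this thread. `fit_online` and `demo` are hypothetical helper names. The helper takes any object with a compatible `fit` method, so the library is only imported inside `demo`.

```python
import numpy as np

def fit_online(tm, X, Y, epochs_per_step=1):
    """Initialise `tm` on the full data without learning, then train one row at a time."""
    X = np.asarray(X)
    Y = np.asarray(Y).reshape(-1)  # targets are passed as a one-dimensional array

    # epochs=0: the necessary parameters are extracted from the data, no learning.
    tm.fit(X, Y, epochs=0)

    # One example per call; slicing with i:i+1 keeps X two-dimensional.
    for i in range(len(Y)):
        tm.fit(X[i:i + 1], Y[i:i + 1], epochs=epochs_per_step, incremental=True)
    return tm


def demo():
    # Requires pyTsetlinMachine; hyperparameters are copied from the example above.
    from pyTsetlinMachine.tm import RegressionTsetlinMachine

    pairs = [(i, j) for i in range(5) for j in range(5)]
    X = np.array(pairs)
    Y = np.array([i + j for i, j in pairs])
    tm = fit_online(RegressionTsetlinMachine(2000, 5000, 4.75, weighted_clauses=True), X, Y)
    print(tm.predict(np.array([[1, 2]])))
```

In a genuinely online setting the full dataset is not available up front, so the initial `X`, `Y` would have to be a sample that covers the range of outputs you expect.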

olegranmo commented 3 years ago

BTW, there is an error in the fit code: min_y and max_y are set every time fit is called, not just the first time as intended. Will fix this evening.
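To illustrate why that matters, here is a toy stand-in for the target range handling. It is not the library's code, and `RangeScaler` and `observe` are hypothetical names. When the range is recomputed on every call, a single-example call collapses it to one value. When it is set on the first call only, later single-example calls reuse the original range.

```python
import numpy as np

class RangeScaler:
    """Toy model of how a regressor might remember its target range."""

    def __init__(self, reset_every_call):
        self.reset_every_call = reset_every_call
        self.min_y = None
        self.max_y = None

    def observe(self, Y):
        Y = np.asarray(Y, dtype=float)
        # Buggy variant: overwrite the range on every call.
        # Intended variant: set it on the first call only.
        if self.reset_every_call or self.min_y is None:
            self.min_y, self.max_y = float(Y.min()), float(Y.max())
        return self.min_y, self.max_y

all_targets = [i + j for i in range(5) for j in range(5)]  # values 0..8

buggy = RangeScaler(reset_every_call=True)
fixed = RangeScaler(reset_every_call=False)
for scaler in (buggy, fixed):
    scaler.observe(all_targets)  # initial call on the full data
    scaler.observe([8])          # later call with a single example

print("buggy:", buggy.min_y, buggy.max_y)  # range collapsed to the last target
print("fixed:", fixed.min_y, fixed.max_y)  # range of the full data kept
```

A collapsed range would be consistent with the online run above predicting 8 for every input, since 8 was the last target seen, although that link is an inference and not something stated in the thread.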

olegranmo commented 3 years ago

Fixed! Now calling fit with one example at a time should give the same result as batch, assuming that you first initialize max_y and min_y using the complete dataset with epochs=0.

DestyNova commented 3 years ago

Thanks @olegranmo -- confirmed that it behaves similarly to the single batch call to fit. :+1:

DestyNova commented 3 years ago

Closing as fixed.