MilesCranmer / SymbolicRegression.jl

Distributed High-Performance Symbolic Regression in Julia
https://ai.damtp.cam.ac.uk/symbolicregression/
Apache License 2.0

Switching from Float to UInt8? #58

Closed: dlubomski closed this issue 2 years ago

dlubomski commented 2 years ago

Is it possible to switch from Float to UInt8 numbers? I'm new to Julia.

I would like SymbolicRegression.jl to work with discrete numbers so that I can use binary operators.

MilesCranmer commented 2 years ago

While the genetic-algorithm part should work fine, the constant optimizer, which relies on approximate gradients, unfortunately will not work in discrete spaces. It is best to convert your dataset and all the operators to real numbers. If the search finds the correct equation using the real-valued extensions of everything, you are golden!
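A toy illustration of why gradient-based constant tuning breaks down on discrete values. This is not the actual optimizer used by SymbolicRegression.jl; the data and the "rounded output" model are hypothetical stand-ins for a model restricted to whole numbers. With rounding, the loss is piecewise constant in the constant c, so a finite-difference gradient is zero almost everywhere, while the real-valued model gives a usable slope.

```python
import numpy as np

# Hypothetical toy data: y = 3 * x on integer-valued inputs.
x = np.arange(10, dtype=float)
y = 3.0 * x

def loss_real(c):
    # Real-valued model: the prediction changes smoothly with c.
    return np.mean((c * x - y) ** 2)

def loss_discrete(c):
    # Hypothetical discrete model: predictions are rounded to integers,
    # so small changes in c usually leave the loss unchanged.
    return np.mean((np.round(c * x) - y) ** 2)

def finite_diff(f, c, h=1e-4):
    # Central finite-difference estimate of df/dc.
    return (f(c + h) - f(c - h)) / (2 * h)

c0 = 2.0  # starting guess; the true constant is 3.0
print(finite_diff(loss_real, c0))      # about -57: clear direction to move c
print(finite_diff(loss_discrete, c0))  # 0.0: no gradient signal at all
```

The same flat landscape appears for any loss built from integer-valued outputs, which is why moving to real-valued extensions of the operators is the practical workaround.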

This looks like a nice way to convert operators: https://stackoverflow.com/a/46674398/2689923.

For example, in Julia:

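# Real-valued extensions of the Boolean operators
# (they agree with the Boolean versions whenever every input is 0 or 1):
NOT(a) = 1 - a
AND(a, b) = a * b
OR(a, b) = a + b - AND(a, b)
XOR(a, b) = AND(OR(a, b), NOT(AND(a, b)))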
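A quick way to convince yourself these extensions are exact on binary inputs is to compare them with Python's own Boolean operators over the full truth table. This is a sketch that simply mirrors the Julia definitions above in Python; the last line shows that between 0 and 1 the functions take intermediate values, which is what gives a gradient-based optimizer something to work with.

```python
from itertools import product

# Python mirrors of the real-valued Boolean extensions defined above.
def NOT(a):
    return 1 - a

def AND(a, b):
    return a * b

def OR(a, b):
    return a + b - AND(a, b)

def XOR(a, b):
    return AND(OR(a, b), NOT(AND(a, b)))

# Check every input pair against Python's Boolean operators.
for a, b in product((0, 1), repeat=2):
    assert NOT(a) == int(not a)
    assert AND(a, b) == int(a and b)
    assert OR(a, b) == int(a or b)
    assert XOR(a, b) == int(a != b)
    print(a, b, "->", AND(a, b), OR(a, b), XOR(a, b))

# Off the binary points the functions are smooth, not step-like:
print(OR(0.5, 0.5))  # 0.75
```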

Then pass NOT as a unary operator, and AND, OR, and XOR as binary operators, to SymbolicRegression's Options. I have no idea whether this will work, but it sounds fun to try!

But it could be interesting to implement native support for discrete relations in the future.

Cheers, Miles

MilesCranmer commented 2 years ago

I got curious and I'm very happy to report that this actually works!

Here's some code for the Python frontend PySR (you mentioned you were new to Julia) to run this:

import numpy as np
from pysr import pysr  # PySR's functional API at the time; later versions use PySRRegressor

# True equation:
truth = lambda x: (x[0] or x[1]) and (x[2] or x[3])

# Generate random binary features (each entry is 0.0 or 1.0):
X = 1.0 * (np.random.randn(100, 5) > 0.5)
# Evaluate the target on each row, treating the features as Booleans:
y = np.array([truth(x.astype(bool)) for x in X]).astype(np.float32)

# Custom operators are passed as Julia function definitions:
binary_operators = [
    "AND(x, y) = x * y",
    "OR(x, y) = x + y - x * y",
    "XOR(x, y) = AND(OR(x, y), NOT(AND(x, y)))",
]
unary_operators = [
    "NOT(x) = 1 - x",
]
equations = pysr(
    X,
    y,
    niterations=5,
    binary_operators=binary_operators,
    unary_operators=unary_operators,
)

print(equations)

The output is:

Complexity  MSE   Equation
1           0.17  x2
3           0.13  AND(x3, x1)
5           0.08  AND(OR(x0, x1), x3)
7           0.0   AND(OR(x0, x1), OR(x3, x2))
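
Because the inputs are binary, the complexity-7 equation can be checked exhaustively rather than only on the 100 random samples. This sketch re-implements the real-valued AND and OR in plain Python and compares the recovered expression with the target from the PySR script over all 2^5 input combinations (x4 does not appear in the target, so it acts as an unused extra feature).

```python
from itertools import product

# Real-valued operators, matching the strings passed to PySR.
def AND(x, y):
    return x * y

def OR(x, y):
    return x + y - x * y

def truth(x):
    # Same target as in the PySR script.
    return float((x[0] or x[1]) and (x[2] or x[3]))

def recovered(x):
    # Complexity-7 row of the table: AND(OR(x0, x1), OR(x3, x2)).
    return AND(OR(x[0], x[1]), OR(x[3], x[2]))

# Every combination of five binary features.
inputs = list(product((0.0, 1.0), repeat=5))
mismatches = [x for x in inputs if recovered(x) != truth(x)]
print(len(inputs), "inputs,", len(mismatches), "mismatches")  # 32 inputs, 0 mismatches
```
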
MilesCranmer commented 2 years ago

Closing this for now. Let me know if you have other questions.