MilesCranmer / PySR

High-Performance Symbolic Regression in Python and Julia
https://astroautomata.com/PySR
Apache License 2.0
2.08k stars, 198 forks

[Feature]: Multi-Class Classification #640

Open SyedUmairHassanKazmi opened 4 weeks ago

SyedUmairHassanKazmi commented 4 weeks ago

Feature Request

How can we use this library to get an equation for multiclass classification?

xylhal commented 3 weeks ago

I'm also looking into this. Is it possible to define cross entropy as a custom loss function?

MilesCranmer commented 3 weeks ago

Right now the labels have to be scalars, so only binary classification and regression are possible. But I have an intern working with me this summer on adding vector capabilities to PySR, so in principle it should eventually be doable.

In principle there's nothing standing in the way of this as the backend allows vectors/tensors/whatever other type you want: https://github.com/SymbolicML/DynamicExpressions.jl?tab=readme-ov-file#tensors. Just need to get it all integrated.

MilesCranmer commented 3 weeks ago

Actually I guess you could get it working as a custom loss function: https://astroautomata.com/PySR/examples/#9-custom-objectives

Maybe you could split a single tree into multiple expressions. And each of those expressions could act as a single logit – then compute the multi-class classification on top!
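To make the "one logit per sub-expression" idea concrete, here is a minimal pure-Python sketch. It is not PySR code and does not use the PySR custom-objective API: the nested-tuple tree format, the `"split"` root marker, and the helper names (`evaluate`, `logits_from_tree`, `softmax_cross_entropy`, `mean_loss`) are all hypothetical, chosen only for illustration. It assumes each child of the root is evaluated as a separate logit, and that the loss is the usual softmax cross-entropy against integer class labels.

```python
import math

# Hypothetical tree format (NOT PySR's): nested tuples such as
# ("+", left, right), ("*", left, right), ("var", i), ("const", c).
def evaluate(node, x):
    """Evaluate a tiny expression tree on one sample x (a list of features)."""
    op = node[0]
    if op == "var":
        return x[node[1]]
    if op == "const":
        return node[1]
    left, right = evaluate(node[1], x), evaluate(node[2], x)
    if op == "+":
        return left + right
    if op == "*":
        return left * right
    raise ValueError(f"unknown operator: {op}")

def logits_from_tree(root, x):
    """Assumption: every child of the root is one class logit."""
    return [evaluate(child, x) for child in root[1:]]

def softmax_cross_entropy(logits, label):
    """Negative log-probability of `label` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def mean_loss(root, X, y):
    """Average cross-entropy of the split tree over a dataset."""
    losses = [softmax_cross_entropy(logits_from_tree(root, x), label)
              for x, label in zip(X, y)]
    return sum(losses) / len(losses)

# A three-class example: three sub-expressions under one root.
root = ("split",
        ("var", 0),                           # logit for class 0
        ("*", ("const", 2.0), ("var", 1)),    # logit for class 1
        ("const", 0.0))                       # logit for class 2
X = [[3.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
y = [0, 1, 2]
loss = mean_loss(root, X, y)
```

A search would then try to minimize `mean_loss` over candidate trees. How the split is actually expressed inside a PySR custom objective is left open here; the linked custom-objectives example is the place to check.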

xylhal commented 3 weeks ago

> Right now the labels have to be scalars, so only binary classification and regression are possible. But I have an intern working with me this summer on adding vector capabilities to PySR, so in principle it should eventually be doable.
>
> In principle there's nothing standing in the way of this as the backend allows vectors/tensors/whatever other type you want: https://github.com/SymbolicML/DynamicExpressions.jl?tab=readme-ov-file#tensors. Just need to get it all integrated.

Interesting, what do you mean by vector capabilities, and how do you envision the outputs being different? What I had in mind was an output equation that mimics a classification process, where the labels are scalar integers. That's probably counterintuitive given the definition of regression, but I'm wondering whether it's possible to generate an equation that ultimately computes an integer. Maybe, naively, a bunch of indicator functions of each variable? If so, would just using a custom loss function help?

> Actually I guess you could get it working as a custom loss function: https://astroautomata.com/PySR/examples/#9-custom-objectives
>
> Maybe you could split a single tree into multiple expressions. And each of those expressions could act as a single logit – then compute the multi-class classification on top!

Still pretty new to this, will take a deeper dive into it.
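On the question of getting an integer out of the equations: under the logit approach suggested in this thread, the integer label would come from taking the argmax over the per-class expressions rather than from a single expression that outputs an integer directly. A minimal sketch, assuming three made-up per-class expressions (the lambdas and the names `class_expressions`, `predict_label`, and `classify` are hypothetical, not something PySR produced or provides):

```python
def predict_label(logits):
    """Return the index of the largest logit (ties go to the lowest index)."""
    return max(range(len(logits)), key=lambda k: logits[k])

# Hypothetical per-class expressions, standing in for whatever a search finds.
class_expressions = [
    lambda x: x[0] - x[1],   # logit for class 0
    lambda x: x[1] - x[0],   # logit for class 1
    lambda x: 0.5,           # logit for class 2
]

def classify(x):
    """Evaluate every class expression on x and return the integer label."""
    return predict_label([f(x) for f in class_expressions])

labels = [classify(x) for x in ([2.0, 0.0], [0.0, 2.0], [0.1, 0.0])]
```

The argmax itself is not differentiable, which is one reason to fit the expressions with a smooth loss such as cross-entropy and apply the argmax only at prediction time.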