dscolby / CausalELM.jl

Taking causal inference to the extreme!
https://dscolby.github.io/CausalELM.jl/

Make sigmoid layer for binary classifiers #36

Closed dscolby closed 3 months ago

dscolby commented 4 months ago

Currently, classification is done by just using the sigmoid activation function, which is effectively the same as regression. This can lead to predicted probabilities falling outside [0, 1]. Instead, for classification we should use a normal ELM with ReLU or another activation to get raw predictions and then apply the sigmoid to those outputs, similar to the way we use a softmax layer for multiclass classification.
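
A minimal sketch of the proposed setup (plain Julia; `raw_predictions` and the free-standing `σ` are illustrative names, not the CausalELM API): take the unconstrained outputs of a ReLU-activated ELM and squash them elementwise with the sigmoid.

```julia
# Hypothetical sketch, not the actual CausalELM implementation.
σ(x) = 1 / (1 + exp(-x))           # sigmoid maps any real number into (0, 1)

# Raw outputs from a ReLU-activated ELM are unconstrained...
raw_predictions = [-0.2, 0.4, 1.3]

# ...but applying the sigmoid elementwise yields values in (0, 1)
probabilities = σ.(raw_predictions)
```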

dscolby commented 4 months ago

Possible options for getting predicted "probabilities" are:

1. Do nothing.
        Since the ELM is minimizing the MSE, this would keep the predictions in [0, 1] most of the time, but there could be 
        times when the predictions fall outside this range, as in a linear probability model.

2. Apply the sigmoid function to the predictions.
        This would always constrain the outputs to [0, 1]. However, it would often be problematic because the raw ELM output 
        already implies a predicted class, and applying the sigmoid function can change that class (see the sketch after this 
        list). For example, if the threshold is the usual 0.5 and the ELM outputs a prediction of 0.4, which minimizes the MSE, 
        then applying the sigmoid function would give 0.598687660112452, which predicts a different class.

3. Use a clipping function.
        This would ensure that all the predictions are in [0, 1] and it would not change predicted classes. However, it would 
        not technically output a probability the way the sigmoid function would. Also, we would have to choose some range like 
        [1e-5, 1 - 1e-5], since no observation will have a probability of exactly 0 or 1.
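
A quick sketch of the class-flipping problem with option 2 (plain Julia, just for illustration):

```julia
σ(x) = 1 / (1 + exp(-x))

pred = 0.4       # MSE-minimizing ELM output; class 0 at a 0.5 threshold
pred > 0.5       # false -> class 0
σ(pred) > 0.5    # true  -> class 1: the sigmoid flipped the predicted class
σ(pred)          # ≈ 0.598687660112452
```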

Overall, using a clipping function is probably the best option: it does not change the optimization problem that the ELM is solving or the predicted classes, it keeps the predictions in [0, 1], and clipping probably doesn't make much of a difference for the few predictions that fall outside [0, 1].
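
A minimal sketch of that clipping step, assuming the range [1e-5, 1 - 1e-5] discussed above (`clip` is an illustrative name, not the CausalELM API; `clamp` is from Julia's Base):

```julia
# Clip raw predictions into [1e-5, 1 - 1e-5]; values already inside the
# range, and therefore the predicted classes, are left untouched.
clip(predictions) = clamp.(predictions, 1e-5, 1 - 1e-5)

clip([-0.3, 0.4, 0.7, 1.2])  # -> [1.0e-5, 0.4, 0.7, 0.99999]
```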

dscolby commented 3 months ago

It actually doesn't make sense to have categorical treatments or outcomes, so we can get rid of them.