Kiyoshika / CppEZML

A work-in-progress C++ machine learning library designed to be very easy to use, with everything written pretty much from scratch.

LogisticRegression.h - Reimplement #5

Closed Kiyoshika closed 3 years ago

Kiyoshika commented 3 years ago

Reimplement the logistic regression class. It's extremely similar to linear regression, so this should take no time.

NOTE: Something I'm also planning for the linear regression class: allow users to pass function pointers to define custom loss functions. I'll probably have to overload the loss function to accept an optional function-pointer parameter.

Kiyoshika commented 3 years ago

Implemented. See the example below comparing the default cross entropy loss to a user-specified squared error loss.

#include <iostream>
#include <vector>
#include <math.h>
#include "data/DataSet.h"
#include "models/classification/LogisticRegression.h"

using namespace std;

double squared_error(double actual_y, double predicted_y) {
    return (actual_y - predicted_y) * (actual_y - predicted_y);
}

int main() {

    // generated data from sklearn make_classification
    // x, y = make_classification(n_samples = 5000, n_classes = 2, n_informative = 8, n_features = 20)
    DataSet xtrain, xtest, ytrain, ytest;
    xtrain.load("xtrain.csv");
    xtest.load("xtest.csv");
    ytrain.load("ytrain.csv");
    ytest.load("ytest.csv");

    Classifier *se_logistic = new LogisticRegression(true, 1000, 0.001, &squared_error);
    Classifier *def_logistic = new LogisticRegression(true);

    se_logistic->fit(xtrain.cast_data_double(), ytrain.cast_target_int());
    cout << "\n";
    def_logistic->fit(xtrain.cast_data_double(), ytrain.cast_target_int());

    vector<int> se_preds = se_logistic->predict(xtest.cast_data_double());
    vector<int> def_preds = def_logistic->predict(xtest.cast_data_double());

    cout << "\n";
    cout << "Squared Error F1 Score: " << se_logistic->get_f1_score(ytest.cast_target_int(), se_preds) << "\n";
    cout << "Cross Entropy (Default) F1 Score: " << def_logistic->get_f1_score(ytest.cast_target_int(), def_preds) << "\n";

    // note: "delete a, b;" only deletes a (comma operator), so delete each separately
    delete se_logistic;
    delete def_logistic;

}

OUTPUT FROM ABOVE:

Total loss at iteration #0: 581.497
Total loss at iteration #100: 475.495

Total loss at iteration #0: 2353.73
Total loss at iteration #100: 2188.07

Squared Error F1 Score: 0.795377
Cross Entropy (Default) F1 Score: 0.795409