alexzwanenburg / familiar

Repository for the familiar R-package. Familiar implements an end-to-end pipeline for interpretable machine learning of tabular data.
European Union Public License 1.2

Model performance does not seem right #66

jorismq closed this issue 1 year ago

jorismq commented 1 year ago

I built a model (mode1) from mydata using familiar.

I then took the features listed under "model_features" in "\familiar_1\trained_models\glm_logistic\univariate_regression\model.RDS".

I then fitted a model myself: mode2 <- glm(Outcome ~ feature1 + feature2 + feature3 + feature4, family = binomial(), data = mydata); summary(mode2)

The coefficients of these features and the AIC were completely different from the values in "\familiar_1\results\pooled_data\performance\performance_metric.csv".

I then made predictions with mode2 and plotted an ROC curve. The AUC was also different from the one in "performance_metric.csv".

I don't know what's going on.

Help! Thanks a lot

alexzwanenburg commented 1 year ago

By default, familiar performs a power transformation and standardises numeric features. Model coefficients may therefore differ from those of a model fitted to the untransformed data. This may affect the AUC-ROC and ROC curves as well.
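To illustrate the effect in isolation, here is a minimal base-R sketch. It does not reproduce familiar's preprocessing: the data are simulated, the names `x1`, `x2` and `auc` are made up for the example, and a plain log transform stands in for a power transformation. Standardisation alone is a linear rescaling, so it changes the coefficients of an unpenalised logistic regression but not the fitted probabilities. A non-linear transformation can change the fitted probabilities, and with them the AUC.

```r
set.seed(1)
n <- 500

# Hypothetical right-skewed features standing in for the real data.
x1 <- rlnorm(n, meanlog = 0, sdlog = 1)
x2 <- rlnorm(n, meanlog = 1, sdlog = 0.5)
outcome <- rbinom(n, size = 1, prob = plogis(-1 + 0.4 * x1 + 0.1 * x2))
raw <- data.frame(outcome, x1, x2)

# Rank-based AUC (Mann-Whitney statistic), using base R only.
auc <- function(score, label) {
  r <- rank(score)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Model on the untransformed features, as in the glm() call in the question.
fit_raw <- glm(outcome ~ x1 + x2, family = binomial(), data = raw)

# Standardisation only: a linear rescaling of each feature.
std <- transform(raw,
                 x1 = as.numeric(scale(x1)),
                 x2 = as.numeric(scale(x2)))
fit_std <- glm(outcome ~ x1 + x2, family = binomial(), data = std)

# Log transform (assumed stand-in for a power transform), then standardisation.
pw <- transform(raw,
                x1 = as.numeric(scale(log(x1))),
                x2 = as.numeric(scale(log(x2))))
fit_pw <- glm(outcome ~ x1 + x2, family = binomial(), data = pw)

# Coefficients differ between all three fits.
print(rbind(raw = coef(fit_raw), std = coef(fit_std), pw = coef(fit_pw)))

# AUC on the training data: raw and std agree, pw can differ.
print(c(raw = auc(fitted(fit_raw), raw$outcome),
        std = auc(fitted(fit_std), raw$outcome),
        pw  = auc(fitted(fit_pw),  raw$outcome)))
```

Comparing `fitted(fit_raw)` with `fitted(fit_std)` and `fitted(fit_pw)` shows which preprocessing step only rescales the coefficients and which one changes the predictions.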

If you run summon_familiar with transformation_method = "none" and normalisation_method = "none", the results should be similar.
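A rough sketch of such a call follows. Only `transformation_method` and `normalisation_method` are taken from this thread. The remaining arguments are assumptions: `outcome_column` is guessed from the formula in the question, `fs_method` and `learner` from the folder names in the model path, and `experimental_design`, `outcome_type` and the `experiment_dir` value are placeholders that should be checked against the familiar documentation and your own configuration. The call itself is guarded so that the snippet runs without familiar or `mydata` being available.

```r
# Arguments intended for familiar::summon_familiar.
# Everything except the two *_method = "none" settings is an assumption.
familiar_args <- list(
  outcome_type = "binomial",            # assumed: binary outcome
  outcome_column = "Outcome",           # assumed from the glm() formula
  experimental_design = "fs+mb",        # assumed: feature selection + model building
  fs_method = "univariate_regression",  # assumed from the model path
  learner = "glm_logistic",             # assumed from the model path
  transformation_method = "none",       # no power transformation
  normalisation_method = "none"         # no standardisation
)

# Set to TRUE once mydata exists and the output directory is chosen.
run_pipeline <- FALSE

if (run_pipeline && requireNamespace("familiar", quietly = TRUE)) {
  do.call(
    familiar::summon_familiar,
    c(list(data = mydata, experiment_dir = "familiar_none"),  # hypothetical directory
      familiar_args)
  )
}
```

With both steps disabled, the coefficients stored in the trained model should be on the same scale as those from a plain `glm()` fit on the same features.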

jorismq commented 1 year ago

Thanks again

alexzwanenburg commented 1 year ago

Can this issue be closed?

jorismq commented 1 year ago

Good job