SanoScience / MISS

MISS: Multiclass Interpretable Scoring Systems - SDM24
MIT License

How to binarize dataset with continuous features? #1

Open sabithamanoj opened 3 days ago

sabithamanoj commented 3 days ago

I ran the example code for the iris dataset and it works perfectly. How did you create the input file iris_binary.csv? Could you please let me know how you binarized the dataset? I want to use the MISS scoring system for my project, and my dataset contains continuous features that need to be binarized.

MiHu773 commented 3 days ago

Dear @sabithamanoj, thank you for your interest in our work. Unfortunately, the need for "pre-binarization" is a limitation of IP-based (integer programming) methods for building scoring systems. There are multiple strategies for feature discretization; we followed the approach from the FD-RiskSLIM paper. In our experiments we tested several binarization algorithms (quantiles, k-means, MDLP, etc.). Here is an example of an MDLP discretizer that you can use.
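For readers who want a starting point, here is a minimal sketch of the quantile strategy mentioned above, which is the simplest of the listed options. It is not the MISS authors' pipeline and not MDLP: it only illustrates turning each continuous feature into binary threshold indicators. The function names (`quantile_thresholds`, `binarize`), the `feature<=threshold` column naming, and the toy values are all assumptions made for illustration; the column format that MISS expects in a file such as iris_binary.csv should be checked against the repository's example.

```python
from statistics import quantiles


def quantile_thresholds(values, n_bins=4):
    """Return sorted, de-duplicated cut points that split `values`
    into `n_bins` quantile bins (n_bins - 1 cut points at most)."""
    return sorted(set(quantiles(values, n=n_bins, method="inclusive")))


def binarize(rows, feature_names, n_bins=4):
    """Turn continuous features into binary threshold indicators.

    rows: list of dicts mapping feature name -> numeric value.
    Returns (binary_rows, thresholds), where each binary row maps a
    hypothetical column name "feature<=threshold" to 0 or 1.
    """
    # Learn the cut points once, from the whole column.
    thresholds = {
        name: quantile_thresholds([row[name] for row in rows], n_bins)
        for name in feature_names
    }
    binary_rows = []
    for row in rows:
        encoded = {}
        for name in feature_names:
            for cut in thresholds[name]:
                encoded[f"{name}<={cut:g}"] = int(row[name] <= cut)
        binary_rows.append(encoded)
    return binary_rows, thresholds


if __name__ == "__main__":
    # Toy values for illustration only, not taken from any real dataset.
    toy = [{"petal_length": v} for v in (1.0, 2.0, 3.0, 4.0)]
    encoded, cuts = binarize(toy, ["petal_length"], n_bins=2)
    print(cuts)
    for row in encoded:
        print(row)
```

In practice the thresholds should be learned on the training split only and then reused for the test split, so that the binary columns mean the same thing in both. A supervised discretizer such as MDLP chooses cut points using the class labels instead of fixed quantiles, which is why it was among the options tested.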

sabithamanoj commented 2 days ago

Thank you very much for the information!