cancer-estimator / model

Model Search Repository

Feature importance of the integrated dataset #15

Closed ryukinix closed 6 months ago

ryukinix commented 7 months ago

Use a decision tree or logistic regression, then extract the evaluation metrics and the feature importance.
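A minimal sketch of what this could look like, assuming scikit-learn (the thread does not say which library was used). The data is synthetic and the feature names `F0`..`F4` are hypothetical placeholders, not the columns of the integrated dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the integrated dataset (hypothetical feature names).
feature_names = ["F0", "F1", "F2", "F3", "F4"]
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
logreg = LogisticRegression(max_iter=1000).fit(X, y)

# Impurity-based importances: non-negative and normalized to sum to 1.
tree_importance = dict(zip(feature_names, tree.feature_importances_))

# Logistic regression exposes signed coefficients, not importances;
# the sign gives the direction of the effect on the positive class.
logreg_coef = dict(zip(feature_names, logreg.coef_[0]))

for name in sorted(tree_importance, key=tree_importance.get, reverse=True):
    print(f"{name:<4} tree={tree_importance[name]:.2%} coef={logreg_coef[name]:+.3f}")
```

Note that the two columns are not on the same scale: tree importances are shares of a total, while coefficients depend on how each feature is scaled.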

helen0l commented 6 months ago

Decision Tree Feature Importance:   Logistic Regression Feature Importance:
SEVERITY                 26%        COUGHING                 29%
OTHER_SYMPTOMS            8%        CHEST_PAIN               12%
GENDER_FEMALE             7%        FATIGUE                   6%
CHEST_PAIN                7%        RESPIRATORY_SYMPTOMNS     6%
SMOKING                   6%        SNORING                   5%
RESPIRATORY_SYMPTOMNS     6%        SMOKING                   5%
SWALLOWING_DIFFICULTY     5%        SWALLOWING_DIFFICULTY     4%
COUGHING                  5%        SEVERITY                  3%
COLD_SYMPTOMNS            5%        OTHER_SYMPTOMS            2%
GENDER_MALE               4%        GENDER_FEMALE             1%
AGE_25_59                 3%        GENDER_MALE               1%
SNORING                   3%        COLD_SYMPTOMNS          -14%
AGE60                     3%        SHORTNESS_OF_BREATH     -21%
AGE_0_9                   3%        AGE_25_59               -44%
AGE_20_24                 2%        AGE60                   -45%
AGE_10_19                 2%        AGE_0_9                 -46%
FATIGUE                   2%        AGE_20_24               -48%
SHORTNESS_OF_BREATH       2%        AGE_10_19               -48%

helen0l commented 6 months ago

Random Forest Test Accuracy: 0.9386218603627676
Random Forest Test ROC AUC Score: 0.5148739674606172

Confusion Matrix - Random Forest:

[[59595    60]
 [ 3845   122]]

ryukinix commented 6 months ago

And what is the F1 score for the target class (1), @helen0l? Apart from the confusion matrix, these metrics are irrelevant for this problem, since the classes are extremely imbalanced.
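To illustrate the point, here is a small sketch that derives precision, recall, and F1 for class 1 directly from the Random Forest confusion matrix posted above. It assumes the usual layout where rows are true labels and columns are predicted labels; the variable names are illustrative.

```python
# Random Forest confusion matrix from the comment above,
# assuming rows = true class and columns = predicted class.
tn, fp = 59595, 60
fn, tp = 3845, 122

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy  = {accuracy:.4f}")   # about 0.9386
print(f"precision = {precision:.4f}")  # about 0.6703
print(f"recall    = {recall:.4f}")     # about 0.0308
print(f"f1        = {f1:.4f}")         # about 0.0588
```

Accuracy stays near 94% because about 94% of the samples are negatives, while recall shows that only about 3% of the positive cases are actually detected.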

helen0l commented 6 months ago

Random Forest Model Evaluation Metrics:
Test Accuracy: 0.9386218603627676
ROC AUC Score: 0.5148739674606172
F1 Score: 0.058809351651000236
Confusion Matrix:

[[59595    60]
 [ 3845   122]]

Logistic Regression Model Evaluation Metrics:
Test Accuracy: 0.9382603501933293
ROC AUC Score: 0.5062097946310351
F1 Score: 0.024826216484607744
Confusion Matrix:

[[59644    11]
 [ 3917    50]]
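
One observation, offered as a hypothesis since the code that produced these numbers is not shown in the thread: both reported ROC AUC values coincide with the mean of the true positive rate and true negative rate taken from the confusion matrices. That is the value an AUC computation yields when it receives hard 0/1 predictions instead of predicted probabilities, so the scores may understate how well the models rank cases. The sketch below only checks the arithmetic; the helper name `auc_from_hard_labels` is made up for illustration.

```python
def auc_from_hard_labels(tn, fp, fn, tp):
    """Area under a ROC curve that has a single operating point.

    With hard labels the curve is (0,0) -> (FPR,TPR) -> (1,1), whose area
    equals (TPR + TNR) / 2, also known as balanced accuracy.
    """
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2


# Confusion matrices from the comment above (rows = true, columns = predicted).
rf_auc = auc_from_hard_labels(tn=59595, fp=60, fn=3845, tp=122)
lr_auc = auc_from_hard_labels(tn=59644, fp=11, fn=3917, tp=50)

print(f"Random Forest:       {rf_auc:.6f}")  # reported ROC AUC: 0.514874
print(f"Logistic Regression: {lr_auc:.6f}")  # reported ROC AUC: 0.506210
```

If the metrics were computed with scikit-learn (an assumption), passing the positive-class column of `predict_proba` rather than the output of `predict` to `roc_auc_score` would give the threshold-independent AUC.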

ryukinix commented 6 months ago

Thanks for redoing the metrics, @helen0l!