UBC-MDS / DSCI_522_Breast_cancer_predictors

Decision tree analysis of breast cancer result metrics to deduce the strongest predictor of malignancy

Too many features are analysed #11

Open milicmil opened 5 years ago

milicmil commented 5 years ago

Hi Arzan,

I took a look at the paper the data set comes from, and there is definitely redundancy among the features. Ten features are collected on each cell sample:

- Radius
- Perimeter
- Area
- Compactness
- Smoothness
- Concavity
- Concave points
- Symmetry
- Fractal dimension
- Texture

For each of these features, three summary values were computed over the sample: the mean, the standard error, and the most extreme ("worst") value. That gives 30 features in total (10 features × 3 statistics).
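To make the three statistics concrete, here is a minimal sketch for a single feature measured on several cells. The radius values are made up, and the function name `summarise_feature` is hypothetical. The sketch assumes "worst" means the mean of the three largest values, which is how the UCI description of the Wisconsin Diagnostic Breast Cancer data defines it.

```python
import math
import statistics


def summarise_feature(values):
    """Return (mean, standard error, worst) for one feature measured on several cells.

    Assumption: "worst" is the mean of the three largest values, following the
    UCI description of the Wisconsin Diagnostic Breast Cancer data.
    """
    if len(values) < 3:
        raise ValueError("need at least three measurements")
    mean = statistics.mean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    se = statistics.stdev(values) / math.sqrt(len(values))
    # Average of the three largest measurements
    worst = statistics.mean(sorted(values, reverse=True)[:3])
    return mean, se, worst


# Hypothetical radius measurements for the cells in one sample
radii = [10.0, 12.0, 11.0, 14.0, 13.0]
mean, se, worst = summarise_feature(radii)
print(f"mean={mean:.2f} se={se:.4f} worst={worst:.2f}")
# prints: mean=12.00 se=0.7071 worst=13.00
```

The mean and the worst value summarise different things. A single unusually large cell shifts the mean only a little, but it moves the worst value directly. This is why the worst values are the better signal when only a few cells in a sample are abnormal.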

The original authors suggested using the worst values, because only a few malignant cells may be present in a sample and it is those outliers we are looking for.

To cut down on the redundancy, we can restrict the analysis to the 10 features labelled "worst".
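Here is a rough sketch of the column selection. The column-naming scheme `<feature>_<statistic>` (for example `radius_worst`) is an assumption, so check it against the actual headers of the data file in this repo. The helper `select_worst` is hypothetical.

```python
# The 10 base features listed above, and the three statistics computed for each
BASE_FEATURES = [
    "radius", "perimeter", "area", "compactness", "smoothness",
    "concavity", "concave_points", "symmetry", "fractal_dimension", "texture",
]
STATISTICS = ["mean", "se", "worst"]

# Hypothetical column naming: "<feature>_<statistic>", e.g. "radius_worst"
all_columns = [f"{feat}_{stat}" for feat in BASE_FEATURES for stat in STATISTICS]


def select_worst(columns):
    """Keep only the columns that summarise the worst (most extreme) values."""
    return [col for col in columns if col.endswith("_worst")]


worst_columns = select_worst(all_columns)
print(len(all_columns), len(worst_columns))  # prints: 30 10
```

If the data is loaded into a pandas DataFrame `df` with column names like these, `df[select_worst(df.columns)]` or `df.filter(regex="_worst$")` should produce the reduced feature set for the decision tree.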

Article_1993.pdf

We should repeat the analysis using only the worst features.