ahmetcihatcetin opened 6 months ago
- sklearn.tree.DecisionTreeClassifier — for the decision tree model
- sklearn.model_selection.train_test_split() — for splitting the data into training and test sets
- sklearn.metrics — for performance metric calculations
- sklearn.tree.export_graphviz — for visualising the decision tree model
- six.StringIO — an alias for StringIO.StringIO in Python 2 and io.StringIO in Python 3
- seaborn — statistical data visualization
- seaborn.objects — for plotting the ROC curve

References: six.readthedocs.io and seaborn.pydata.org
sklearn.tree.DecisionTreeClassifier is used as the model for training and testing the data. The parameters used for the decision tree model are:

- max_depth: The maximum depth of the tree, which is 3 for this model.
- criterion: Determines the function used for evaluating the quality of a split. The Gini criterion/index has been used in this decision tree model.
- splitter: The strategy used to choose the split at each node; this procedure is called splitting. Its default value is best, and that is the strategy used for this project (it is not explicitly indicated as a parameter).
- min_samples_leaf: The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.
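The configuration described above can be sketched as follows. The Conners questionnaire data is not available here, so a synthetic dataset stands in for it (the data and random seeds are hypothetical, not the project's):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Conners questionnaire data (hypothetical).
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Parameters as described: Gini criterion, max_depth=3, min_samples_leaf=10;
# splitter is left at its default value, "best".
clf = DecisionTreeClassifier(
    criterion="gini", max_depth=3, min_samples_leaf=10, random_state=42
)
clf.fit(X_train, y_train)
```

With this configuration the fitted tree is guaranteed to be at most 3 levels deep, and every leaf holds at least 10 training samples.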
Below, we can see a visualization of the decision tree used by the algorithm for the Conners' parent data. We can take note of relevant information such as:
We can also confirm that the maximum depth of our decision tree model is indeed 3 and that each leaf contains at least 10 samples.
Likewise, below is the decision tree for the Conners' teacher data:
The same observations we made for the decision tree for the parent data can be made for the decision tree for the teacher data.
References: sklearn.tree.DecisionTreeClassifier
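A minimal sketch of how such a visualization can be produced with export_graphviz and StringIO; the dataset and fitted classifier here are placeholders (iris), not the project's Conners data:

```python
from io import StringIO  # six.StringIO resolves to this on Python 3

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Placeholder data; the project uses the Conners parent/teacher datasets.
iris = load_iris()
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, min_samples_leaf=10)
clf.fit(iris.data, iris.target)

# Write the tree in Graphviz DOT format; the DOT text can then be
# rendered to an image (e.g. with the graphviz or pydotplus package).
dot_buffer = StringIO()
export_graphviz(
    clf,
    out_file=dot_buffer,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    filled=True,
)
dot_data = dot_buffer.getvalue()
```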
Our decision tree algorithm also creates a file containing the related performance metrics for the predictions made for the unlabeled data. Performance metrics are crucial for determining the 'success' of the algorithm and for making further optimizations to it. Let's have a look at these performance metrics:
Accuracy
Accuracy is one of the simpler performance metrics, yet it is helpful for evaluating machine learning models, especially classification models, for overall performance. It is simply the ratio of correctly made predictions to the total number of predictions in the dataset (test data).
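As a small illustration (the labels below are made up, not the project's results), accuracy can be computed directly from this ratio or with sklearn.metrics:

```python
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions on a test set.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Accuracy = correct predictions / total predictions.
correct = sum(t == p for t, p in zip(y_true, y_pred))
manual_accuracy = correct / len(y_true)

assert manual_accuracy == accuracy_score(y_true, y_pred)
print(manual_accuracy)  # 0.8
```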
Precision and Recall
Precision and recall are essential evaluation metrics in machine learning for understanding the trade-off between false positives and false negatives.
Precision is the ratio of true positive predictions to all positive predictions, i.e. the proportion of positive predictions that were actually correct. It is a measure of how accurate the positive predictions are.
Recall, also known as sensitivity or true positive rate, is the ratio of true positive predictions to all actual positive instances. It measures the classifier's ability to identify positive instances correctly.
| Accuracy is more appropriate | Precision/Recall is more appropriate |
|---|---|
| Dataset is balanced in terms of class distribution and the costs of FP and FN are (almost) equal | Class distribution is not balanced or the costs of FP and FN are quite different |
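The imbalanced column of the table can be illustrated with made-up numbers (not the project's data): on a heavily imbalanced label set, accuracy can look excellent while recall exposes the missed positives:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical imbalanced test set: 18 negatives, 2 positives.
y_true = [0] * 18 + [1, 1]
# A classifier that predicts one positive correctly and misses the other.
y_pred = [0] * 18 + [1, 0]

print(accuracy_score(y_true, y_pred))   # 0.95 -- looks great
print(precision_score(y_true, y_pred))  # 1.0  -- no false positives
print(recall_score(y_true, y_pred))     # 0.5  -- half the positives were missed
```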
F1 Score
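The F1 score combines precision and recall into a single number as their harmonic mean, F1 = 2PR / (P + R). A small sketch with hypothetical labels (not the project's results):

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical predictions: 4 actual positives, 3 predicted positives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]

p = precision_score(y_true, y_pred)  # TP=2, FP=1 -> 2/3
r = recall_score(y_true, y_pred)     # TP=2, FN=2 -> 1/2
# f1_score computes the harmonic mean 2*p*r / (p + r).
f1 = f1_score(y_true, y_pred)
```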
ROC (Receiver Operating Characteristic) Curve
It quantifies the model's ability to distinguish between the positive and negative classes by plotting the true positive rate against the false positive rate at various classification thresholds.
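A sketch of computing the ROC curve points with sklearn.metrics; the labels and scores below are hypothetical, and in the project the resulting curve is plotted with seaborn.objects:

```python
from sklearn.metrics import auc, roc_curve

# Hypothetical true labels and predicted probabilities for the positive class.
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

# roc_curve returns the false positive rates, true positive rates, and the
# decision thresholds at which each (fpr, tpr) point was computed.
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
roc_auc = auc(fpr, tpr)  # area under the curve, in [0, 1]
print(roc_auc)  # 0.75
```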
References: (Shah, 2023) and javatpoint.com
In this issue we'll be looking at the development of the decision tree algorithm for the project.