UBC-MDS / 522-Workflows-Group-414

MIT License
1 stars 3 forks source link

An Autism Spectrum Disorder Screening Machine Learning Analysis

522 Workflows: Group #414

Members: Tejas Phaterpekar, Matthew Connell, Thomas Pin

About

Here we trained several machine learning models from the sklearn package in the Python programming language in attempt to find a model that would better and more quickly diagnose autism in adults. We used survey data from the UCI Machine Learning Datasets repository.

The survey data we used consisted of ten questions in addition to other information about the respondent, such as age and country of residence. We looked into whether there were questions that could be disregarded from the survey without having a negative impact on the accuracy of diagnosis.

We eventually chose a Decision Tree Classifier as our model. However, even though we had decent results on training and validation data, our model was poor at predicting outcomes on our test data.

The dataset used in this analysis was obtained from the University of California Irvine Machine learning Repository, uploaded by Fadi Thabtah. Each row represents an individual who participated in the survey. The survey’s results, the app’s classification, and some background information about the indvidual was recorded.

Report

The final report can be found here

Usage

Docker

To replicate this analysis, clone this github repository, make sure you have Docker installed, use the command line to navigate to the root directory of the project directory and type in:

docker run --rm -v <PATH_ON_YOUR_COMPUTER>:/522-Workflows-Group-414 mattc514/522-workflows:v1 make -C /522-Workflows-Group-414 all

where <PATH_ON_YOUR_COMPUTER> is the absolute path to the project directory.

To reset the repo to a clean state, with no intermediate or result files, run the follow command in the terminal from the root directory of this project repository

docker run --rm -v <PATH_ON_YOUR_COMPUTER>:/522-Workflows-Group-414 mattc514/522-workflows:v1 make -C /522-Workflows-Group-414 clean

where <PATH_ON_YOUR_COMPUTER> is the absolute path to the project directory.

Make

Alternatively, if you do not wish to use Docker, you can replicate this analysis using Make. First, clone this github repository. Next, ensure you have the following dependencies installed:

R packages:

Python packages:

Then use the command line to navigate to the root directory of the project directory and type in:

make all 

Finally, to reset the repo to a clean state, with no intermediate or result files, run the follow command in the terminal from the root directory of this project repository:

make clean

Makefile2graph

Graphical representation of file structure:

Direct link

License

The Autism Spectrum Disorder Screening Machine Learning Analysis materials here are licensed under the Creative Commons Attribution 2.5 Canada License (CC BY 2.5 CA). If re-using/re-mixing please provide attribution and link to this webpage.

References:

  1. Thabtah, F. 2018. "An accessible and efficient autism screening method for behavioural data and predictive analyses". 25(4):1739-1755. Health Informatics Journal.https://doi.org/10.1177/1460458218796636

  2. Allison, C., Auyeung, B., and Baron-Cohen, S. 2012. "Toward brief 'Red Flags' for autism screening: The Short Autism Spectrum Quotient and the Short Quantitative Checklist for Autism in toddlers in 1,000 cases and 3,000 controls". 51(2):202-212. J Am Acad Child Adolesc Psychiatry.https://doi.org/10.1016/j.jaac.2011.11.003

  3. Autism Spectrum Quotient-10 Survey (AQ-10).

  4. UCI Dataset Respository.

  5. Dua, Dheeru, and Casey Graff. 2017. “UCI Machine Learning Repository.” University of California, Irvine, School of Information; Computer Sciences.http://archive.ics.uci.edu/ml/index.php.