chris-lovejoy / CodingForMedicine

Coding For Medicine

A series of educational exercises, applying programming to medical problems.

About these exercises

Each exercise is designed to stand alone, so you can complete them in any order.

If you are a learner, select an exercise below based on your interests and experience level. The quickest way to get started is to click on an exercise link below, then on 'Open in Colab'.

If you are an educator, feel free to adopt and adapt these exercises based on your requirements.

If you are interested in contributing, see the Contributor Guidelines.

Exercises

| Exercise | Difficulty | Concepts | Accompanying material | Created by |
| --- | --- | --- | --- | --- |
| Setting up Jupyter Notebook | Introductory | Jupyter Notebook, Google Colab, importing modules | Official Tutorial for Google Colab | Dr Chris Lovejoy |
| Python Principles (1, 2, 3, 4, 5) | Beginner | Variables, functions, loops, conditionals, data structures | | Dr Aaron Smith |
| Coding a medical calculator | Beginner | basic Python (input, try/except, if/else/while, print) | YouTube tutorial | Dr Chris Lovejoy |
| Predicting hospital non-attendance | Intermediate | cleaning data, feature engineering, simple classification model | YouTube tutorial, blog post | Dr Chris Lovejoy |
| Diagnosing breast cancer | Intermediate | model training, performance metrics, confusion matrix | YouTube tutorial | Dr Chris Lovejoy |
| Creating and querying an EHR database | Intermediate | SQL queries, pandas, Levenshtein distance | | Dr Kelvin Kramp |
| Predicting stroke | Intermediate | dealing with class imbalance, F1 score, underfitting and overfitting | | Dr Lawrence Adams |
| Predicting length of stay with logistic regression | Intermediate | logistic regression, odds and odds ratios, dummy variables, confidence intervals | | Dr Jess Caterson |
| Cancer gene expression classification | Advanced | exploratory data analysis, feature selection, classification models, prediction metrics | | Dr Emily Jin |
| Diagnosing chest X-rays | Advanced | image analysis, convolutional neural networks, transfer learning | | Oleksandr Teslenko |
| Extracting insights from Medical Research Papers | Advanced | NLP (tokenisation, summarisation, question-answering), APIs | | Dr Chris Lovejoy |

Setup (if running locally) [Optional]

Clone the repo (use Terminal in Mac or Windows)

git clone https://github.com/chris-lovejoy/CodingForMedicine.git

Create a virtual environment (use Terminal in Mac or Windows)

python3 -m venv <name_of_new_environment>

Activate the virtual environment (the command below is for Mac or Linux; on Windows, run <name_of_new_environment>\Scripts\activate instead)

source <name_of_new_environment>/bin/activate
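To confirm that activation worked, one option is to ask Python itself. The snippet below is a minimal sketch, assuming the environment was created with `venv` as above; the helper name `in_virtual_environment` is illustrative and not part of this repository.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a venv environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_environment())
```

If the second line prints `False`, the environment is probably not active in the current terminal session.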

Install Dependencies

If you are running the notebooks using Google Colab, the dependencies will be installed automatically when you run the notebook. If you are running the notebooks locally, you can install any package as follows:

pip install <package_name>

For instance,

pip install openai pandas
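When running locally, it can help to check which packages are still missing before opening a notebook. The following is a rough sketch using only the standard library; the function name `missing_packages` and the `required` list are illustrative assumptions, and the list should be replaced with whatever the chosen notebook actually imports. Note that an import name can differ from the name used with pip (for example, `sklearn` is installed as `scikit-learn`).

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in import_names if find_spec(name) is None]


# Example names only; adjust to match the notebook you plan to run.
required = ["openai", "pandas"]

missing = missing_packages(required)
if missing:
    print("Missing packages; try: pip install " + " ".join(missing))
else:
    print("All listed packages are importable.")
```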

Contributor Guidelines

If you would like to contribute an exercise to this repository, please either (A) submit an Issue or (B) submit a Pull Request with a modified README that adds your exercise to the table.

The core principles are that all exercises should be:

1. Highly descriptive (but not overly verbose).

a. All code should be explained, with minimal amounts of assumed knowledge.

b. It should be easy to understand and complete the notebook with no reference to external material.

c. There should be a good integration between code and explanatory text. Sections of text shouldn't go beyond ~3-4 paragraphs without some code being run.

2. Interactive (i.e. not "demo" notebooks)

a. Users should be prompted throughout to both (1) modify and complete code and (2) answer questions related to the exercise.

b. Code completions can be filling in gaps (example) or writing new functionality from scratch (example).

c. Each sub-section of the exercise should have interactive elements, such as code to complete, 2-3 questions to answer or both.

d. Questions should be a mixture of "open" and "closed" questions. "Explore the dataset further and describe your findings" is an example of an open question, while "How many entries are there in the dataset?" and "Which variable has the most missing values?" are examples of closed questions.

e. Detailed descriptions of several potential follow-on exercises should be provided at the end of each notebook. These exercises should be more open-ended and broader in scope than the exercises throughout the notebook.

3. Easy to run.

a. It should be easy and intuitive to run the notebooks both on Google Colab and in a local Jupyter Notebook.

b. Explanations and task descriptions should be unambiguous, such that the challenge lies in doing the exercise, not in interpreting it.

Here is a simple template notebook and here is an example of a well-designed exercise.

If there are significant amounts of code for the user to write, then template 'solution' code can be provided in the 'template_code' folder.
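As a rough illustration of the interactive style described in principle 2, a notebook cell might pair a small code completion with closed questions. The sketch below is hypothetical and not taken from any exercise in this repository: the toy `records` data and the `count_missing` helper are invented for illustration, and the completed line is shown where a learner-facing version would leave a gap.

```python
# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 54, "systolic_bp": 130, "smoker": None},
    {"age": None, "systolic_bp": 142, "smoker": None},
    {"age": 61, "systolic_bp": None, "smoker": "no"},
    {"age": 47, "systolic_bp": 118, "smoker": None},
]


def count_missing(rows):
    """Return {column: number of None values} for a list of dict rows."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            # In the learner-facing notebook this line could be left as a gap:
            #     counts[column] = ...  # TODO: add 1 when the value is missing
            counts[column] = counts.get(column, 0) + (value is None)
    return counts


# Closed question 1: How many entries are there in the dataset?
n_entries = len(records)

# Closed question 2: Which variable has the most missing values?
missing = count_missing(records)
most_missing = max(missing, key=missing.get)

print("Entries:", n_entries)
print("Missing values per variable:", missing)
print("Variable with most missing values:", most_missing)
```

A matching solution cell, with the gap filled in as above, is the kind of code that could live in the 'template_code' folder.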