Coding For Medicine

A series of educational exercises, applying programming to medical problems.

About these exercises

Each exercise is designed to be completed on its own, independently of the others.

If you are a learner, select an exercise below based on your interests and experience level. The quickest way to get started is to click on the link below, then on 'Open in Colab'.

If you are an educator, feel free to adopt and adapt these exercises based on your requirements.

If you are interested in contributing, see the Contributor Guidelines.

Exercises

Exercise	Difficulty	Concepts	Accompanying material	Created by
Setting up Jupyter Notebook	Introductory	Jupyter Notebook, Google Colab, importing modules	Official Tutorial for Google Colab	Dr Chris Lovejoy
Python Principles (1, 2, 3, 4, 5)	Beginner	Variables, functions, loops, conditionals, data structures		Dr Aaron Smith
Coding a medical calculator	Beginner	basic Python (input, try/except, if/else/while, print)	YouTube tutorial	Dr Chris Lovejoy
Predicting hospital non-attendance	Intermediate	cleaning data, feature engineering, simple classification model	YouTube tutorial, blog post	Dr Chris Lovejoy
Diagnosing breast cancer	Intermediate	model training, performance metrics, confusion matrix	YouTube tutorial	Dr Chris Lovejoy
Creating and querying an EHR database	Intermediate	SQL queries, pandas, Levenshtein distance		Dr Kelvin Kramp
Predicting stroke	Intermediate	dealing with class imbalance, F1 score, underfitting and overfitting		Dr Lawrence Adams
Predicting length of stay with logistic regression	Intermediate	logistic regression, odds and odds ratios, dummy variables, confidence intervals		Dr Jess Caterson
Cancer gene expression classification	Advanced	exploratory data analysis, feature selection, classification models, prediction metrics		Dr Emily Jin
Diagnosing chest X-rays	Advanced	image analysis, convolutional neural networks, transfer learning		Oleksandr Teslenko
Extracting insights from Medical Research Papers	Advanced	NLP (tokenisation, summarisation, question-answering), APIs		Dr Chris Lovejoy

Setup (if running locally) [Optional]

Clone the repo (use Terminal in Mac or Windows)

git clone https://github.com/chris-lovejoy/CodingForMedicine.git

Create a virtual environment (use Terminal in Mac or Windows)

python3 -m venv <name_of_new_environment>

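Activate the virtual environment (use Terminal in Mac or Linux)

source <name_of_new_environment>/bin/activate

On Windows (Command Prompt), the activation script is in the Scripts folder instead:

<name_of_new_environment>\Scripts\activate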

Install Dependencies

If you are running the notebooks using Google Colab, the dependencies will be installed automatically when you run the notebook. If you are running the notebooks locally, you can install any package as follows:

pip install <package_name>

For instance, to install several packages at once, separate their names with spaces (not commas):

pip install openai pandas
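Before opening a notebook locally, it can help to confirm that the virtual environment is active and that the packages a notebook needs are importable. The following is a minimal sketch, not part of the repository: the helper names `in_virtualenv` and `missing_packages` and the package list are illustrative assumptions. Note that a package's import name can differ from its pip name (for example, `scikit-learn` is imported as `sklearn`).

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Illustrative list (an assumption); replace with what your notebook imports.
    required = ["pandas", "openai"]
    print("Virtual environment active:", in_virtualenv())
    missing = missing_packages(required)
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```

Run it with the environment activated; any package it reports as missing can be installed with the pip command shown above.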

Contributor Guidelines

If you would like to contribute an exercise to this repository, please either (A) submit an Issue or (B) submit a Pull Request of the modified README, with your exercise added to the table.

The core principles are that all exercises should be:

1. Highly descriptive (but not overly verbose).

a. All code should be explained, with minimal amounts of assumed knowledge.

b. It should be easy to understand and complete the notebook with no reference to external material.

c. There should be a good integration between code and explanatory text. Sections of text shouldn't go beyond ~3-4 paragraphs without some code being run.

2. Interactive (i.e. not "demo" notebooks).

a. Users should be prompted throughout to both (1) modify and complete code and (2) answer questions related to the exercise.

b. Code completions can be filling in gaps (example) or writing new functionality from scratch (example).

c. Each sub-section of the exercise should have interactive elements, such as code to complete, 2-3 questions to answer or both.

d. Questions should be a mixture of "open" and "closed" questions. "Explore the dataset further and describe your findings" is an example of an open question, while "How many entries are there in the dataset?" and "Which variable has the most missing values?" are examples of closed questions.

e. Detailed descriptions of several potential follow-on exercises should be provided at the end of each notebook. These exercises should be more open-ended and broader in scope than the exercises within the notebook.

3. Easy to run.

a. It should be easy and intuitive to run the notebooks both on Google Colab and in local Jupyter Notebooks.

b. Explanations and task descriptions should be unambiguous, such that the challenge lies in doing the exercise, not in interpreting it.
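To illustrate principle 2, an interactive cell might pair a small piece of code to complete with a closed question. The sketch below is a hypothetical example rather than a cell taken from any existing exercise: the function name `bmi` and the question are assumptions, and the completed line is shown where a learner version would leave a gap.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Return body mass index in kg/m^2."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    # In the learner version, this line would be left blank with a prompt
    # such as "TODO: return weight divided by height squared".
    return weight_kg / height_m ** 2


# Closed question for the learner: what is the BMI for 70 kg and 1.75 m?
print(round(bmi(70, 1.75), 1))  # 22.9
```

An open follow-up question could then ask the learner to extend the function, for instance to return a weight category alongside the number.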

Here is a simple template notebook and here is an example of a well-designed exercise.

If there are significant amounts of code for the user to write, then template 'solution' code can be provided in the 'template_code' folder.
