A series of educational exercises, applying programming to medical problems.
Each exercise is designed to stand alone, so you can complete them in any order.
If you are a learner, select an exercise below based on your interests and experience level. The quickest way to get started is to click on the link below, then on 'Open in Colab'.
If you are an educator, feel free to adopt and adapt these exercises based on your requirements.
If you are interested in contributing, see the Contributor Guidelines.
Exercise | Difficulty | Concepts | Accompanying material | Created by |
---|---|---|---|---|
Setting up Jupyter Notebook | Introductory | Jupyter Notebook, Google Colab, importing modules | Official Tutorial for Google Colab | Dr Chris Lovejoy |
Python Principles (1, 2, 3, 4, 5) | Beginner | variables, functions, loops, conditionals, data structures | | Dr Aaron Smith |
Coding a medical calculator | Beginner | basic Python (input, try/except, if/else/while, print) | YouTube tutorial | Dr Chris Lovejoy |
Predicting hospital non-attendance | Intermediate | cleaning data, feature engineering, simple classification model | YouTube tutorial, blog post | Dr Chris Lovejoy |
Diagnosing breast cancer | Intermediate | model training, performance metrics, confusion matrix | YouTube tutorial | Dr Chris Lovejoy |
Creating and querying an EHR database | Intermediate | SQL queries, pandas, Levenshtein distance | | Dr Kelvin Kramp |
Predicting stroke | Intermediate | dealing with class imbalance, F1 score, underfitting and overfitting | | Dr Lawrence Adams |
Predicting length of stay with logistic regression | Intermediate | logistic regression, odds and odds ratios, dummy variables, confidence intervals | | Dr Jess Caterson |
Cancer gene expression classification | Advanced | exploratory data analysis, feature selection, classification models, prediction metrics | | Dr Emily Jin |
Diagnosing chest X-rays | Advanced | image analysis, convolutional neural networks, transfer learning | | Oleksandr Teslenko |
Extracting insights from Medical Research Papers | Advanced | NLP (tokenisation, summarisation, question-answering), APIs | | Dr Chris Lovejoy |
git clone https://github.com/chris-lovejoy/CodingForMedicine.git
python3 -m venv <name_of_new_environment>
source <name_of_new_environment>/bin/activate
If you are running the notebooks using Google Colab, the dependencies will be installed automatically when you run the notebook. If you are running the notebooks locally, you can install any package as follows:
pip install <package_name>
For instance,
pip install openai pandas
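When running locally, it can help to check which modules are missing before opening a notebook. The following is a minimal sketch using only the standard library; it is not part of the repository. The `missing_packages` helper and the `required` list are hypothetical, so replace the list with whatever the notebook you are running imports. Note that a module's import name does not always match its pip package name (for example, `sklearn` is installed as `scikit-learn`), so treat the printed install command as a starting point.

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Hypothetical list: replace with the modules your chosen notebook imports.
required = ["openai", "pandas"]

missing = missing_packages(required)
if missing:
    print("Missing modules:", ", ".join(missing))
    # Assumes the pip package name matches the import name, which is not always true.
    print("Try: pip install " + " ".join(missing))
else:
    print("All required modules are available.")
```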
If you would like to contribute an exercise to this repository, please either (A) submit an Issue or (B) submit a Pull Request of the modified README, with your exercise added to the table.
The core principles are that all exercises should be:
a. All code should be explained, with minimal amounts of assumed knowledge.
b. It should be easy to understand and complete the notebook with no reference to external material.
c. There should be a good integration between code and explanatory text. Sections of text shouldn't go beyond ~3-4 paragraphs without some code being run.
a. Users should be prompted throughout to both (1) modify and complete code and (2) answer questions related to the exercise.
b. Code completions can be filling in gaps (example) or writing new functionality from scratch (example).
c. Each sub-section of the exercise should have interactive elements, such as code to complete, 2-3 questions to answer or both.
d. Questions should be a mixture of "open" and "closed" questions. "Explore the dataset further and describe your findings" is an example of an open question, while "How many entries are there in the dataset?" and "Which variable has the most missing values?" are examples of closed questions.
e. Detailed descriptions of several potential follow-on exercises should be provided at the end of each notebook. These exercises should be more open-ended and broader in scope than the exercises throughout the notebook.
a. It should be easy and intuitive to run the notebooks both on Google Colab and in local Jupyter Notebooks.
b. Explanations and task descriptions should be unambiguous, such that the challenge lies in doing the exercise, not in interpreting it.
Here is a simple template notebook and here is an example of a well-designed exercise.
If there are significant amounts of code for the user to write, then template 'solution' code can be provided in the 'template_code' folder.
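As an illustration of the "filling in gaps" style of code completion and its matching solution code, here is a hedged sketch. It is not taken from any notebook in this repository; the function names and the BMI task are assumptions chosen for the example. The first function is what the learner sees, the second is what could live alongside the solution code.

```python
def calculate_bmi_exercise(weight_kg, height_m):
    """Exercise cell: return body mass index in kg/m^2."""
    # TODO (learner): replace None with the BMI formula,
    # i.e. weight in kilograms divided by height in metres squared.
    bmi = None
    return bmi


def calculate_bmi(weight_kg, height_m):
    """Solution cell: return body mass index in kg/m^2."""
    if height_m <= 0:
        # Guard against division by zero and nonsensical input.
        raise ValueError("height_m must be greater than zero")
    return weight_kg / height_m ** 2


# A closed question to accompany the cell:
# "What is the BMI of a 70 kg patient who is 1.75 m tall, to one decimal place?"
print(round(calculate_bmi(70, 1.75), 1))  # prints 22.9
```

Keeping the exercise and solution signatures identical means a learner can compare their completed function against the solution with the same inputs.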