This Jupyter notebook is written for use with an R kernel.
The notebook details the steps from locating publicly available RNAseq counts, abundance, and clinical data from TCGA, through identification of differentially expressed genes with DESeq2, to visualization of the results.
Content is provided in the form of Jupyter notebooks. If you need an introduction to Jupyter, you can see the official documents.
You can run these notebooks on a JupyterHub server (potentially one provided by your course) or on your own computer, with appropriately configured Python, R, an RKernel, and the appropriate R libraries (installed through the first notebook).
For new users, I highly recommend installing Jupyter through Anaconda following these instructions and installing the latest version of Python 3.x. Note: on Windows, the installer does not add Anaconda or Jupyter to your PATH by default. You can change this during installation by choosing the option to add Anaconda to the PATH (see step 6 of these Jupyter installation instructions here). If you do not add Anaconda to your PATH, you will need to open it using the app or change the PATH later.
For this particular assignment, you will also need R. You can download R and find installation instructions here.
After R is installed, it needs to be configured for Jupyter by installing an RKernel. Follow the instructions on the page, and be sure to launch R from the command line instead of the app icon: on Windows, installation of the RKernel will fail if you launch R from the app icon. Note: if Anaconda is not in your PATH, R will not find Jupyter when trying to install the RKernel. If you open the Anaconda prompt and run R from there, it should find Jupyter.
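Since most setup problems described above come down to a tool not being on the PATH, a quick check from the command line can save time. The following is a minimal sketch, not part of the original instructions; the helper name `check_tool` is hypothetical, and the list of tool names assumes the default command names (`python`, `R`, `jupyter`).

```shell
#!/bin/sh
# check_tool is a hypothetical helper: it reports whether a command
# can be found on the PATH of the current shell.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: NOT found on PATH"
        return 1
    fi
}

# Check the tools this module relies on (assumed default command names).
missing=0
for tool in python R jupyter; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "$missing tool(s) missing"
```

On Windows, run this from the Anaconda prompt or another POSIX-style shell. If `jupyter` is found, `jupyter kernelspec list` shows the kernels Jupyter knows about; once the R kernel is installed it usually appears there under the name `ir`.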
Note: for R projects, I highly recommend using RStudio and RMarkdown instead of Jupyter notebooks, but that is not required to complete this module, which was written in Jupyter for instructional purposes.
GitHub and GitLab are two popular community sites based on the Git source-code control system. We're going to use Git to create your own local copy of these modules and to store any changes. We'll tell you a bit about it here, but there's much more to learn; for more information on Git, see git-scm.com.
For now, go to either GitHub or GitLab and create an account. Remember your account name.
The Jupyter notebook exercises for this module are contained in a GitHub repository - a collection of related files managed using the Git source code control system. To do your work for these modules, you will need your own personal copy of this repository stored in your GitHub or GitLab account. Below, we give different descriptions for GitHub and GitLab.
To do this in GitHub will require three browser windows:
Once this is set up, we can go to work.
At this point, you should have the URL for your own personal copy of this repository. You will now need to clone it into your Jupyter environment.
Start Jupyter (on your own computer, run
jupyter notebook
from the command prompt). Click "New" at the top right, and select "Terminal". This opens a Linux command-line terminal window in the browser. Alternatively, if you are using JupyterHub, press the "Terminal" button on the Launcher screen.
In the terminal, type
git clone ...
replacing the ... with the URL of your repository. You will need to provide your GitHub or GitLab user name and password. Note: if you receive an error, it might be because you do not have Git installed. To install Git, follow the instructions here.
If everything is installed and cloned correctly, you should be able to run the notebooks.
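The clone step can also be wrapped so that a missing Git installation produces a clear message instead of a cryptic error. This is a sketch under stated assumptions: `clone_repo` and the `REPO_URL` variable are hypothetical names introduced here, and you must supply the URL of your own copy of the repository yourself.

```shell
#!/bin/sh
# clone_repo is a hypothetical helper: it clones the repository at the
# given URL into the current directory, after checking that git exists.
clone_repo() {
    repo_url="$1"
    if [ -z "$repo_url" ]; then
        echo "usage: clone_repo <repository-url>" >&2
        return 2
    fi
    if ! command -v git >/dev/null 2>&1; then
        echo "git is not installed or not on the PATH" >&2
        return 1
    fi
    git clone "$repo_url"
}

# REPO_URL is a placeholder you set yourself; nothing is cloned
# unless it has been given a value.
if [ -n "${REPO_URL:-}" ]; then
    clone_repo "$REPO_URL"
fi
```

To use it, set `REPO_URL` to the URL of your personal copy before running the script, or call `clone_repo` directly with that URL as its argument.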
Finish reading the rest of this README and then, when ready, click on the notebooks to open them and start. Be sure to choose the RKernel.
This assignment is broken into four Jupyter notebooks. Do them in order. Notebooks 2-4 each begin with optional steps in case you are not continuing directly from the previous notebook. The four notebooks are:
1. RNASeq_TCGA_Introduction_Download - In the first notebook, we will:
2. Data wrangling - Prepping raw clinical and transcript counts data to generate gene-level data for DESeq2 using tximport - In this lesson, you will do all of the data wrangling necessary to run DESeq2 to answer our question. This includes:
3. Running tximport and DESeq2 - In this lesson, you will perform the differential gene expression analysis. This includes:
4. Visualize and Annotate DESeq2 results - In this lesson, you will annotate, export, and visualize the results. This includes: