ShiuLab / ML_workshop

Machine learning workshop materials
MIT License

Machine learning workshop

1. Workshop information

What this is about

More and more experimental data are becoming available and have fueled groundbreaking discoveries. Beyond the original intent of the experiments, these data can be used to discover even more. This is where machine learning (ML) comes in. We can use computers to learn from data and generate models that predict a biological phenomenon of interest, e.g., whether knocking out a gene will be lethal, or which genetic variants can meaningfully predict a phenotype of interest. To learn more about ML, we created this workshop to provide an introduction on the following topics:

The workshop will include presentations, discussions, and hands-on sections for those who can complete the pre-workshop components without issue.

Who we are

What kinds of materials we are sharing

2. Instructions for running the notebook

What's needed

The workshop example is provided as a Jupyter notebook, a document created with the Jupyter Lab or Jupyter Notebook applications. A notebook can contain both code in popular languages such as Python and R and text in the form of paragraphs, equations, figures, links, etc.

To follow what we have shown in the workshop, you need the following:

Get some background

Please make sure you:

  1. Have a look at the pre-workshop notebook through GitHub and take some notes on the questions asked.
  2. Watch this video, and this video on getting Jupyter notebook to run.

Get the notebook and data

You can download the notebooks and data from this repository, preferably by setting the following up:

If you don't have Git and/or a GitHub account, do the following:

  1. Create a GitHub Account
  2. Download and install GitHub Desktop
    • Or you can use Git if you are familiar with version control and the command-line interface. Note that the following info is for using GitHub Desktop.
  3. Clone the ML_workshop repository by following these instructions and the following screenshot.
    • Note: You can specify where the repository goes on your computer. We suggest leaving it as the default and remembering where it is - we need it later.

  4. Navigate to the location of the cloned repository and confirm that it is there.

Install Anaconda

Anaconda is a free and open-source distribution of the programming languages Python and R, and is a widely used platform for computational and data science applications.

  1. Download the Python 3.X version of Anaconda.

  2. Install Anaconda using the instructions.

  3. Open a terminal on your Mac or PC.

    • Note: On a PC, you need to open the terminal by "Running as Administrator". If you are not familiar with this, see this post for more info.
  4. Issue the following command to make sure the Anaconda installation is complete:

    conda list

    The above command allows you to see what software packages have been installed.

Install software packages

Conda is a package and environment management system. It installs software packages on your computer. It also creates and manages virtual environments, where each environment holds a specific set of software for a general category of tasks.

  1. Create an ml_workshop environment and activate it:

    conda create -n ml_workshop python
    conda activate ml_workshop
    • Note: When prompted with Proceed, type y.
  2. Install software packages and their dependencies:

    conda install jupyterlab ipykernel ipywidgets matplotlib pandas scikit-learn seaborn shap tqdm
    pip install imbalanced-learn
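
To confirm that the installation worked, a quick check like the one below can help. It is only a minimal sketch, not part of the workshop materials: the module list is an assumption based on the install commands above (note that scikit-learn is imported as sklearn and imbalanced-learn as imblearn), and find_missing is a hypothetical helper name.

```python
import importlib.util
import os
import sys

# Assumed import names for the packages installed above.
# Two differ from their install names:
# scikit-learn -> sklearn, imbalanced-learn -> imblearn.
WORKSHOP_MODULES = [
    "jupyterlab", "ipykernel", "ipywidgets", "matplotlib", "pandas",
    "sklearn", "seaborn", "shap", "tqdm", "imblearn",
]


def find_missing(modules):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # CONDA_DEFAULT_ENV is set by `conda activate`; it may be absent otherwise.
    print("Active conda environment:", os.environ.get("CONDA_DEFAULT_ENV", "(not set)"))
    print("Python executable:", sys.executable)
    missing = find_missing(WORKSHOP_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

One way to use it is to save it under a name of your choice (for example check_env.py) and run it with python from the terminal where ml_workshop is activated. If any modules are reported missing, rerunning the install commands inside the activated environment is a reasonable first step.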

Open the notebooks

  1. Run Jupyter Lab:

    jupyter lab

  2. In the Jupyter Lab window that opens, use the left panel to navigate to ML_workshop, the directory where the cloned GitHub repository is stored.

  3. Open ML_workshop-part_a-preparation.ipynb

  4. Run each code cell by pressing SHIFT + ENTER.
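
If ML_workshop-part_a-preparation.ipynb does not show up in the left panel, it may help to check where the notebooks are on disk. The sketch below lists every notebook under a directory. The function name list_notebooks is hypothetical, and the path in the example is an assumption (a common GitHub Desktop default location); replace it with wherever you cloned the repository.

```python
from pathlib import Path


def list_notebooks(repo_dir):
    """Return sorted relative paths of all .ipynb files under repo_dir.

    Autosave copies inside .ipynb_checkpoints folders are skipped.
    """
    repo = Path(repo_dir).expanduser()
    if not repo.is_dir():
        raise FileNotFoundError(f"No such directory: {repo}")
    return sorted(
        p.relative_to(repo).as_posix()
        for p in repo.rglob("*.ipynb")
        if ".ipynb_checkpoints" not in p.parts
    )


if __name__ == "__main__":
    # Assumed clone location; adjust to match your own setup.
    assumed_repo = Path.home() / "Documents" / "GitHub" / "ML_workshop"
    try:
        for name in list_notebooks(assumed_repo):
            print(name)
    except FileNotFoundError as err:
        print(err)
```

If the directory is reported as missing, the repository was probably cloned somewhere else; GitHub Desktop can show the local path of a cloned repository.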