More and more experimental data are becoming available, and they have fueled groundbreaking discoveries. Beyond the original intent of the experiments, these data can be used to discover even more. This is where machine learning (ML) comes in: we can use computers to learn from data and build models that predict a biological phenomenon of interest, e.g., whether knocking out a gene will be lethal, or which genetic variants meaningfully predict a phenotype of interest. To help you learn more about ML, we created this workshop to provide an introduction to the following topics:
The workshop will include presentations, discussions, and hands-on sessions for participants who complete the pre-workshop setup without issues.
The workshop example is provided as a Jupyter notebook, a document created with the Jupyter Lab or Jupyter Notebook applications. A notebook can contain both code, in popular languages such as Python and R, and text in the form of paragraphs, equations, figures, links, etc.
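A notebook file (`.ipynb`) is stored as plain JSON, with a list of cells that are either text (Markdown) or code. The sketch below builds a minimal notebook by hand to show that structure. It assumes the nbformat version 4 schema. The file name `example.ipynb` and the cell contents are hypothetical. The resulting file should open in Jupyter Lab like any other notebook.

```python
import json

# A minimal notebook (nbformat 4.4 schema): one Markdown (text) cell and one
# code cell. The file name and cell contents are hypothetical examples.
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Example\n", "Text, equations such as $y = mx + b$, and links go here."],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # not run yet
            "metadata": {},
            "outputs": [],
            "source": ["print('Hello from a code cell')"],
        },
    ],
    "metadata": {
        "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}
    },
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Write the notebook to disk as JSON.
with open("example.ipynb", "w", encoding="utf-8") as f:
    json.dump(notebook, f, indent=1)
```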
To follow what we have shown in the workshop, you need the following:
Please make sure you:
You can download the notebooks and data from this repository. We recommend setting up the following:
git, a version control tool (i.e., a tool that keeps track of changes to code) widely used by people writing software in any language.
A GitHub account, for version control and collaboration (i.e., many people can work on the same code).
If you don't have git and/or a GitHub account, do the following:
Anaconda is a free and open-source distribution of the Python and R programming languages, and it is widely used for computational and data science applications.
Download the Python 3.X version of Anaconda.
Install Anaconda using the instructions.
Open a terminal on your Mac or PC (on Windows, you can use the Anaconda Prompt that is installed with Anaconda).
Run the following command to confirm that the Anaconda installation is complete:
conda list
This command lists the software packages installed in the currently active environment.
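If you would rather check the installed packages from a script, `conda list` also accepts a `--json` flag that prints the same list as JSON. The sketch below runs that command when `conda` is on your PATH and maps each package name to its version. It is a hypothetical helper, not part of the workshop material, and the function names are our own.

```python
from __future__ import annotations

import json
import shutil
import subprocess


def parse_conda_list(json_text: str) -> dict[str, str]:
    """Map package name -> version from `conda list --json` output."""
    # The JSON output is an array of objects with "name" and "version" keys.
    return {pkg["name"]: pkg["version"] for pkg in json.loads(json_text)}


def installed_packages() -> dict[str, str]:
    """Run `conda list --json`; return {} if conda is unavailable or fails."""
    conda = shutil.which("conda")  # full path to the conda executable/script
    if conda is None:
        return {}
    result = subprocess.run([conda, "list", "--json"], capture_output=True, text=True)
    if result.returncode != 0:
        return {}
    return parse_conda_list(result.stdout)


if __name__ == "__main__":
    packages = installed_packages()
    print(f"{len(packages)} packages found")
    print("python version:", packages.get("python", "not found"))
```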
Conda is a package and environment management system. It installs software packages on your computer. It also creates and manages virtual environments, each holding a specific set of software for a general category of tasks.
Create an ml_workshop environment and activate it:
conda create -n ml_workshop python
conda activate ml_workshop
When conda asks "Proceed ([y]/n)?", type y.
Install the software packages and their dependencies:
conda install jupyterlab ipykernel ipywidgets matplotlib pandas scikit-learn seaborn shap tqdm
pip install imbalanced-learn
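To confirm that the installation worked, you can check that each package is importable from the ml_workshop environment. In the sketch below, the module list is our own mapping from the package names above to their import names. Several names differ: scikit-learn is imported as `sklearn` and imbalanced-learn as `imblearn`. The helper name is hypothetical.

```python
import importlib.util

# Import names for the packages installed above. Several differ from the
# conda/pip package names (scikit-learn -> sklearn, imbalanced-learn -> imblearn).
REQUIRED_MODULES = [
    "jupyterlab", "ipykernel", "ipywidgets", "matplotlib", "pandas",
    "sklearn", "seaborn", "shap", "tqdm", "imblearn",
]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All workshop packages are installed.")
```

Run it with `python` after `conda activate ml_workshop`. Any names it prints can be reinstalled with the commands above.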
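Start Jupyter Lab:
jupyter lab
On Windows, if the cloned repository is on a different drive, start Jupyter Lab from the drive that contains it, for example:
jupyter lab --notebook-dir=C:/
jupyter lab --notebook-dir=D:/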
In the Jupyter Lab window that opens, use the left panel to navigate to __ML_workshop__, the directory where the cloned GitHub repository is stored.
Open ML_workshop-part_a-preparation.ipynb
Run each code cell by pressing SHIFT + ENTER.
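To check that the notebook kernel is running in the ml_workshop environment, you can run a cell like the one below. It relies on an assumption: by default, conda places a named environment in a directory called `envs/<name>`, so the last component of `sys.prefix` is the environment name. The helper function is hypothetical.

```python
import sys
from pathlib import Path


def running_in_env(prefix: str, env_name: str = "ml_workshop") -> bool:
    """True if a Python prefix belongs to the named conda environment.

    By default conda places named environments under <anaconda>/envs/<name>,
    so the last path component of the prefix is the environment's name.
    """
    return Path(prefix).name == env_name


# In a notebook cell, this reports which environment the kernel runs in.
print("Kernel Python prefix:", sys.prefix)
print("Using ml_workshop environment:", running_in_env(sys.prefix))
```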