gitter-lab / pharmaco-image

MIT License

pharmaco-image

In this project, we aim to use human U2OS cell images (the GigaScience dataset) to predict a large number of compound activities against different protein targets.

Investigations and key findings:

  1. Batch effects exist in the cell image dataset. We used several methods to detect them:
    1. Visualize cell image features vs. experiment ID
    2. Use an interactive visualization tool
    3. Plot a feature correlation heatmap
  2. It is challenging to remove such batch effects.
    1. ComBat normalization
    2. Z-score normalization
  3. It is promising to use cell image data to predict compound assay activities.
    1. Use compound fingerprint features as a baseline
    2. Train random forest and logistic regression models on features extracted from a pre-trained CNN
    3. Train a LeNet CNN end to end
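
As a rough illustration of the z-score normalization listed under item 2, the sketch below standardizes one feature within each batch. It is not the code from normalization.ipynb: the function name `zscore_per_batch`, the toy values, and the choice to map constant features to 0 are all assumptions made for this example. Note that this kind of normalization only corrects per-batch shifts in mean and scale, which is one reason batch effects can remain afterwards.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_per_batch(values, batch_ids):
    """Standardize one feature within each batch (e.g. experiment ID).

    values    -- list of floats, one feature value per sample
    batch_ids -- list of hashable batch labels, same length as values
    Returns a new list in the original sample order.
    """
    if len(values) != len(batch_ids):
        raise ValueError("values and batch_ids must have the same length")

    # Collect the values that belong to each batch.
    by_batch = defaultdict(list)
    for value, batch in zip(values, batch_ids):
        by_batch[batch].append(value)

    # Per-batch mean and population standard deviation.
    stats = {b: (mean(v), pstdev(v)) for b, v in by_batch.items()}

    normalized = []
    for value, batch in zip(values, batch_ids):
        mu, sigma = stats[batch]
        # Assumption: a feature that is constant within a batch maps to 0.
        normalized.append(0.0 if sigma == 0 else (value - mu) / sigma)
    return normalized


# Toy data (made up): batch "B" is shifted by +10 relative to batch "A".
feature = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
batches = ["A", "A", "A", "B", "B", "B"]
normalized = zscore_per_batch(feature, batches)
```

After normalization the two toy batches have identical values, so the constant offset between them is gone. The sketch uses only the standard library to stay dependency-free; the same per-batch grouping maps directly onto a pandas `groupby`.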

To learn more, please check out our Jupyter Notebooks below and Python scripts in ./scripts.

| Notebook | Description |
| --- | --- |
| image_processing.ipynb | Visualize the raw images and their features |
| meta_data.ipynb | Explore the metadata that comes with the image dataset, such as compound chemical annotations |
| feature_visualization.ipynb | Visualize the single-cell images, CNN-extracted features, and clusterings of the extracted features |
| normalization.ipynb | Experiment with batch-effect normalization methods such as ComBat and z-score normalization |
| explore_excape_db.ipynb | Align U2OS image data with ExCAPE-DB assay data using chemical annotations |
| positive_control.ipynb | Find compounds in the CCLE database that have been tested on the U2OS cell line |
| assay_selection.ipynb | Aggregate cell-level CellProfiler features to the assay level |
| assay_prediction.ipynb | Predict assay activity using U2OS images with random forest and logistic regression models |
| simple_cnn.ipynb | Predict assay activity using U2OS images by training a LeNet CNN model |
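
The aggregation step described for assay_selection.ipynb can be pictured as collapsing many per-cell feature vectors into one vector per group. The sketch below is a minimal illustration, not the notebook's code: the function name `aggregate_cells`, the compound IDs, the feature values, and the use of the median as the summary statistic are all assumptions.

```python
from collections import defaultdict
from statistics import median


def aggregate_cells(cell_rows):
    """Collapse per-cell feature vectors into one median vector per group.

    cell_rows -- iterable of (group_key, feature_vector) pairs, where all
                 feature vectors have the same length.
    Returns {group_key: [median of feature 0, median of feature 1, ...]}.
    """
    grouped = defaultdict(list)
    for key, features in cell_rows:
        grouped[key].append(features)

    # zip(*vectors) transposes the cells-by-features rows into columns,
    # so each column holds one feature across all cells in the group.
    return {
        key: [median(column) for column in zip(*vectors)]
        for key, vectors in grouped.items()
    }


# Hypothetical compound IDs with two made-up features per cell.
cells = [
    ("compound_1", [0.2, 10.0]),
    ("compound_1", [0.4, 14.0]),
    ("compound_1", [0.9, 12.0]),
    ("compound_2", [1.0, 3.0]),
    ("compound_2", [3.0, 5.0]),
]
profiles = aggregate_cells(cells)
```

The median is chosen here because it is less sensitive to outlier cells than the mean; whether the notebook uses the median, the mean, or another statistic is not stated in this README.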