WayScience / mitocheck_data

All information regarding the download and processing of Mitocheck data from IDR study with accession idr0013 (screenA).
Creative Commons Zero v1.0 Universal

Training Feature Preprocessing Issues #11

Closed (roshankern closed 2 years ago)

roshankern commented 2 years ago

As described in Data-strategies for image-based cell profiling, feature preprocessing usually involves plate-layout-effect correction. However, for the training data, we do not have the context of an entire plate, so batch and plate-layout effect correction becomes difficult. This is a good time to begin considering the full data pipeline (download, preprocessing, segmentation, feature extraction) so we can either demonstrate that these corrections are not needed or begin applying them to the training data.

roshankern commented 2 years ago

In 3.normalize_data we use UMAP to suggest that batch effects are not the dominant signal in mitosis movies. Rather, the gene knockdown perturbations are the dominant signal in feature data.

Because batch effects are likely not the dominant signal in the mitosis movies, we perform normalization across the entire screen. In other words, we create one normalization scaler from all negative control features and apply this normalization scaler to all mitosis movie feature data.
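As a rough illustration of the "one scaler from negative controls, applied everywhere" idea, here is a minimal sketch. It assumes z-score standardization (the comment above does not say which scaler is used), and the function names `fit_scaler` and `apply_scaler` and the toy data are hypothetical, not taken from the repository.

```python
from statistics import fmean, pstdev


def fit_scaler(control_rows):
    """Compute per-feature mean and standard deviation from negative-control rows."""
    columns = list(zip(*control_rows))
    means = [fmean(col) for col in columns]
    # Fall back to 1.0 for zero-variance features to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_scaler(rows, scaler):
    """Standardize every row using statistics learned from the controls only."""
    means, stds = scaler
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical toy data: each row is one cell's feature vector.
negative_controls = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
all_cells = negative_controls + [[9.0, 12.0]]

# Fit once on negative controls, then apply to all feature data in the screen.
scaler = fit_scaler(negative_controls)
normalized = apply_scaler(all_cells, scaler)
```

Because the scaler is fit only on negative controls, perturbed cells are expressed as deviations from the control distribution rather than from the screen-wide average.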