Closed by roshankern 2 years ago
In 3.normalize_data we use UMAP to suggest that batch effects are not the dominant signal in the mitosis movies. Rather, the gene knockdown perturbations appear to be the dominant signal in the feature data.
Because batch effects are likely not the dominant signal in the mitosis movies, we perform normalization across the entire screen. In other words, we fit a single normalization scaler on all negative control features and apply that scaler to all mitosis movie feature data.
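To make the screen-wide normalization concrete, here is a minimal sketch using only the Python standard library. It assumes the scaler is a per-feature z-score (mean and standard deviation), which may not match the scaler actually used in 3.normalize_data. The names `fit_scaler`, `apply_scaler`, and the toy data are hypothetical and for illustration only.

```python
from statistics import mean, pstdev


def fit_scaler(control_rows):
    """Compute per-feature mean and standard deviation from negative-control rows."""
    columns = list(zip(*control_rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant features to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_scaler(rows, scaler):
    """Z-score every row using statistics learned from the negative controls."""
    means, stds = scaler
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical toy data: each row is one cell, each column is one feature.
negative_controls = [[1.0, 10.0], [3.0, 14.0], [2.0, 12.0]]
all_cells = negative_controls + [[5.0, 20.0]]

# Fit once on negative controls from the whole screen, then apply to all cells.
scaler = fit_scaler(negative_controls)
normalized = apply_scaler(all_cells, scaler)
```

The key design choice is that the scaler is fit once on negative controls only, so perturbed cells are expressed as deviations from the control distribution. A per-batch or per-plate variant would instead fit one scaler per group.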
As described in Data-analysis strategies for image-based cell profiling, preprocessing features usually involves plate-layout-effect correction. However, for the training data, we do not have the context of an entire plate, which makes batch and plate-layout effect correction difficult. This is a good time to begin considering the full data pipeline (download, preprocessing, segmentation, feature extraction) so we can either demonstrate that these corrections are not needed or begin applying them to the training data.