cytomining / pycytominer

Python package for processing image-based profiling data
https://pycytominer.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Gaussianize feature values with inverse normal transform (INT) #155

Open gwaybio opened 3 years ago

gwaybio commented 3 years ago

We used this transformation in a paper in preparation. It would be great to add this functionality to pycytominer so that all future users have access to it!

cc @jatinarora-upmc @shntnu

jenna-tomkinson commented 1 year ago

@gwaybio

Do you have any pseudo-code or links that could help flesh out this idea?

gwaybio commented 1 year ago

The paper is now here: https://www.biorxiv.org/content/10.1101/2023.01.09.522731v1.full

The relevant section is pasted here:

Quantification of cellular morphology traits and their quality control

The segmentation of individual cells in the image into their cellular compartments (whole cell, cytoplasm and nuclei), and the subsequent quantification of morphology traits for each cellular compartment, was done using CellProfiler 3.1.8[53]; pipelines are available at https://github.com/broadinstitute/imaging-platform-pipelines/tree/master/cellpainting_ipsc_20x_phenix_with_bf_bin1. Analysis of CRISPR experiments was done in CellProfiler 4.2.4 with pipelines available at https://github.com/broadinstitute/imaging-platform-pipelines/tree/master/cellpainting_ipsc_20x_phenix_with_bf_bin1_cp4[54](https://www.biorxiv.org/content/10.1101/2023.01.09.522731v1.full#ref-54). Subsequently, cells missing measurements for more than 5% of traits were removed. Morphology traits known a priori to be problematic, not measured across all cells, or non-variable across cells were removed using the Caret v6.0-86 package. QC-ed cells were then segregated into two groups based on the number of neighbors: isolated cells having no neighbors and colony cells having one or more neighbors. Individual morphology traits were then summarized to well-level measurements by averaging them across all cells per well, resulting in a well-by-trait matrix. Following this, each morphology trait was gaussianized across all 7 plates using the inverse normal transformation (INT) method.
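As a possible starting point, here is a minimal sketch of the last two steps described above: averaging cells to well level, then applying a rank-based INT to each feature pooled across all plates. It assumes the common rank-based form, where each value x with rank r among n non-missing values becomes Phi^-1((r - c) / (n - 2c + 1)), with Phi^-1 the standard normal quantile function. The excerpt does not say which offset c was used. Blom's c = 3/8 is an assumption here (c = 0.5 gives the rankit form (r - 0.5) / n, and c = 0 gives Van der Waerden's r / (n + 1)). The functions `inverse_normal_transform` and `gaussianize` are hypothetical and are not part of the pycytominer API. The column names are illustrative only.

```python
from statistics import NormalDist

import pandas as pd


def inverse_normal_transform(values: pd.Series, c: float = 3.0 / 8) -> pd.Series:
    """Rank-based inverse normal transformation of one feature (hypothetical helper).

    Each value is replaced by the standard-normal quantile of its rank:
    Phi^-1((rank - c) / (n - 2c + 1)). The default c = 3/8 (Blom) is an
    assumption; the source does not state the offset. Ties get their average
    rank and NaNs pass through unchanged. Requires 0 <= c < 1.
    """
    ranks = values.rank(method="average")  # NaN inputs keep NaN ranks
    n = int(values.notna().sum())
    probs = (ranks - c) / (n - 2 * c + 1)  # strictly inside (0, 1) for 0 <= c < 1
    inv_cdf = NormalDist().inv_cdf
    return probs.map(lambda p: inv_cdf(p) if pd.notna(p) else float("nan"))


def gaussianize(df: pd.DataFrame, feature_cols: list[str], c: float = 3.0 / 8) -> pd.DataFrame:
    """Apply the INT to each feature column, pooling all rows (e.g. all plates)."""
    out = df.copy()
    for col in feature_cols:
        out[col] = inverse_normal_transform(out[col], c=c)
    return out


# Hypothetical single-cell table; column names mimic CellProfiler/pycytominer
# naming conventions but are illustrative only.
cells = pd.DataFrame(
    {
        "Metadata_Plate": ["P1", "P1", "P1", "P2", "P2", "P2"],
        "Metadata_Well": ["A01", "A01", "A02", "A01", "A02", "A02"],
        "Cells_AreaShape_Area": [100.0, 120.0, 300.0, 150.0, 80.0, 90.0],
    }
)

# Summarize to well level by averaging across cells, then gaussianize each
# feature across all plates at once (not plate by plate).
wells = cells.groupby(["Metadata_Plate", "Metadata_Well"], as_index=False).mean()
wells_int = gaussianize(wells, ["Cells_AreaShape_Area"])
print(wells_int)
```

Because the transform depends only on ranks, it preserves the ordering of wells within each feature while forcing a roughly standard-normal marginal distribution. Pooling all rows before ranking matches the "across all 7 plates" wording. Ranking within each plate would instead require a per-plate groupby.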