cytomining / pycytominer

Python package for processing image-based profiling data
https://pycytominer.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Image based QC prior to aggregate_profiles #215

Open kvshams opened 2 years ago

kvshams commented 2 years ago

Is there any QC procedure that can be run before aggregating wells?

In my case, images with sparse, low-density cell regions create an artifact because the cells there grow larger and larger. I want to exclude those images from the aggregation step.

Alternatively, is there a way to convert the entire db into one data frame that includes all features and metadata? That would be more convenient for QC: I could exclude the identified outliers from the db and then perform the downstream aggregation and analysis.

gwaybio commented 2 years ago

Pycytominer doesn't perform any QC at the moment.

You might consider looking into bioprofiling.jl. IIRC they have some QC ability.

You can also look into this paper, which proposes some QC ideas (not yet implemented in pycytominer, see rohban-lab/Image-based-cell-profiling-enhancement-via-data-cleaning-methods#1), including one which may be helpful for adjusting for cell density.

Pycytominer does have functionality to load the full SQLite db; see https://github.com/cytomining/pycytominer/blob/b4d32d39534c949ad5165f0b98b79537c2a7ca58/pycytominer/cyto_utils/cells.py#L25
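For QC work outside pycytominer, a SQLite database can also be read directly with pandas. The following is a minimal sketch, not pycytominer's implementation. It assumes the file has an `image` table and a `cells` table that share `TableNumber` and `ImageNumber` key columns; that layout and every column name below are assumptions for illustration, so check your own schema first. A tiny in-memory database stands in for the real file so the example runs on its own.

```python
import sqlite3

import pandas as pd

# Build a tiny in-memory database standing in for the real file.
# Table and column names here are assumptions for illustration.
conn = sqlite3.connect(":memory:")
image = pd.DataFrame(
    {
        "TableNumber": ["t1", "t1"],
        "ImageNumber": [1, 2],
        "Metadata_Plate": ["plate1", "plate1"],
        "Metadata_Well": ["A01", "A02"],
    }
)
cells = pd.DataFrame(
    {
        "TableNumber": ["t1", "t1", "t1"],
        "ImageNumber": [1, 1, 2],
        "ObjectNumber": [1, 2, 1],
        "Cells_AreaShape_Area": [210.0, 190.0, 640.0],
    }
)
image.to_sql("image", conn, index=False)
cells.to_sql("cells", conn, index=False)

# Read both tables back and attach image-level metadata to every cell.
image_df = pd.read_sql("SELECT * FROM image", conn)
cells_df = pd.read_sql("SELECT * FROM cells", conn)
single_cell_df = cells_df.merge(
    image_df, on=["TableNumber", "ImageNumber"], how="left"
)
conn.close()

print(single_cell_df.shape)
```

For a real file, the connection would be opened with the path to the database instead of `":memory:"`. Large experiments may not fit in memory when loaded this way, so reading one plate or well at a time with a `WHERE` clause may be necessary.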

kvshams commented 2 years ago

@gwaybio Thank you for pointing out this insightful method. This may be a naive request: how do I create a data frame of single cells after loading the db?

sc = SingleCells('sqlite:///Data/database.sqlite')  # by default this uses strata=['Metadata_Plate', 'Metadata_Well']

That is, how can I create a data frame containing the raw single-cell-level data for QC from the SQLite output created by ingest? (I used the ingest function to combine data that were processed in parallel.)

gwaybio commented 2 years ago

We have a function inside the SingleCells class to merge single cells. See https://pycytominer.readthedocs.io/en/latest/pycytominer.cyto_utils.html#pycytominer.cyto_utils.cells.SingleCells.merge_single_cells

However, we recently recognized some memory issues in this function (see #195), which we're working to solve by moving away from SQLite to parquet (#213).

We'd welcome any insights and experience you have with this method.

gwaybio commented 2 years ago

@kvshams - I wanted to provide an update that the merge_single_cells() functionality is now working well. It now takes 15 minutes to merge whereas previously it was taking several hours.

This might help you to design methods for image QC prior to aggregating. Thanks!
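As a starting point, a density-based QC step on a merged single-cell data frame could look like the sketch below. It is only an illustration under stated assumptions, not a pycytominer feature: the column names (`ImageNumber`, `Cells_AreaShape_Area`) follow CellProfiler-style naming but are assumed here, the data are a toy table, and `MIN_CELLS_PER_IMAGE` is a hypothetical threshold that would need tuning on real data.

```python
import pandas as pd

# Toy single-cell table; column names are assumptions for illustration.
single_cell_df = pd.DataFrame(
    {
        "Metadata_Plate": ["plate1"] * 6,
        "Metadata_Well": ["A01", "A01", "A01", "A01", "A02", "A02"],
        "ImageNumber": [1, 1, 1, 2, 3, 3],
        "Cells_AreaShape_Area": [200.0, 210.0, 190.0, 650.0, 220.0, 240.0],
    }
)

MIN_CELLS_PER_IMAGE = 2  # hypothetical threshold; tune on your own data

image_keys = ["Metadata_Plate", "Metadata_Well", "ImageNumber"]

# Count the cells in each image and broadcast that count back to each cell.
cells_per_image = single_cell_df.groupby(image_keys)[
    "Cells_AreaShape_Area"
].transform("size")

# Keep only cells from images that pass the density check.
passed_qc = single_cell_df[cells_per_image >= MIN_CELLS_PER_IMAGE]

# Aggregate the remaining cells to one median profile per well.
well_profiles = passed_qc.groupby(
    ["Metadata_Plate", "Metadata_Well"], as_index=False
)["Cells_AreaShape_Area"].median()

print(well_profiles)
```

In this toy table, image 2 contains a single, very large cell and is dropped before the per-well median is computed. Filtering at the image level rather than the cell level keeps the rule tied to the sparse-image artifact described in the original question.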