WayScience / Benchmarking_NF1_data

Benchmarking data processing strategies for Cell Painting data of NF1 Schwann cells. See analysis repository (https://github.com/WayScience/NF1_SchwannCell_data_analysis) for information on how the data was interpreted.
Creative Commons Zero v1.0 Universal

Apply pycytominer-transform to real world CP-derived SQLite #16

Open gwaybio opened 2 years ago

gwaybio commented 2 years ago

@d33bs - consider applying pycytominer-transform to the existing SQLite data here to test your approach with real world data.

We would be particularly interested in comparing the results from pycytominer.cyto_utils.cells.SingleCells to the pycytominer-transform derived parquet file.

More details about the SingleCells processing are here: https://github.com/WayScience/NF1_SchwannCell_data/blob/5536d86330aba0f66c74e2bd44c6e82ed1c985f9/4_processing_features/extract_single_cell_features.ipynb
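
As a rough illustration of the kind of comparison meant here, the sketch below checks two tables row by row. It is only a sketch under assumptions: plain lists of dicts stand in for the SingleCells output and the parquet-derived table, the key columns `ImageNumber` and `ObjectNumber` are assumed identifiers, and `compare_tables` is a hypothetical helper, not part of pycytominer or pycytominer-transform.

```python
import math


def compare_tables(left, right, key_cols, rel_tol=1e-9):
    """Compare two lists of row dicts and report where they differ.

    Hypothetical helper: rows are matched on `key_cols`, and only the
    columns present in both matched rows are compared.
    """
    def index(rows):
        # Map each row's key tuple to the row itself.
        return {tuple(row[k] for k in key_cols): row for row in rows}

    left_idx, right_idx = index(left), index(right)
    report = {
        "only_in_left": sorted(left_idx.keys() - right_idx.keys()),
        "only_in_right": sorted(right_idx.keys() - left_idx.keys()),
        "value_mismatches": [],
    }
    for key in sorted(left_idx.keys() & right_idx.keys()):
        lrow, rrow = left_idx[key], right_idx[key]
        for col in sorted(lrow.keys() & rrow.keys()):
            a, b = lrow[col], rrow[col]
            if isinstance(a, float) and isinstance(b, float):
                # Treat NaN on both sides as a match, since missing
                # measurements should not count as a difference.
                if math.isnan(a) and math.isnan(b):
                    continue
                same = math.isclose(a, b, rel_tol=rel_tol)
            else:
                same = a == b
            if not same:
                report["value_mismatches"].append((key, col, a, b))
    return report
```

An empty report (no unmatched keys and no value mismatches) would suggest the two processing routes agree on the shared columns; columns present on only one side would need a separate check.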

gwaybio commented 2 years ago

The CellProfiler pipeline that @jenna-tomkinson used to create the SQLite file is here: https://github.com/WayScience/NF1_SchwannCell_data/blob/main/CellProfiler_pipelines/Pipelines/NF1_analysis.cpproj (I believe; Jenna, please correct me if I'm wrong!)

jenna-tomkinson commented 2 years ago

@gwaybio @d33bs Yes, it is! Please run NF1_illum.cpproj first to get the correct images.

d33bs commented 2 years ago

Thank you @gwaybio and @jenna-tomkinson! I will look into using this and performing a notebook-based comparison.

d33bs commented 1 year ago

Hi @jenna-tomkinson and @gwaybio - it looks like the data may have changed since this issue was created. I can see there are many .sqlite data sources to choose from under CellProfiler_pipelines/Analysis_Output. To make sure I address this issue as intended, which one(s) would be best to use?
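
For context, one way to get a quick overview of each candidate file is to list its tables and row counts. This is a minimal sketch using only Python's standard `sqlite3` module; `summarize_sqlite` is a hypothetical helper name, and nothing here assumes which tables the CellProfiler output actually contains.

```python
import sqlite3
from contextlib import closing


def summarize_sqlite(path):
    """Return {table_name: row_count} for each user table in a SQLite file.

    Note: sqlite3.connect() creates an empty file if `path` does not
    exist, so check the path first when pointing this at real data.
    """
    with closing(sqlite3.connect(path)) as conn:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        # Table names cannot be bound as parameters, so quote them instead.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
```

Running this over each .sqlite file in the output directory would show at a glance which files are larger and whether they share the same table layout.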

gwaybio commented 1 year ago

pycytominer-transform should be able to handle them all, but you can focus on NF1_data_allcp_plate1.sqlite and NF1_data_allcp_plate2.sqlite. Plate 1 is the simplest case; Plate 2 is larger and will be a better memory/speed benchmark.
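
A memory/speed benchmark along these lines could be sketched with the standard library alone. The `benchmark` helper below is hypothetical and wraps any callable; note that `tracemalloc` only tracks allocations made through Python's allocator, so memory used inside native extensions may be underreported and a process-level measure would be needed for a fuller picture.

```python
import time
import tracemalloc


def benchmark(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, seconds, peak_bytes).

    Hypothetical helper: peak_bytes covers Python-level allocations only.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    finally:
        # Record timing and peak memory even if fn raises.
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak
```

The same wrapper could be applied to the conversion step for Plate 1 and Plate 2 in turn, so that both runs are measured the same way.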