broadinstitute / pooled-cell-painting-profiling-recipe

:woman_cook: Recipe repository for image-based profiling of Pooled Cell Painting experiments
BSD 3-Clause "New" or "Revised" License

Adding Cell Painting Utility Methods #7

Closed gwaybio closed 4 years ago

gwaybio commented 4 years ago

This PR adds two utility methods that do the following:

  1. Loading CellProfiler output .csv files for any specified compartment and adding column name prefixes
  2. Merging compartment .csv files into a single dataframe given merge column information
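
To make the first point concrete, here is a minimal sketch of loading one compartment file and prefixing its columns. The function name `load_compartment_csv_sketch` and the toy `Cytoplasm.csv` are hypothetical and only illustrate the idea; the prefix scheme (`Metadata_<Compartment>_` for metadata columns, `<Compartment>_` for everything else) is an assumption inferred from names such as `Metadata_Cytoplasm_Parent_Cells` in the example config, and the actual `load_single_cell_compartment_csv` may differ.

```python
import pathlib
import tempfile

import pandas as pd


def load_compartment_csv_sketch(compartment_dir, compartment, metadata_cols):
    """Hypothetical stand-in for load_single_cell_compartment_csv."""
    compartment_file = pathlib.Path(compartment_dir, f"{compartment}.csv")
    df = pd.read_csv(compartment_file)
    # Assumed scheme: metadata columns get "Metadata_<Compartment>_",
    # all remaining (feature) columns get "<Compartment>_".
    df.columns = [
        f"Metadata_{compartment}_{col}"
        if col in metadata_cols
        else f"{compartment}_{col}"
        for col in df.columns
    ]
    return df


# Demonstration on a made-up two-row Cytoplasm.csv written to a temp directory
with tempfile.TemporaryDirectory() as tmp_dir:
    pd.DataFrame(
        {
            "ImageNumber": [1, 1],
            "ObjectNumber": [1, 2],
            "Parent_Cells": [1, 2],
            "AreaShape_Area": [120, 98],
        }
    ).to_csv(pathlib.Path(tmp_dir, "Cytoplasm.csv"), index=False)

    cytoplasm_df = load_compartment_csv_sketch(
        tmp_dir, "Cytoplasm", ["Parent_Cells", "ImageNumber", "ObjectNumber"]
    )

print(list(cytoplasm_df.columns))
```

Prefixing at load time keeps identically named CellProfiler features (for example `AreaShape_Area`) distinct once the compartments are combined.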

Example Usage

Example Entry in Config YAML

---
core:
  compartments:
    - Cells
    - Nuclei
    - Cytoplasm
  parent_cols:
    cells:
      - Parent_Nuclei
    cytoplasm:
      - Parent_Nuclei
      - Parent_Cells
    spots:
      - Parent_Cells
  id_cols:
    - ImageNumber
    - ObjectNumber
---
aggregate:
  merge_cols:
    image_column: ImageNumber
    linking_compartment: Cytoplasm
    linking_columns:
      cells: Metadata_Cytoplasm_Parent_Cells
      nuclei: Metadata_Cytoplasm_Parent_Nuclei
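
As a rough illustration of how the `merge_cols` entry could drive the merge, the sketch below joins Cells and Nuclei onto the linking compartment (Cytoplasm) using the image column plus each parent column. `merge_compartments_sketch`, its `object_col` parameter, and the toy dataframes are hypothetical; the real `merge_single_cell_compartments` takes `id_cols` as a third argument and may behave differently.

```python
import pandas as pd

# Mirrors the "merge_cols" entry of the example config
merge_info = {
    "image_column": "ImageNumber",
    "linking_compartment": "Cytoplasm",
    "linking_columns": {
        "cells": "Metadata_Cytoplasm_Parent_Cells",
        "nuclei": "Metadata_Cytoplasm_Parent_Nuclei",
    },
}


def merge_compartments_sketch(compartment_dfs, merge_info, object_col="ObjectNumber"):
    """Hypothetical merge: join every compartment onto the linking compartment."""
    image_col = merge_info["image_column"]
    linker = merge_info["linking_compartment"]
    merged = compartment_dfs[linker]
    for name, df in compartment_dfs.items():
        if name == linker:
            continue
        # Parent column in the linking compartment that points at this compartment
        link_col = merge_info["linking_columns"][name.lower()]
        merged = merged.merge(
            df,
            left_on=[f"Metadata_{linker}_{image_col}", link_col],
            right_on=[f"Metadata_{name}_{image_col}", f"Metadata_{name}_{object_col}"],
            how="inner",
        )
    return merged


# Made-up, already-prefixed compartment tables (two objects in one image)
compartment_dfs = {
    "Cells": pd.DataFrame(
        {
            "Metadata_Cells_ImageNumber": [1, 1],
            "Metadata_Cells_ObjectNumber": [1, 2],
            "Cells_AreaShape_Area": [300, 250],
        }
    ),
    "Nuclei": pd.DataFrame(
        {
            "Metadata_Nuclei_ImageNumber": [1, 1],
            "Metadata_Nuclei_ObjectNumber": [1, 2],
            "Nuclei_AreaShape_Area": [80, 75],
        }
    ),
    "Cytoplasm": pd.DataFrame(
        {
            "Metadata_Cytoplasm_ImageNumber": [1, 1],
            "Metadata_Cytoplasm_ObjectNumber": [1, 2],
            "Metadata_Cytoplasm_Parent_Cells": [1, 2],
            "Metadata_Cytoplasm_Parent_Nuclei": [1, 2],
            "Cytoplasm_AreaShape_Area": [120, 98],
        }
    ),
}

sc_merged_sketch_df = merge_compartments_sketch(compartment_dfs, merge_info)
print(sc_merged_sketch_df.shape)
```

Matching on the image column as well as the parent column matters because CellProfiler object numbers restart within each image, so the parent number alone would not identify a unique cell.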

Example Usage in Processing


from config_utils import process_config_file
from paint_utils import (
    load_single_cell_compartment_csv,
    merge_single_cell_compartments
)

# Placeholder: path to a config file containing the example YAML above
config_file = "<EXAMPLE YAML ABOVE>"
config = process_config_file(config_file)

core_args = config["core"]
aggregate_args = config["aggregate"]

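id_cols = core_args["id_cols"]
compartments = core_args["compartments"]
# Parent columns per compartment, keyed by lowercase compartment name
parent_col_info = core_args["parent_cols"]
merge_info = aggregate_args["merge_cols"]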

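# NOTE: site_compartment_dir (the directory holding one site's compartment
# .csv files) is not defined in this snippet and must be set beforehand
compartment_csvs = {}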
for compartment in compartments:
    try:
        metadata_cols = parent_col_info[compartment.lower()] + id_cols
    except KeyError:
        metadata_cols = id_cols
    compartment_csvs[compartment] = load_single_cell_compartment_csv(
        site_compartment_dir, compartment, metadata_cols
    )

sc_merged_df = merge_single_cell_compartments(compartment_csvs, merge_info, id_cols)

gwaybio commented 4 years ago

@ErinWeisbart - we should be able to repurpose this logic for 2.process-cells.py