broadinstitute / jump-profiling-recipe

Workflow for processing cpg0016-jump profiles
BSD 3-Clause "New" or "Revised" License

Create profiling recipe for DeepProfiler output #6

Open shntnu opened 5 months ago

shntnu commented 5 months ago

@Arkkienkeli has produced Cell Painting CNN v1 embeddings for all of JUMP. Currently these are at the single-cell level.

We now need a workflow that does everything downstream.

We will use this issue to discuss and plan, and will likely move the work to a new repo.

@jccaicedo and @shntnu will create a Snakemake workflow similar to https://github.com/broadinstitute/jump-profiling-recipe, but it will start at the single-cell level (Level 2), not at the replicate level (Level 3).

Image source: https://github.com/cytomining/pycytominer/blob/main/media/pipeline.png

Here are the steps based on this notebook recommended by @Arkkienkeli

We will need input from

Also from


Location of data


Notes

jccaicedo commented 2 months ago

@viditagr and I have started on the Level 2 to Level 3 aggregation and want to write the output in the correct format. @shntnu, can you point us to how the aggregated files are supposed to be structured, named, and organized in the directories?

jccaicedo commented 2 months ago

We are creating a parallel directory called profiles that follows the specified directory structure of batches and plates. Inside, we save a single file with the well-level aggregated data in Parquet format.
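For readers following along, here is a minimal sketch of what this aggregation step could look like with pandas. It is not the Caicedo lab's actual code. The metadata column names (`Metadata_Plate`, `Metadata_Well`), the choice of mean as the aggregation function, and the output file name are all assumptions made for illustration; only the `profiles` directory with batch and plate subdirectories comes from the comment above.

```python
from pathlib import Path

import pandas as pd

# Hypothetical column names; the real single-cell tables may use different ones.
META_COLS = ["Metadata_Plate", "Metadata_Well"]


def aggregate_to_wells(cells: pd.DataFrame, how: str = "mean") -> pd.DataFrame:
    """Collapse single-cell rows (Level 2) into one row per well (Level 3)."""
    feature_cols = [c for c in cells.columns if c not in META_COLS]
    # Aggregate every feature column within each (plate, well) group.
    return cells.groupby(META_COLS, as_index=False)[feature_cols].agg(how)


def profile_path(root: Path, batch: str, plate: str) -> Path:
    """Assumed layout: <root>/profiles/<batch>/<plate>/<plate>.parquet."""
    return root / "profiles" / batch / plate / f"{plate}.parquet"


def write_profile(wells: pd.DataFrame, root: Path, batch: str, plate: str) -> Path:
    """Write one well-level Parquet file per plate."""
    out = profile_path(root, batch, plate)
    out.parent.mkdir(parents=True, exist_ok=True)
    # to_parquet needs a Parquet engine such as pyarrow to be installed.
    wells.to_parquet(out, index=False)
    return out
```

Passing `how="median"` switches the aggregation function without other changes, so the choice can be settled later in the thread.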

shntnu commented 1 month ago

can you point us to how the aggregated files are supposed to be structured, named and organized in the directories?

I missed replying to this. Please see below:

https://broadinstitute.github.io/cellpainting-gallery/data_structure.html#workspace-dl-folder-structure

shntnu commented 1 month ago

Update from Caicedo lab, which I am posting here for @jessica-ewald's visibility:

We are making progress with the aggregation, but we found a few corrupted parquet files. Nikita is helping fix this, and we will also take the opportunity to fix the typo in the Cell Painting directory names. After this is complete, the aggregated data will be available together with the fixed data.
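As a rough aid for spotting files like the corrupted ones mentioned above, the sketch below does a cheap first-pass check using only the standard library. It relies on the fact that Parquet files with a plaintext footer begin and end with the 4-byte magic `PAR1`. The function names are hypothetical, and this is not how the files were actually found or fixed. Passing the check does not prove a file is readable; failing it is a strong sign of truncation or damage.

```python
from pathlib import Path

PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path: Path) -> bool:
    """Return True if the file has the Parquet magic bytes at both ends."""
    # Smallest plausible file: header magic + 4-byte footer length + footer magic.
    if path.stat().st_size < 12:
        return False
    with path.open("rb") as fh:
        head = fh.read(4)
        fh.seek(-4, 2)  # 2 = seek relative to the end of the file
        tail = fh.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def find_suspect_files(root: Path) -> list[Path]:
    """List *.parquet files under root that fail the magic-byte check."""
    return sorted(p for p in root.rglob("*.parquet") if not looks_like_parquet(p))
```

Files that pass could then be opened with a full Parquet reader to confirm that the footer and row groups are intact.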