broadinstitute / lincs-cell-painting

Processed Cell Painting Data for the LINCS Drug Repurposing Project
BSD 3-Clause "New" or "Revised" License

Create consensus spherized profiles #72

Closed: shntnu closed this issue 3 years ago

shntnu commented 3 years ago

Given that we create a single CSV file for spherized profiles in this notebook, it will be easiest to compute the consensus profiles in the same notebook.

The output should be stored at lincs-cell-painting/spherized_profiles/consensus and be named

That is, median and modz consensus profiles for each of the two Batch 1 files in this directory.

And the same for Batch 2 (2017_12_05_Batch2).

michaelbornholdt commented 3 years ago

+1 on this. Mattias was very confused about this as well!

michaelbornholdt commented 3 years ago

Actually, I think they should go here: lincs-cell-painting/consensus

FloHu commented 3 years ago

So I assume this means that the files in lincs-cell-painting/consensus are not spherized. Then which normalization strategy was applied there? This is not clear from the respective notebook (consensus/build-consensus-signatures.ipynb). Also, do "plate normalization" and "batch normalization" refer to the same procedure (as I would think)?

gwaybio commented 3 years ago

Very glad to have you both digging into this repo to uncover what is clear and what is not.

So I assume this means that the files in lincs-cell-painting/consensus are not spherized. Then which normalization strategy was applied there?

Correct, the profiles here are not spherized. We generate consensus signatures from the traditional level 4a normalized profiles.

From cell 5 of build-consensus-signature.ipynb:

file_bases = {
    "whole_plate": {
        "input_file_suffix": "_normalized.csv.gz",
        "output_file_suffix": ".csv.gz",
    },
    "dmso": {
        "input_file_suffix": "_normalized_dmso.csv.gz",
        "output_file_suffix": "_dmso.csv.gz",
    },
}

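We use these suffixes to load a specific data level: whole-plate normalized profiles (the _normalized.csv.gz files) or profiles normalized to DMSO controls (the _normalized_dmso.csv.gz files).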
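The suffix dictionary can be turned into concrete file names with a small helper like the one below. It is only a sketch: the helper name resolve_files, the plate ID PLATE_A, the directory layout, and the output prefix are all hypothetical, and the notebook itself may assemble its paths differently.

```python
from pathlib import Path

# Suffixes copied from the notebook excerpt (cell 5).
file_bases = {
    "whole_plate": {
        "input_file_suffix": "_normalized.csv.gz",
        "output_file_suffix": ".csv.gz",
    },
    "dmso": {
        "input_file_suffix": "_normalized_dmso.csv.gz",
        "output_file_suffix": "_dmso.csv.gz",
    },
}


def resolve_files(plate_dir, plate, output_prefix):
    """Map each normalization type to (input path, output file name)."""
    resolved = {}
    for level, suffixes in file_bases.items():
        input_path = Path(plate_dir) / f"{plate}{suffixes['input_file_suffix']}"
        output_name = f"{output_prefix}{suffixes['output_file_suffix']}"
        resolved[level] = (input_path, output_name)
    return resolved


# Hypothetical example: one plate directory, outputs prefixed with "consensus".
files = resolve_files("profiles/PLATE_A", "PLATE_A", "consensus")
```

Here files["dmso"] pairs the input path profiles/PLATE_A/PLATE_A_normalized_dmso.csv.gz with the output name consensus_dmso.csv.gz, so switching between whole-plate and DMSO-normalized data only means picking a different key.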

Also, do "plate normalization" and "batch normalization" refer to the same procedure (as I would think)?

They typically don't mean the same thing, but I am not sure which context you're referring to. It's possible we weren't entirely accurate there!

(For example, plate normalization could mean normalizing profiles only to the DMSO controls on each plate, with the goal of aligning profiles across plates, while batch normalization might normalize multiple plates from multiple batches together, with the goal of aligning profiles across batches.)
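The distinction can be illustrated with a single feature. In this minimal sketch, plate normalization z-scores each plate against its own DMSO wells, while the pooled variant z-scores every plate against DMSO wells pooled across plates (the same idea extends to pooling across batches). Plain mean/SD standardization is an assumption chosen for simplicity; it is not the mad_robustize method the repository uses, and the data and function names are hypothetical.

```python
import statistics


def standardize(values, reference):
    """Z-score values against the mean and SD of a reference set (e.g. DMSO wells)."""
    mu = statistics.fmean(reference)
    sd = statistics.stdev(reference)
    return [(v - mu) / sd for v in values]


def plate_normalize(plates):
    """Each plate is normalized only to its own DMSO controls."""
    return {name: standardize(p["treated"], p["dmso"]) for name, p in plates.items()}


def pooled_normalize(plates):
    """All plates are normalized to DMSO controls pooled across plates."""
    pooled = [v for p in plates.values() for v in p["dmso"]]
    return {name: standardize(p["treated"], pooled) for name, p in plates.items()}


# Hypothetical single-feature data: plate P2 carries a +10 technical offset.
plates = {
    "P1": {"dmso": [1.0, 2.0, 3.0], "treated": [2.0]},
    "P2": {"dmso": [11.0, 12.0, 13.0], "treated": [12.0]},
}
per_plate = plate_normalize(plates)
across_plates = pooled_normalize(plates)
```

The per-plate version removes the plate offset (both treated wells score 0.0), while the pooled version leaves it in (P1 scores negative, P2 positive), which is why it matters which reference set a given normalization uses.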

FloHu commented 3 years ago

"Correct, the profiles here are not spherized. We generate consensus signatures from the traditional level 4a normalized profiles."

About the batches: I agree they don't necessarily mean the same thing, but that means there are two possible types of normalization, and it is not clear which one is applied (again, speaking as someone going through the repository description without reading the actual pycytominer source code). Since there are always differences between plates, that is the first thing I think of when reading about normalization.

gwaybio commented 3 years ago

Gotcha, thanks!

Spherizing is just one more normalization method, but it is applied at a different level. Level 4a data (mad_robustize normalization) are computed per plate, whereas spherized data are computed from all level 4a profiles together.
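To illustrate the difference in level, here is a minimal numpy sketch under stated assumptions. Level 4a-style profiles come from robustizing each plate on its own (median/MAD, using the whole plate as the reference), and spherized-style profiles come from a single whitening transform fit on all of those profiles together. The sphering step is assumed to be plain ZCA whitening with a small epsilon; pycytominer's spherize supports several variants, and its defaults and constants (including the MAD scaling used here) may differ. The data are random and hypothetical.

```python
import numpy as np


def mad_robustize(x, epsilon=1e-18):
    """Per-feature robust z-score: (x - median) / (1.4826 * MAD)."""
    median = np.median(x, axis=0)
    mad = np.median(np.abs(x - median), axis=0) * 1.4826
    return (x - median) / (mad + epsilon)


def zca_whiten(x, epsilon=1e-6):
    """Fit one ZCA whitening transform on all rows of x and apply it."""
    centered = x - x.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    whitening = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + epsilon)) @ eigvecs.T
    return centered @ whitening


rng = np.random.default_rng(0)
# Hypothetical data: 3 plates x 40 wells x 4 features, each plate with its own offset.
plates = [rng.normal(loc=offset, size=(40, 4)) for offset in (0.0, 5.0, -3.0)]

# Level 4a analogue: each plate is robustized separately.
level_4a = np.vstack([mad_robustize(p) for p in plates])

# Spherized analogue: one transform is fit on all level 4a profiles together.
spherized = zca_whiten(level_4a)
```

After the per-plate step each plate's features are centered on a median of zero, and after the pooled step the features of the combined matrix are decorrelated with roughly unit variance, which is the sense in which the two methods act at different levels.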

@FloHu - can you see if our discussion in #73 improves clarity on this specific point? If not, can you describe the problem in that issue so that we can make all the changes at once?

Let's keep this issue focused on creating consensus spherized profiles (which I agree is tightly related to #73 and can probably be fixed in the same PR!).

gwaybio commented 3 years ago

@michaelbornholdt @FloHu or @shntnu - is anyone working on this currently or partially in the past? I might need this for an analysis in https://github.com/broadinstitute/lincs-profiling-comparison

shntnu commented 3 years ago

I haven't worked on it.

gwaybio commented 3 years ago

Completed in #76

michaelbornholdt commented 3 years ago

I also haven't worked on this, and I can't say I know where the files are now. I assume they are 'hidden' with Git LFS in the consensus folder?
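If files were committed with Git LFS but never fetched, a checkout contains small text pointer files instead of the real CSVs. Every LFS pointer file begins with the line version https://git-lfs.github.com/spec/v1, so a quick check like the sketch below can tell the two apart (the helper names are hypothetical). Running git lfs pull inside the repository should then replace the pointers with the actual data.

```python
from pathlib import Path

# Every Git LFS pointer file begins with this version line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(head: bytes) -> bool:
    """True if the leading bytes of a file match a Git LFS pointer."""
    return head.startswith(LFS_POINTER_PREFIX)


def is_lfs_pointer(path) -> bool:
    """Read just enough of the file to check for the pointer prefix."""
    with Path(path).open("rb") as handle:
        return looks_like_lfs_pointer(handle.read(len(LFS_POINTER_PREFIX)))


def find_pointer_files(folder, pattern="*.csv.gz"):
    """List files under folder that are still un-fetched LFS pointers."""
    return sorted(p for p in Path(folder).rglob(pattern) if is_lfs_pointer(p))
```

For example, find_pointer_files("spherized_profiles/consensus") would list any consensus files in a local clone that still need to be pulled.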

gwaybio commented 3 years ago

here you go: https://github.com/broadinstitute/lincs-cell-painting/tree/e9737c3e4e4443eb03c2c278a145f12efe255756/spherized_profiles/consensus