broadinstitute / lincs-profiling-complementarity

Analyzing and comparing signal found in different profiling technologies
BSD 3-Clause "New" or "Revised" License

LINCS hit query #14

Open gwaybio opened 3 years ago

gwaybio commented 3 years ago

@michaelbornholdt presented an analysis on querying "hits" during profiling checkin today.

I'd love to be able to include this analysis in the LINCS complementarity paper. Michael estimated that his analysis would take ~2 hours.

Specification

Input data

Output

I think I need to understand the "hit" analysis better. Given a compound, you're asking if you match another replicate as the top hit? So essentially the output is a ranked list of matches per compound and whether or not they map to the same category?

If so, then can you output the following data frame?

| target_compound | compound_to_match | rank | same_replicate | same_MOA |
| --- | --- | --- | --- | --- |
| X | Y | 1 | True | True |
| X | Z | 2 | False | True |
| and so on... | ... | ... | ... | ... |
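
To make the requested output concrete, here is a minimal sketch of how such a ranked-match table could be assembled. It is not Michael's actual analysis: the similarity metric (Pearson correlation), the toy profiles, the function names `pearson` and `rank_hits`, and the reading of `same_replicate` as "the matched profile comes from the same compound" are all assumptions made for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length numeric sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rank_hits(profiles):
    """Rank every other profile against each query profile.

    `profiles` is a list of dicts with the hypothetical keys
    'compound', 'moa' and 'features'. Returns one row (dict) per
    (query, match) pair, with rank 1 being the most similar match.
    """
    rows = []
    for i, query in enumerate(profiles):
        scored = [
            (pearson(query["features"], other["features"]), j)
            for j, other in enumerate(profiles)
            if j != i  # never match a profile against itself
        ]
        scored.sort(key=lambda s: s[0], reverse=True)
        for rank, (_, j) in enumerate(scored, start=1):
            other = profiles[j]
            rows.append({
                "target_compound": query["compound"],
                "compound_to_match": other["compound"],
                "rank": rank,
                # Assumption: "same replicate" means same compound.
                "same_replicate": query["compound"] == other["compound"],
                "same_MOA": query["moa"] == other["moa"],
            })
    return rows


# Toy profiles (made up): two replicates of X, plus Y and Z.
profiles = [
    {"compound": "X", "moa": "A", "features": [1.0, 2.0, 3.0, 4.0]},
    {"compound": "X", "moa": "A", "features": [1.1, 2.1, 2.9, 4.2]},
    {"compound": "Y", "moa": "A", "features": [1.0, 3.0, 2.0, 4.0]},
    {"compound": "Z", "moa": "B", "features": [4.0, 3.0, 2.0, 1.0]},
]

rows = rank_hits(profiles)

# Fraction of query profiles whose top hit is a replicate of the same compound.
top_hits = [r for r in rows if r["rank"] == 1]
replicate_hit_rate = sum(r["same_replicate"] for r in top_hits) / len(top_hits)
```

The list of row dicts can be passed directly to `pandas.DataFrame(rows)` if a data frame is preferred; a summary like `replicate_hit_rate` is one example of the kind of preliminary statistic that could accompany it.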

Let's iterate on the final output specifications if my understanding above is limited in any way.

A couple preliminary figures and statistics would also be helpful.

Motivation

michaelbornholdt commented 3 years ago

Thanks for setting up this issue. I agree with the motivation and the authorship, if we include this.

I can definitely give you that output, but it will take a bit longer to build that df. Your proposed df also has a lot more information than needed for the simple histogram. Overall, though, it's the best option because I will want to put this onto cyto eval anyway.

michaelbornholdt commented 3 years ago

We should also add the MOAs of both compounds to that df, since that will allow for more MOA-focused plots instead of just counting the compounds.

gwaybio commented 3 years ago

Sounds good! I am aiming for a July 1 submission, so in order for us to include it, I will need it before then.

Two additional points that you might want to consider:

Thanks!

michaelbornholdt commented 3 years ago

Yeah, good points! I'm not very familiar with test-driven development, but I could try it :)

michaelbornholdt commented 3 years ago

@gwaygenomics The data you linked up there is level3. I just require level5 data. I misspoke in the meeting!

Can you point me to which level5 data I should be using?

gwaybio commented 3 years ago

Level 5 links: Cell Painting, L1000

Thanks!

gwaybio commented 3 years ago

@michaelbornholdt produced the analysis here: https://github.com/broadinstitute/neural-profiling/issues/2#issuecomment-872433897

I will ingest the output files into this repo for visualization.