broadinstitute / lincs-cell-painting

Processed Cell Painting Data for the LINCS Drug Repurposing Project
BSD 3-Clause "New" or "Revised" License

Create connectivities matrix #46

Closed: shntnu closed this issue 4 years ago

shntnu commented 4 years ago

Create 1571x1571 matrix of connectivities between compounds. Details tbd.

gwaybio commented 4 years ago

Noting pycytominer.write_gct() here in case it is at all helpful

shntnu commented 4 years ago

We decided to drop the View Connectivities link in clue.io/morphology given the overhead required for creating that file. We will instead make the consensus/2016_04_01_a549_48hr_batch1/2016_04_01_a549_48hr_batch1_consensus_modz.csv.gz available as a GCT file that can be directly loaded in morpheus.
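For future reference, here is a minimal sketch of what converting a profile table to GCT could look like. It writes the GCT v1.3 layout by hand with the standard library rather than calling `pycytominer.write_gct()`, so it does not reflect how pycytominer does it internally. The function name `profiles_to_gct`, the column names, and the toy values are all made up for illustration; the real input would be the consensus CSV.

```python
import csv
import io


def profiles_to_gct(rows, id_field, meta_fields, feature_fields):
    """Return GCT v1.3 text with features as rows and profiles as columns.

    Hypothetical helper for illustration only; not part of pycytominer.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")

    # Line 1: format version.
    writer.writerow(["#1.3"])
    # Line 2: data rows, data columns, row-metadata fields, column-metadata fields.
    writer.writerow([len(feature_fields), len(rows), 0, len(meta_fields)])
    # Line 3: header row ("id" followed by one identifier per profile).
    writer.writerow(["id"] + [r[id_field] for r in rows])
    # Column-metadata block: one line per metadata field.
    for field in meta_fields:
        writer.writerow([field] + [r[field] for r in rows])
    # Data block: one line per feature.
    for feature in feature_fields:
        writer.writerow([feature] + [r[feature] for r in rows])
    return out.getvalue()


# Toy stand-in for the consensus table; names and values are assumptions.
toy_profiles = [
    {"Metadata_broad_sample": "sample_a", "Metadata_moa": "moa_x",
     "Cells_Area": 0.5, "Nuclei_Area": -1.2},
    {"Metadata_broad_sample": "sample_b", "Metadata_moa": "moa_y",
     "Cells_Area": -0.3, "Nuclei_Area": 0.8},
]

gct_text = profiles_to_gct(
    toy_profiles,
    id_field="Metadata_broad_sample",
    meta_fields=["Metadata_moa"],
    feature_fields=["Cells_Area", "Nuclei_Area"],
)
print(gct_text)
```

The metadata columns end up as column annotations, which is what lets Morpheus sort and label profiles by fields such as `moa`.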

----------
From: Shantanu Singh
Date: Sat, May 23, 2020 at 7:03 PM
To: Jacob Asiedu
Cc: Ted Natoli, IPLINCS, Gregory Way

I realized we also need to update the link for the "View Connectivities" button on https://clue.io/morphology

Currently, it points to s3://data.clue.io/cell-painting/introspect_aggregate_maxq_n1571x1571.gct

1. What would you need from us to update that? Just the URL alone?
2. Looks like Morpheus now accepts tab-delimited text files https://clue.io/morpheus. Does that mean we can generate a TSV for the connectivities?

-Shantanu

query string, for our future reference:

```
{
  "dataset": "//s3.amazonaws.com/data.clue.io/cell-painting/introspect_aggregate_maxq_n1571x1571.gct",
  "columns": [
    { "field": "name", "display": ["text"] }
  ],
  "rows": [
    { "field": "pert_iname", "display": ["text"] },
    { "field": "moa", "display": ["text"] }
  ],
  "rowSortBy": [
    { "field": "moa", "order": 0, "type": "annotation" },
    { "field": "pert_iname", "order": 0, "type": "annotation" }
  ],
  "columnSortBy": [
    { "field": "moa", "order": 0, "type": "annotation" },
    { "field": "name", "order": 0, "type": "annotation" }
  ]
}
```

----------
From: Jacob Asiedu
Date: Sat, May 23, 2020 at 7:11 PM
To: Shantanu Singh
Cc: Ted Natoli, IPLINCS, Gregory Way

We would prefer a gct file. So I suggest we generate a new file and mark it as latest on the page and deprecate the old one. We could make the old one still available as reference. See https://clue.io/proteomics for an example of what I mean.

----------
From: Shantanu Singh
Date: Thu, May 28, 2020 at 8:59 AM
To: Jacob Asiedu
Cc: Ted Natoli, IPLINCS, Gregory Way

Jacob, will do. Going forward, we will version each release of the data (corresponding to any future updates we make to the data processing) via git tags + GitHub releases.

Ted, I was trying to look up the code that we used to generate introspect_aggregate_maxq_n1571x1571.gct. I think it is this:

https://github.com/broadinstitute/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/blob/090787c46b44c6fbb7960915b4ab77b9c31c2344/analysis_log.sh#L227-L239

Does that sound right to you? I have forgotten the cmapM API :)

----------
From: Shantanu Singh
Date: Thu, May 28, 2020 at 11:06 AM
To: Jacob Asiedu
Cc: Ted Natoli, IPLINCS, Gregory Way

Thinking about this again: could we deprecate the "View Connectivities" link and instead provide a link to load the Level 5 GCT file for the whole experiment? The user can always compute the similarities in Morpheus (and aggregate) as needed.

This will save us a TON of effort because we'd need to run several steps of this code each time we update our data processing pipeline, and the only way to do that is to set things up on the Broad cluster to run cmapM, something we are not intimately familiar with.

Let me know if that works for you.

Shantanu

----------
From: Ted Natoli
Date: Thu, May 28, 2020 at 2:14 PM
To: Shantanu Singh
Cc: Jacob Asiedu, IPLINCS, Gregory Way

Hi Shantanu,

Yes, that code for generating the maxq aggregated introspect matrix looks right to me.

I also think it would be fine to deprecate the View Connectivities link and instead provide a link to load the entire level 5 matrix. The only minor caveat is that morpheus does not have the ability to compute percentile scores (aka tau values), so it will not be possible to reproduce the values that are currently in introspect_aggregate_maxq_n1571x1571.gct directly within morpheus. But this may be a worthwhile tradeoff to avoid having to recompute and aggregate connectivities whenever you reprocess the data.

Best,
Ted

----------
From: Shantanu Singh
Date: Sat, May 30, 2020 at 1:09 AM
To: Ted Natoli
Cc: Jacob Asiedu, IPLINCS, Gregory Way

Hi Ted and Jacob,

Thanks for accommodating that. We are figuring out some other details related to the consensus profiles that might take a while to sort out.

For now, is it possible to go ahead and deprecate the connectivities link, and only have the Download data option available (mockup below)? In the next version, we will make more data available that can be easily explored and you can bring back the Explore section.

If that works for you, then I think you have everything you need (i.e. the Manifest file) to make the data available via clue.io/morphology. Please LMK if that's not the case.

Best,
Shantanu

----------
From: Ted Natoli
Date: Sat, May 30, 2020 at 8:41 AM
To: Shantanu Singh
Cc: Jacob Asiedu, IPLINCS, Gregory Way

Hi Shantanu,

That sounds fine to me. Jacob, what do you think?

Best,
Ted

----------
From: Jacob Asiedu
Date: Sat, May 30, 2020 at 8:48 AM
To: Ted Natoli
Cc: Shantanu Singh, IPLINCS, Gregory Way

Sounds good to me. I will go ahead and implement the suggestions.

----------
From: Jacob Asiedu
Date: Tue, Jun 2, 2020 at 12:32 PM
To: Shantanu Singh
Cc: IPLINCS, Gregory Way, Ted Natoli

Hello Shantanu,

The downloads should be all set now. In our next release, I will deprecate the connectivity link. Please take a look and let me know what you think.

Thanks

----------
From: Shantanu Singh
Date: Tue, Jun 2, 2020 at 1:25 PM
To: Jacob Asiedu
Cc: IPLINCS, Gregory Way, Ted Natoli

Hi Jacob,

Thanks for getting this done so quickly! This looks great for now. Once Greg and I have chatted about this, we will make some suggestions which you could consider for your next release.

Best,
Shantanu
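
For anyone who wants a compound-by-compound similarity matrix outside Morpheus, a rough sketch follows. It computes plain pairwise Pearson correlation between profiles with NumPy. This is not the cmapM introspect/maxq procedure referenced in the thread, and it does not produce percentile (tau) scores, so it will not reproduce the values in introspect_aggregate_maxq_n1571x1571.gct. The compound names and values are made up.

```python
import numpy as np

# Toy consensus profiles: 3 compounds x 4 features (made-up values).
compounds = ["cpd_a", "cpd_b", "cpd_c"]
profiles = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
])

# np.corrcoef treats each row as one variable, so the result is a
# compounds x compounds matrix of Pearson correlations.
similarity = np.corrcoef(profiles)

for name, row in zip(compounds, similarity):
    print(name, np.round(row, 3))
```

With one row per compound, the same call on 1571 profiles would give a 1571x1571 matrix, which could then be written out and loaded alongside the Level 5 data.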