broadinstitute / NeuroPainting

Pilot experiments for establishing a neuronal Cell Painting protocol
BSD 3-Clause "New" or "Revised" License

[NCP Progenitors 1] Profile 22q cohort progenitors (D4) #10

Closed shntnu closed 2 weeks ago

shntnu commented 4 years ago

Goal

Perform Cell Painting on neural progenitor cells to delineate morphological traits which separate patients and controls during early forebrain development

Experimental Design

Expected date for imaging: Done
Dyes: Cell Painting dyes
Cell type: Day 4 progenitors
Plates: 1 x 384-well
Plate layout: this will be identical to the layout used for the cmQTL project, consisting of 48 different lines segmented into 4-well blocks dispersed across the 384-well plate.
Plating parameters: 15k cells/well, fixed 24 hrs post-plating (identified in our pilot)

Proposed analysis:

  1. Ensure we can stratify sample/feature profiles based on
    1. isolated cells
    2. colony-forming cells
  2. Identify particular features and organelle structures perturbed by the 22q11 deletion
  3. Compare ‘differential’ features between iPSCs and NPCs to identify whether there are shared pathways perturbed across cell states.
  4. Use existing RNA-expression data to integrate imaging and molecular data

Metadata

shntnu commented 4 years ago

@mtegtmey please upload the images here: /imaging/analysis/2019_05_28_Neuronal_Cell_Painting/NCP_PROGENITORS_1/images

mtegtmey commented 3 years ago

Images are uploaded!

shntnu commented 3 years ago

For my notes, because I keep looking around for the new instructions to upload from login01:

cd /imaging/analysis/2019_05_28_Neuronal_Cell_Painting/NCP_PROGENITORS_1/images

mv "Matt T Cell Painting"* BR_NCP_PROGENITORS_1 # rename the image folder to a standard name (the * must sit outside the quotes to glob)

# now edit this line
#       <PlateID>Matt T Cell Painting LM 12012020</PlateID>
# to this
#      <PlateID>BR_NCP_PROGENITORS_1</PlateID>
emacs BR_NCP_PROGENITORS_1/Images/Index.idx.xml  

reuse UGER
ish -l h_vmem=4G -pe smp 4 # get a node
workon cellpntg2 # or whatever env in which you've installed awscli
aws configure # verify you're in the right account

aws s3 sync \
   /imaging/analysis/2019_05_28_Neuronal_Cell_Painting/NCP_PROGENITORS_1/images \
   s3://imaging-platform/projects/2019_05_28_Neuronal_Cell_Painting/NCP_PROGENITORS_1/images

Transfer is underway
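The manual emacs edit of the PlateID line in the notes above could also be scripted. A minimal Python sketch, assuming Index.idx.xml is UTF-8 and contains a single <PlateID> element as shown in those comments; set_plate_id and rewrite_index_file are hypothetical helpers, not part of any existing tool:

```python
import re
from pathlib import Path

# Captures the opening tag, the current plate name, and the closing tag.
PLATE_ID_RE = re.compile(r"(<PlateID>)(.*?)(</PlateID>)")


def set_plate_id(xml_text: str, new_plate_id: str) -> str:
    """Return xml_text with the first <PlateID> value replaced by new_plate_id."""
    # A function replacement avoids re's backslash-escape handling of new_plate_id.
    new_text, n_replaced = PLATE_ID_RE.subn(
        lambda m: m.group(1) + new_plate_id + m.group(3), xml_text, count=1
    )
    if n_replaced == 0:
        raise ValueError("no <PlateID> element found")
    return new_text


def rewrite_index_file(index_path: Path, new_plate_id: str) -> None:
    """Rewrite Index.idx.xml in place, keeping a .bak copy of the original."""
    original = index_path.read_text(encoding="utf-8")  # assumes UTF-8
    index_path.with_name(index_path.name + ".bak").write_text(original, encoding="utf-8")
    index_path.write_text(set_plate_id(original, new_plate_id), encoding="utf-8")


# Hypothetical invocation:
# rewrite_index_file(Path("BR_NCP_PROGENITORS_1/Images/Index.idx.xml"), "BR_NCP_PROGENITORS_1")
```

Keeping a .bak copy makes it easy to diff the edit before syncing to S3.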

shntnu commented 3 years ago

@pearlryder The images are ready for analysis. They live on /imaging/analysis at

/imaging/analysis/2019_05_28_Neuronal_Cell_Painting/NCP_PROGENITORS_1/images 

and also on S3

s3://imaging-platform/projects/2019_05_28_Neuronal_Cell_Painting/NCP_PROGENITORS_1/images

Feel free to pull them from either location.

I think a good starting point to analyze these neuronal progenitor cells (Day 4) would be the pipeline used to analyze stem cells. For stem cells (a.k.a. NCP_STEM_1), I had reused an existing pipeline as mentioned here https://github.com/broadinstitute/neuronal-cell-painting/issues/7#issuecomment-727283749

I can run DCP once the pipeline is configured if you prefer.

shntnu commented 3 years ago

@mtegtmey could you comment on the priority for this one? Would bumping it to the new year work?

mtegtmey commented 3 years ago

@shntnu it is high-priority, but bumping to the new year would be fine! For me, it would be ideal to try having profiles and 'feature differentials' (however you refer to them) by mid-Feb if that seems possible.

shntnu commented 3 years ago

Thanks @mtegtmey ! @pearlryder feel free to make a call on prioritizing based on this info

pearlryder commented 3 years ago

Thanks @mtegtmey! We're going to try to process this data before the end of this year, but it's great to know that we won't be holding you back too terribly if we need to wait until January. I'll keep you updated with our progress -- you can expect to hear from me by the end of next week. Cheers!

pearlryder commented 3 years ago

Hi @mtegtmey and team,

I wanted to let everyone know that we were able to process these images and extract the data over the weekend. We'll start analyzing the data when I return to work in the New Year.

I hope everyone has a very happy holiday!

mtegtmey commented 3 years ago

@pearlryder thank you so much for the update, and all your hard work getting to this point! Ralda and I so much appreciate the work all of you have done on this project, and cannot wait for all the exciting science we will get to do together over the coming years. It's a collaboration we value tremendously.

Have a wonderful holiday, 'see' you in the new year!

shntnu commented 3 years ago

@pearlryder you can stop at the collate step i.e. just before https://cytomining.github.io/profiling-handbook/create-profiles.html#annotate and I'll handle things downstream

pearlryder commented 3 years ago

Thanks @shntnu! I should have everything uploaded to AWS by the EOD tomorrow. I'll ping you here when it's ready.

pearlryder commented 3 years ago

@shntnu, the analysis files are now available at s3://imaging-platform/projects/2019_05_28_Neuronal_Cell_Painting/workspace/backend/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/.

The per-site analysis files are available at s3://imaging-platform/projects/2019_05_28_Neuronal_Cell_Painting/workspace/analysis/NCP_PROGENITORS_1/

In double checking the .csv file, I noticed that 4 wells are missing data: F11, F12, O18, and P18. I looked at several images for these wells and confirmed that the wells appear to be empty / contain debris only. Please let me know if you have any questions!
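A quick way to reproduce this check is to compare the wells present in the per-well CSV against the full 384-well layout. A minimal Python sketch; the Metadata_Well column name and the zero-padded well format (e.g. A01, F11) are assumptions about the CSV, not confirmed in this thread:

```python
import csv
from string import ascii_uppercase


def all_wells_384() -> list[str]:
    """Well names A01..P24 for a 16-row x 24-column 384-well plate."""
    return [f"{row}{col:02d}" for row in ascii_uppercase[:16] for col in range(1, 25)]


def wells_in_csv(path: str, column: str = "Metadata_Well") -> set[str]:
    """Collect the well names present in a per-well CSV (column name is an assumption)."""
    with open(path, newline="") as handle:
        return {row[column] for row in csv.DictReader(handle)}


def missing_wells(observed: set[str]) -> list[str]:
    """Wells of the plate that have no rows in the observed data, in plate order."""
    return [well for well in all_wells_384() if well not in observed]
```

For this plate, missing_wells(wells_in_csv(path)) would be expected to list F11, F12, O18 and P18.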

shntnu commented 3 years ago

Awesome! Thanks @pearlryder

I noticed that the SQLite file is 100 GB (BR_NCP_STEM_1 was 25 GB). Was the cell density high?

pearlryder commented 3 years ago

Yes @shntnu, most of the wells I examined were confluent. I just checked a few images from NCP_STEM_1 and they are indeed much lower density than the BR_NCP_PROGENITORS images (maybe ~50-75% confluent).

mtegtmey commented 3 years ago

@shntnu This is something we should expect. The conditions for the NPCs are 15k cells per well with a 24hr incubation period (so they may proliferate) compared to 10k cells with a 6hr incubation for the stem cells.

shntnu commented 3 years ago

Thanks @mtegtmey @pearlryder for clarifying!

shntnu commented 3 years ago

@mtegtmey To get this off the ground: is there any specific advantage in starting with an analysis of the 4 branching metrics alone? Or would you rather just have the entire profile (4000+ features)?

mtegtmey commented 3 years ago

@shantanu Ideally all 4000+ features

shntnu commented 3 years ago

Sounds good

PS – you are tagging the wrong Shantanu :D This is a private repo so we are good. I'm @shntnu

ruifanp commented 3 years ago

I am looking at the D4 data. It seems right now that the inter-human variation is greater than the difference between controls and deletion in the progenitors.

image

If subjects 5,6 and 33 are removed: image

However, there are still features which distinguish deletions from controls. There were 122 features which were statistically significant in control vs deletion in both stem cells and progenitors (out of 300+ effective features). Of those, about 75% changed in the same direction in both cell types. I'll try some supervised methods and linear models to see if we can reliably distinguish controls from deletions despite the inter-human variation.
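The overlap-and-direction check described above could look roughly like this. A sketch assuming each cell state has, per feature, a signed deletion-vs-control effect (e.g. a t statistic) and a p-value; the 0.05 cutoff is a placeholder, and the thread does not say whether a multiple-testing correction was applied:

```python
import math


def direction_concordance(stem: dict, progenitor: dict, alpha: float = 0.05):
    """
    stem / progenitor map feature name -> (effect, p_value).
    Returns (number of features significant in both, fraction with the same sign).
    """
    shared = [
        feature
        for feature in sorted(stem.keys() & progenitor.keys())
        if stem[feature][1] < alpha and progenitor[feature][1] < alpha
    ]
    if not shared:
        return 0, math.nan
    # An effect of exactly 0 is counted as "not positive".
    same_sign = sum(
        1 for feature in shared if (stem[feature][0] > 0) == (progenitor[feature][0] > 0)
    )
    return len(shared), same_sign / len(shared)
```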

shntnu commented 3 years ago

@ruifanp We'd like to see sample images for this plate.

Please follow the steps here to do so

https://github.com/cytomining/cytoplot/issues/8#issuecomment-619273730

Ping me if you get stuck, because I bet there are missing pieces of info.

shntnu commented 3 years ago

Oh, you will first need to download the images of course

  1. Follow steps here to set up the R environment
  2. Run this notebook on your system to download the sample images for the project. You may choose to edit this line to include only the dataset you care about right now (NCP_PROGENITORS_1):
datasets <- 
  tribble(
    ~batch, ~plate,
    "NCP_PROGENITORS_1", "BR_NCP_PROGENITORS_1"
  )

Note that you will need to run these lines on the command line to download the images: https://github.com/broadinstitute/neuronal-cell-painting/blob/4ebc15074bb05a7e7ea09fe2a041d1a368d0a8a4/1.run-workflows/3.select_images_to_print.Rmd#L142-L158

  3. Create sample images for BR_NCP_PROGENITORS_1. Please follow the steps here to do so https://github.com/cytomining/cytoplot/issues/8#issuecomment-619273730
shntnu commented 3 years ago

@ruifanp can you please have this https://github.com/broadinstitute/neuronal-cell-painting/issues/10#issuecomment-886546515 squared away this week and tag @mtegtmey when you're done?

mtegtmey commented 3 years ago

@ruifanp @shntnu any updates to this? I want to push a repeat experiment ASAP if necessary.

shntnu commented 3 years ago

This was waiting on me over the last few days; sorry!

We needed to restore files that had gotten archived.

I am doing so now using this script https://github.com/broadinstitute/imaging-backup-scripts/blob/master/restore_intelligent.py

Will report back and then cc Beth to let her know it worked.

/usr/local/opt/python@3.9/bin/python3.9 \
  restore_intelligent.py \
  imaging-platform \
  projects/2019_05_28_Neuronal_Cell_Painting/NCP_PROGENITORS_1

Output (snapshot after 1 minute)

20750 total files found pre-filtering
20750 total files remain post-filtering
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/NCP_PROGENITORS_1/illum/
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/NCP_PROGENITORS_1/illum/BR_NCP_PROGENITORS_1/
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/NCP_PROGENITORS_1/illum/BR_NCP_PROGENITORS_1/cp.is.done
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/NCP_PROGENITORS_1/images/BR_NCP_PROGENITORS_1/FFC_Profile/FFC_Profile_Measurement 2.xml
Sent 100 restore requests
...

The next day

...
Sent 20200 restore requests
Sent 20300 restore requests
Sent 20400 restore requests
Sent 20500 restore requests
Sent 20600 restore requests
Sent 20700 restore requests
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/NCP_PROGENITORS_1/images/images
Sent all restore requests

These requests worked, but restoring several of the outline images did not:

/usr/local/opt/python@3.9/bin/python3.9 \
  restore_intelligent.py \
  imaging-platform \
  projects/2019_05_28_Neuronal_Cell_Painting/workspace/analysis/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/analysis/ \
  --filter_in "png" --filter_out "csv"
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/workspace/analysis/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/analysis/BR_NCP_PROGENITORS_1-H15-5/outlines/H15_s5--nuclei_outlines.png
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/workspace/analysis/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/analysis/BR_NCP_PROGENITORS_1-L01-4/outlines/L01_s4--cell_outlines.png
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/workspace/analysis/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/analysis/BR_NCP_PROGENITORS_1-E04-6/outlines/E04_s6--cell_outlines.png
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/workspace/analysis/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/analysis/BR_NCP_PROGENITORS_1-K01-8/outlines/K01_s8--cell_outlines.png
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/workspace/analysis/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/analysis/BR_NCP_PROGENITORS_1-O06-8/outlines/O06_s8--nuclei_outlines.png
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/workspace/analysis/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/analysis/BR_NCP_PROGENITORS_1-E03-1/outlines/E03_s1--cell_outlines.png
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/workspace/analysis/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/analysis/BR_NCP_PROGENITORS_1-L05-8/outlines/L05_s8--nuclei_outlines.png
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/workspace/analysis/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/analysis/BR_NCP_PROGENITORS_1-G18-7/outlines/G18_s7--nuclei_outlines.png
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/workspace/analysis/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/analysis/BR_NCP_PROGENITORS_1-F08-1/outlines/F08_s1--cell_outlines.png
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/workspace/analysis/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/analysis/BR_NCP_PROGENITORS_1-C22-5/outlines/C22_s5--nuclei_outlines.png
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/workspace/analysis/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/analysis/BR_NCP_PROGENITORS_1-F11-1/outlines/F11_s1--nuclei_outlines.png
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/workspace/analysis/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/analysis/BR_NCP_PROGENITORS_1-P18-9/outlines/P18_s9--cell_outlines.png
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/workspace/analysis/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/analysis/BR_NCP_PROGENITORS_1-M07-9/outlines/M07_s9--nuclei_outlines.png
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/workspace/analysis/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/analysis/BR_NCP_PROGENITORS_1-G02-6/outlines/G02_s6--nuclei_outlines.png
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/workspace/analysis/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/analysis/BR_NCP_PROGENITORS_1-P02-2/outlines/P02_s2--nuclei_outlines.png
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/workspace/analysis/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/analysis/BR_NCP_PROGENITORS_1-H01-2/outlines/H01_s2--cell_outlines.png
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/workspace/analysis/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/analysis/BR_NCP_PROGENITORS_1-F08-6/outlines/F08_s6--nuclei_outlines.png
Could not restore object projects/2019_05_28_Neuronal_Cell_Painting/workspace/analysis/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/analysis/BR_NCP_PROGENITORS_1-G01-1/outlines/G01_s1--nuclei_outlines.png

Channel maps


      <Entry ChannelID="1">
        <FlatfieldProfile>{Background: {Character: NonFlat, Mean: 276.82462, NoiseConst: 4.4794638, NonFlatness: {Corrected: 0.061573386, Original: 0.45784867, Random: 0.026914451}, Profile: {Coefficients: [[1.1537], [-0.1952, -0.0629], [-0.786, 0.0164, -0.9893], [0.4016, 0.5166, 0.2448, 0.0266], [-0.9321, 0.2253, 1.0641, -0.2648, -0.1231]], Dims: [2160, 2160], Origin: [1079.5, 1079.5], Scale: [0.00046296296, 0.00046296296], Type: Polynomial}, Quality: 1.0}, Channel: 1, ChannelName: Alexa 647, Foreground: {Character: NonFlat, NonFlatness: {Original: 0.61926347, Random: 0.038961733}, Profile: {Coefficients: [[1.2289], [-0.122, -0.2808], [-0.9349, 0.1698, -1.8379], [0.5493, 0.9854, 0.1863, 0.4485], [-1.6299, 0.0643, 1.7424, -0.8028, 0.8358]], Dims: [2160, 2160], Origin: [1079.5, 1079.5], Scale: [0.00046296296, 0.00046296296], Type: Polynomial}, Quality: 1.0}, Version: Acapella:2013}</FlatfieldProfile>
      </Entry>
      <Entry ChannelID="2">
        <FlatfieldProfile>{Background: {Character: NonFlat, Mean: 219.39228, NoiseConst: 2.0131553, NonFlatness: {Corrected: 0.032956451, Original: 0.36471289, Random: 0.027722657}, Profile: {Coefficients: [[1.1399], [-0.0477, -0.058], [-0.8334, 0.131, -0.8225], [0.0643, 0.3933, 0.061, -0.0106], [-0.6338, -0.4985, 1.387, -0.1587, -0.2868]], Dims: [2160, 2160], Origin: [1079.5, 1079.5], Scale: [0.00046296296, 0.00046296296], Type: Polynomial}, Quality: 1.0}, Channel: 2, ChannelName: Alexa 568, Foreground: {Character: NonFlat, NonFlatness: {Original: 0.64135826, Random: 0.03817036}, Profile: {Coefficients: [[1.2321], [-0.271, -0.2332], [-0.936, 0.3731, -1.8427], [0.4359, 0.9875, 0.4462, 0.2329], [-1.9193, -1.1965, 1.9475, -0.8998, 0.7939]], Dims: [2160, 2160], Origin: [1079.5, 1079.5], Scale: [0.00046296296, 0.00046296296], Type: Polynomial}, Quality: 1.0}, Version: Acapella:2013}</FlatfieldProfile>
      </Entry>
      <Entry ChannelID="3">
        <FlatfieldProfile>{Background: {Character: NonFlat, Mean: 624.45183, NoiseConst: 1.3, NonFlatness: {Corrected: 0.031488486, Original: 0.60247177, Random: 0.017767787}, Profile: {Coefficients: [[1.242], [-0.0973, -0.0543], [-1.4894, 0.1219, -1.5797], [0.0735, 0.1321, 0.319, -0.0946], [-0.2274, -0.3055, 2.4095, -0.2666, -0.0135]], Dims: [2160, 2160], Origin: [1079.5, 1079.5], Scale: [0.00046296296, 0.00046296296], Type: Polynomial}, Quality: 1.0}, Channel: 3, ChannelName: 488 long, Foreground: {Character: NonFlat, NonFlatness: {Original: 0.70948625, Random: 0.038870561}, Profile: {Coefficients: [[1.2971], [-0.1707, -0.0873], [-1.9097, 0.314, -2.1103], [0.3832, 0.1762, 0.4525, 0.0455], [0.6746, -1.0848, 2.6936, -0.5221, 0.8574]], Dims: [2160, 2160], Origin: [1079.5, 1079.5], Scale: [0.00046296296, 0.00046296296], Type: Polynomial}, Quality: 1.0}, Version: Acapella:2013}</FlatfieldProfile>
      </Entry>
      <Entry ChannelID="4">
        <FlatfieldProfile>{Background: {Character: NonFlat, Mean: 805.56198, NoiseConst: 1.3, NonFlatness: {Corrected: 0.033651829, Original: 0.5953663, Random: 0.024583519}, Profile: {Coefficients: [[1.2396], [-0.0898, -0.0492], [-1.5413, 0.1165, -1.4906], [0.1098, 0.1265, 0.2704, 0.0067], [0.2785, -0.0943, 1.7308, -0.4344, -0.1939]], Dims: [2160, 2160], Origin: [1079.5, 1079.5], Scale: [0.00046296296, 0.00046296296], Type: Polynomial}, Quality: 1.0}, Channel: 4, ChannelName: Alexa 488, Foreground: {Character: NonFlat, NonFlatness: {Original: 0.77742112, Random: 0.038704636}, Profile: {Coefficients: [[1.3157], [-0.1696, -0.073], [-1.9486, 0.2803, -1.9417], [0.4557, 0.1292, 0.3916, 0.1484], [-0.0643, -0.964, 3.0171, -0.4453, -0.9293]], Dims: [2160, 2160], Origin: [1079.5, 1079.5], Scale: [0.00046296296, 0.00046296296], Type: Polynomial}, Quality: 1.0}, Version: Acapella:2013}</FlatfieldProfile>
      </Entry>
      <Entry ChannelID="5">
        <FlatfieldProfile>{Background: {Character: NonFlat, Mean: 619.83839, NoiseConst: 2.3488395, NonFlatness: {Corrected: 0.044228606, Original: 0.69694823, Random: 0.021729838}, Profile: {Coefficients: [[1.2188], [-0.1118, 0.0117], [-0.8487, 0.0394, -1.0215], [-0.0525, 0.1703, 0.2001, -0.2294], [-2.9741, -0.5146, 0.6968, 0.1254, -2.4524]], Dims: [2160, 2160], Origin: [1079.5, 1079.5], Scale: [0.00046296296, 0.00046296296], Type: Polynomial}, Quality: 1.0}, Channel: 5, ChannelName: HOECHST 33342, Foreground: {Character: NonFlat, NonFlatness: {Original: 0.78268158, Random: 0.038722602}, Profile: {Coefficients: [[1.2369], [-0.2093, -0.0045], [-0.8884, 0.081, -1.197], [0.1479, 0.2561, 0.4074, -0.2115], [-2.5074, -0.5371, -0.6543, -0.0151, -2.1817]], Dims: [2160, 2160], Origin: [1079.5, 1079.5], Scale: [0.00046296296, 0.00046296296], Type: Polynomial}, Quality: 1.0}, Version: Acapella:2013}</FlatfieldProfile>
      </Entry>
      <Entry ChannelID="6">
        <FlatfieldProfile>{Background: {Character: Null, Profile: {Type: Identity}, Quality: 1}, Channel: 6, ChannelName: Brightfield CP, Foreground: {Character: Flat, Profile: {Type: Identity}, Quality: 1}, Version: Acapella:2013}</FlatfieldProfile>
      </Entry>
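To see at a glance which ChannelName goes with each ChannelID in the entries above, a minimal Python sketch that extracts the name from each Entry. It is regex-based and assumes the FlatfieldProfile text is formatted as shown; channel_names is a hypothetical helper, and which stain is imaged in which channel is not stated in this thread:

```python
import re

# One <Entry ChannelID="N"> block; the tempered dot stops the match at </Entry>,
# so an entry without a ChannelName cannot borrow the next entry's name.
ENTRY_RE = re.compile(
    r'<Entry ChannelID="(\d+)">(?:(?!</Entry>).)*?ChannelName: ([^,}]+)',
    re.DOTALL,
)


def channel_names(xml_text: str) -> dict[int, str]:
    """Map each ChannelID to the ChannelName recorded in its FlatfieldProfile."""
    return {int(cid): name.strip() for cid, name in ENTRY_RE.findall(xml_text)}
```

Applied to the six entries above, this should map 1 to Alexa 647, 2 to Alexa 568, 3 to 488 long, 4 to Alexa 488, 5 to HOECHST 33342 and 6 to Brightfield CP.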
ruifanp commented 3 years ago

20210804_progenitors_montage

This is the montage for the 48 samples. Several samples seem to be blank or have very few cells.

shntnu commented 3 years ago

Here is the outline for r04c05...f03

(Need to retrieve)

https://imaging-platform.s3.us-east-1.amazonaws.com/projects/2019_05_28_Neuronal_Cell_Painting/workspace/analysis/NCP_PROGENITORS_1/BR_NCP_PROGENITORS_1/analysis/BR_NCP_PROGENITORS_1-D05-3/outlines/D05_s3--nuclei_outlines.png

ruifanp commented 3 years ago

20210804_progenitors_dapi_montage

Montage for the DAPI channel only, with labels for the row and column it came from. Upon inspecting the cell counts, the images which are mostly blank correspond to the wells which have nonsensically low cell counts.

image Cell counts by well

shntnu commented 3 years ago

@ruifanp will plot cell counts similar to https://github.com/broadinstitute/neuronal-cell-painting/issues/7#issuecomment-727981591, and we may choose to exclude wells whose cell count is more than 1 s.d. from the mean (computed across all 384 wells).
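A minimal Python sketch of the proposed 1 s.d. rule, assuming per-well cell counts keyed by well name. flag_outlier_wells is hypothetical, and flagging both tails is an assumption, since the comment does not say which direction is meant:

```python
import statistics


def flag_outlier_wells(cell_counts: dict[str, int], k: float = 1.0) -> set[str]:
    """
    Return wells whose cell count lies more than k sample standard deviations
    from the plate-wide mean (both directions).
    """
    values = list(cell_counts.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {well for well, n in cell_counts.items() if abs(n - mean) > k * sd}
```

On a roughly normal distribution a symmetric 1 s.d. cut flags about a third of wells, and near-empty wells inflate the s.d. themselves, so a low-side-only rule or a fixed threshold may be worth comparing.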

shntnu commented 3 years ago

@mtegtmey @raldanehme:

@ruifanp and I met today; here's the summary

Our primary concern right now is that although we see a very good separation between case and control (reported using both t-tests and logistic regression), we noticed that cell count may be a major driver of the difference.

If cell count is definitely a driver, then we should consider repeating this experiment, but we want to be doubly sure.

Here's @ruifanp 's plan

  1. Remove wells that have low cell count (the lower mode in the histogram plotted in https://github.com/broadinstitute/neuronal-cell-painting/issues/10#issuecomment-892873102; so drop anything with x-axis value < 200)
  2. Repeat the logistic regression analysis with the cell count controlled dataset (but making sure that the train/test split is correct; currently, an individual can be in both train and test) and report the classification accuracy. We expect this to drop a bit.
  3. Now repeat the analysis after regressing out cell count, and report the classification accuracy again. If this results in a huge drop, then it's clear that cell count is a major driver, we can't really trust the results from the current dataset, and we should repeat the experiment while trying to limit the variation in cell count.
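Step 2 hinges on splitting by donor rather than by well. A minimal Python sketch of a donor-level split, stratified by class so each class contributes roughly the same test fraction; the function name and label values are hypothetical. scikit-learn's GroupShuffleSplit (unstratified) and StratifiedGroupKFold apply the same grouping idea in a full pipeline:

```python
import random


def split_by_donor(donors, labels, test_fraction=0.3, seed=0):
    """
    donors[i] and labels[i] describe well i. Donors (not wells) are assigned to
    train or test, separately within each label, so no donor appears in both.
    Returns (train_indices, test_indices).
    """
    donor_label = {}
    for donor, label in zip(donors, labels):
        if donor_label.setdefault(donor, label) != label:
            raise ValueError(f"donor {donor!r} has more than one label")

    rng = random.Random(seed)
    test_donors = set()
    for label in sorted(set(donor_label.values())):
        group = sorted(d for d, lab in donor_label.items() if lab == label)
        rng.shuffle(group)
        test_donors.update(group[: round(len(group) * test_fraction)])

    train = [i for i, d in enumerate(donors) if d not in test_donors]
    test = [i for i, d in enumerate(donors) if d in test_donors]
    return train, test
```

With 22 donors per class and test_fraction=0.3, each class puts round(6.6) = 7 donors in test and 15 in train.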

After writing this down, I wondered, @mtegtmey: is it at all practical to expect that cell count can be controlled any further? We still don't know if cell count is the driver, so we may not need to repeat the experiment after all, but it would be good to know what's possible.

ruifanp commented 3 years ago

I redid the logistic regression, splitting by patient number rather than with a purely random train/test split. Here are the results.

image

Note that the scores tend to be pretty unstable depending on the random state used for the splitting, so I ran with several random states and chose a representative score. It appears that restricting to wells with a cell count over 200 only improves the quality of the data and leads to better separation of the classes.

shntnu commented 3 years ago

This is encouraging! (but IIRC you said you expected the 200 threshold not to change this dramatically)

Looking forward to seeing what happens when you regress out cell count

ruifanp commented 3 years ago

image

This is the distribution of normalized cell count for controls and deletions. The deletions are shifted slightly to the right, which is somewhat surprising since the controls were plated first. Cell count seems at best very weakly correlated with control/deletion status. The distribution is somewhat bimodal, which is interesting, and I'll dig deeper into this.

Below are the features with the highest correlation to the cell count: image

I regressed out cell count by replacing each data point with its residual after regressing it on cell count.
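Regressing out cell count this way amounts to fitting each feature against cell count and keeping the residuals. A minimal Python sketch for one feature, assuming an ordinary least-squares linear fit (the comment above does not state the model used); regress_out is a hypothetical helper:

```python
def regress_out(feature, cell_count):
    """
    Fit feature ~ a + b * cell_count by ordinary least squares and return the
    residuals, i.e. the part of the feature not linearly explained by cell count.
    """
    n = len(feature)
    if n != len(cell_count) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(cell_count) / n
    mean_y = sum(feature) / n
    sxx = sum((x - mean_x) ** 2 for x in cell_count)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cell_count, feature))
    slope = sxy / sxx if sxx else 0.0  # constant cell count: nothing to remove
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(cell_count, feature)]
```

Fitting the slope on all wells before the train/test split lets test wells influence the correction; fitting it on training wells only, then applying it to test wells, is the stricter variant.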

This is with a random train test split: image

This is with splitting based on patient number.

image

Note that scores from splitting on patient number are not stable: they vary enormously with the random seed used for the split. For example, the score for the F/T/T condition ranged from 0.25 to 0.95 depending only on the split. I ran each condition several times and tried to pick a representative score.

Looking at the random split table above, there is not much difference in the scores when using the residuals, suggesting that cell count is not a major driver as long as the wells actually contain cells and were imaged correctly. This fits with the cell count distribution plot above. However, low cell count wells add a great deal of noise and faulty data, as shown by how much the score improves when the low cell count (<100) wells are thrown out.

shntnu commented 3 years ago

Thanks for the systematic analysis here!

We are primarily interested in results based on splitting by patient (i.e. donor) id, but it's good to have the random split handy.

Also, given the uncertainty around isogenics, I'll focus on humans only.

And given the issues with low cell count wells, I'll focus on the cases where you exclude them.

So that brings us down to the first row T/T/na, where you get a score of 0.91 / 0.83 (without and with regressing out cell count).

So focusing on that alone,

  1. Do I understand correctly that you are getting a lot of variance in your classification based on the split?
  2. What is the proportion of train and test?
ruifanp commented 3 years ago
  1. That's right.
  2. About 0.7 train and 0.3 test.
shntnu commented 3 years ago

@ruifanp got it

(Recap for me that this is the composition https://github.com/broadinstitute/neuronal-cell-painting#dataset-summary)

So for n = 22 per class, 70:30 is about 15:7 donors per class (x ~8 replicates each, although not every donor will have all 8 because some wells are dropped for low cell count).

I wonder how much of the variation in test accuracy is driven by the fact that we drop replicates for some donors. But it can't be a whole lot. Hm.

Can you do the following:

  1. Do train-test split 20 times (and do it the way you have in the first row i.e. T/T/na)
  2. For each run, report the score as you do right now, but also, the number of actual data points (i.e. including replicates) in each class and split. So you will have 4 extra columns i.e. n_test_deletion, n_test_control, n_train_deletion, n_train_control. Please also drop in the CSV file so it's easy to inspect.
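A minimal Python sketch of the bookkeeping requested above: per-run class counts in each split plus a CSV with the four extra columns. The label strings "deletion"/"control", the helper names, and the CSV layout are assumptions; the classifier that produces each score is not shown:

```python
import csv
import statistics

COLUMNS = [
    "run", "score",
    "n_test_deletion", "n_test_control", "n_train_deletion", "n_train_control",
]


def split_counts(labels, train_idx, test_idx):
    """Count wells (replicates included) per class in each split."""
    def count(indices, wanted):
        return sum(1 for i in indices if labels[i] == wanted)

    return {
        "n_test_deletion": count(test_idx, "deletion"),
        "n_test_control": count(test_idx, "control"),
        "n_train_deletion": count(train_idx, "deletion"),
        "n_train_control": count(train_idx, "control"),
    }


def write_runs(path, rows):
    """Write one row per train/test run to a CSV with the columns above."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


def summarize_scores(rows):
    """Mean and sample standard deviation of the score across runs."""
    scores = [row["score"] for row in rows]
    return statistics.mean(scores), statistics.stdev(scores)
```

Each run would append {"run": r, "score": score, **split_counts(labels, train_idx, test_idx)} to rows before write_runs is called; reporting the mean and s.d. over all runs avoids hand-picking a representative score.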

If we find that the bouncing around is just because of number of data points, I think we're ok and we needn't have @mtegtmey repeat the experiment. We still need to figure out what's driving the differences, but we will be reasonably satisfied that there's a difference and that any variation we see can be explained.

ruifanp commented 3 years ago

These are the results of 100 runs with different splits.

Stem image

Progenitors image

Splitting by cell line instead of randomly results in significantly lower accuracy and higher standard deviation. Over 100 runs we only get about 2/3 accuracy on average.

The csv files corresponding to the results: patient split, random split

I wanted to see whether we could visually show differences between the controls and deletions. Below are the force-directed graphs for the stem cells and progenitors, with the clusters labelled by cell line number.

Stem image

There is clear separation between human and isogenic lines, and within the human samples, the lower numbers (controls) also somewhat separate from the higher numbers (deletions).

Progenitors image

The numbers on the right show how many wells (out of 8) were removed due to having bad cell count. In this case, there isn't any visible separation between the classes we're looking for.

mtegtmey commented 3 years ago

Would you guys have 15-30 min for a huddle this week? I want to make sure I understand the data so far and make a concrete decision about the next steps (repeat or no).

mtegtmey commented 3 years ago

@shntnu @ruifanp repeat plate for the NPCs will be imaged tomorrow! Any specific place you'd like me to transfer the images once they're finished?

shntnu commented 3 years ago

I am copying it now

aws s3 sync  --dryrun   /imaging/analysis/stanley/nehme_lab/cellpainting/22q11.2_NPC_8.31.21/BR00127194__2021-09-03T17_06_45-Measurement_2   s3://imaging-platform/projects/2019_05_28_Neuronal_Cell_Painting/NCP_PROGENITORS_1/images/BR00127194__2021-09-03T17_06_45-Measurement_2

I had to log in to an interactive node to do this; it can't be done from the login node.

https://broadinstitute.slack.com/archives/C3QFX04P7/p1631212076006000?thread_ts=1598987109.024400&cid=C3QFX04P7

shntnu commented 3 years ago

@bethac07 This is all set now

s3://imaging-platform/projects/2019_05_28_Neuronal_Cell_Painting/NCP_PROGENITORS_1/images/BR00127194__2021-09-03T17_06_45-Measurement_2

I'm not sure why it barfed twice, but it looks good to go.

I believe Pearl's pipelines are here https://imaging-platform.s3.us-east-1.amazonaws.com/projects/2019_05_28_Neuronal_Cell_Painting/workspace/pipelines/NCP_PROGENITORS_1

We want to run both analysis pipelines

Screen Shot 2021-09-16 at 1 45 49 PM

Please LMK if there's anything else you need.

bethac07 commented 3 years ago

Do you WANT the branch analysis run separately or just folded into the larger analysis?

mtegtmey commented 3 years ago

If possible a separate run would be ideal, so we could peek at that data sooner. But if you feel like it makes more sense to just bring it into the larger analysis by all means go ahead that way!

shntnu commented 3 years ago

@mtegtmey @bethac07 IIUC, folding it into the larger analysis will take no more hands-on time than running the branch analysis separately, so let's run them together.

bethac07 commented 3 years ago

So @rsenft1 and I were running this second batch through, and one thing we noticed is that the cell boundaries don't follow all the way out to the small, dim processes. We confirmed that the same seems to be true in the first batch (see screenshot below from NCP_PROGENITORS_1/O-05 site 3). Is this the desired behavior? For this second batch, would you want us to (a) match the results of the last batch as closely as possible, or (b) make it follow all these processes out? We can design a pipeline either way but wanted to get your thoughts.

image

raldanehme commented 3 years ago

It would be great if it could follow all the processes out- that is exactly what we want to measure. Thanks for flagging this!

bethac07 commented 3 years ago

Ok, can do.

Do we need to rerun the first batch? Otherwise you may get different results from that batch and this one. Sorry, I haven't been in the loop enough to know whether this is intended to supplement or replace the plate from December.

mtegtmey commented 3 years ago

No need to rerun the first batch; this was a redo!

bethac07 commented 3 years ago

With the new settings, this is a more representative field of what we're seeing: green here is actin, magenta is DNA. Does this look like what you were expecting/hoping for with this cell type? If so, I can pull the trigger on analysis today or Monday.

image