Closed shntnu closed 4 years ago
All done!
Images are currently being copied to S3
hi @shntnu , is the data for this 7th plate ready and accessible?
Not yet - I'll look up the status and loop back
@jatinarora-upmc This should be ready by Mar 25
@shntnu alright. thanks
@jatinarora-upmc Nearly all files for the new batch are available in #31.
The colony and isolated versions are pending. Perhaps you could start inspecting this data to get started? I noticed that a lot of wells have very few cells, but I have not inspected further.
The colony and isolated versions are pending.
I have skipped creating these files because you are currently not using them.
(we need to fix #16 before we can reliably create those files)
hi @shntnu , I am going through plate 7. I noticed this plate has ~23% of cells with >5% missing features, while this rate was 3-7% on other plates. Any idea what could be causing this?
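For reference, here is a minimal sketch of how a rate like the one above (share of cells with more than 5% of features missing) could be computed per plate. It uses base R on a toy table; the plate labels and `Feature_*` columns are hypothetical and are not the project's actual QC code.

```r
# Toy single-cell table; plate and feature names are hypothetical.
cells <- data.frame(
  Metadata_Plate = c("A", "A", "A", "B", "B"),
  Feature_1 = c(1.0, NA, 0.7, 0.5, 0.2),
  Feature_2 = c(0.3, NA, 0.8, 0.1, 0.9),
  Feature_3 = c(2.0, 1.5, 1.1, NA, 0.4)
)
feature_cols <- c("Feature_1", "Feature_2", "Feature_3")

# Fraction of features that are NA, computed per cell (row).
na_frac <- rowMeans(is.na(cells[, feature_cols, drop = FALSE]))

# Share of cells per plate whose NA fraction exceeds 5%.
rate_by_plate <- tapply(na_frac > 0.05, cells$Metadata_Plate, mean)
print(rate_by_plate)
```

On real profiles the same two steps would run over all feature columns (everything that is not a `Metadata_` column, assuming that naming convention holds).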
The cell counts are definitely very low for that plate (cmqtlpl1.5-31-2019-mt)
library(tidyverse)  # readr, purrr, dplyr, ggplot2
library(glue)       # glue()
library(magrittr)   # %<>%

# Plates to compare
plates <- c(
  "cmqtlpl1.5-31-2019-mt",
  "cmqtlpl261-2019-mt",
  "BR00106708",
  "BR00106709",
  "BR00107338",
  "BR00107339",
  "cmQTLplate7-2-27-20"
)

# Per-well cell counts, one CSV per plate
counts <-
  map_df(
    plates,
    function(plate) {
      read_csv(
        file.path("profiles", glue("{plate}_count.csv"))
      ) %>%
        distinct()
    }
  )

# Well-level metadata, reading only the columns needed here
metadata <-
  map_df(
    plates,
    function(plate) {
      read_csv(
        file.path("profiles", glue("{plate}_augmented.csv")),
        col_types = cols_only(
          Metadata_Plate = "c",
          Metadata_Well = "c",
          Metadata_Assay_Plate_Barcode = "c",
          Metadata_Plate_Map_Name = "c",
          Metadata_well_position = "c",
          Metadata_plating_density = "c",
          Metadata_line_ID = "c"
        )
      ) %>%
        distinct()
    }
  )

# Attach metadata to counts (joins on all columns shared by both tables)
counts %<>% inner_join(metadata)

# Distribution of per-well cell counts for each plate
counts %>%
  ggplot(aes(Metadata_Plate, Count_Cells)) +
  geom_boxplot() +
  coord_flip()
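A numeric companion to the boxplot can make the comparison easier to quote. The sketch below computes the median cell count per plate with base R on a toy table; the plate labels and values are made up for illustration, and only the column names `Metadata_Plate` and `Count_Cells` come from the code above.

```r
# Toy per-well counts; plate labels and values are hypothetical.
counts_toy <- data.frame(
  Metadata_Plate = c("plate_a", "plate_a", "plate_b", "plate_b", "plate_b"),
  Count_Cells = c(2000, 3000, 100, 300, 200)
)

# Median cell count per plate, sorted so the lowest plate comes first.
plate_medians <- aggregate(Count_Cells ~ Metadata_Plate, data = counts_toy, FUN = median)
plate_medians <- plate_medians[order(plate_medians$Count_Cells), ]
print(plate_medians)
```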
Still diagnosing…
And looking at the plate alone, definitely something amiss
@shntnu any idea whether the root of this problem lies in image processing (e.g. segmentation) or in the wet-lab handling of the plate?
To me, this would likely be an issue on the wetlab side of things.
If you feel like this data is unusable through the QC steps let me know and I can coordinate with Emily to see if she had any notes about mishaps with this plate.
On Apr 10, 2020, at 5:13 PM, Jatin Arora notifications@github.com wrote:
@shntnu any idea whether the root of this problem lies in image processing (e.g. segmentation) or in the wet-lab handling of the plate?
More probing
I removed all annotations to reduce clutter.
The 3 lines are the 25th, 50th, and 75th percentiles of cell counts across all wells; their values are 774, 1629, and 2752, respectively.
(q25 <- quantile(counts$Count_Cells, .25, names = FALSE))
(q50 <- quantile(counts$Count_Cells, .50, names = FALSE))
(q75 <- quantile(counts$Count_Cells, .75, names = FALSE))
counts %>%
  ggplot(aes(fct_reorder(Metadata_line_ID, Count_Cells), Count_Cells)) +
  geom_boxplot() +
  geom_hline(yintercept = q25, color = "gray") +
  geom_hline(yintercept = q50, color = "gray") +
  geom_hline(yintercept = q75, color = "gray") +
  facet_wrap(~Metadata_Plate, scales = "free_x") +
  theme_void()
To me, this would likely be an issue on the wetlab side of things. If you feel like this data is unusable through the QC steps let me know and I can coordinate with Emily to see if she had any notes about mishaps with this plate.
That would be great, @mtegtmey. I haven't looked into the images, but it would be good to know if Emily has some notes.
After talking with Emily, she mentioned something about a change in pressure on the liquid handler when adding PFA to the samples. However, she said she stopped it about halfway through, which would not account for the ubiquitous drop in cell counts (only half the plate would be low, hypothetically). The most likely issue is a miscalculation of the cell counts, or the time that elapsed during the upstream cell culture work, which caused more cells to sink to the bottom of the plate. Even when re-suspending them prior to plating, they may not have all been mixed well.
Thanks @mtegtmey - glad to know there's some explanation for this. Meanwhile, I'm making some notes below in case someone from our end can dig into the images
Goal: To figure out whether there is anything amiss in the images (other than low cell count) that may have led to the issue that @jatinarora-upmc described https://github.com/broadinstitute/cmQTL/issues/30#issuecomment-611036214
parallel aws s3 cp s3://imaging-platform/projects/2018_06_05_cmQTL/2020_03_05_Batch6/images/cmQTLplate7-2-27-20__2020-03-04T16_40_12-Measurement1/Images/r01c01f01p01-ch{1}sk1fk1fl1.tiff . ::: 1 2 3 4 5 6
name | value |
---|---|
FileName_OrigRNA | r01c01f01p01-ch3sk1fk1fl1.tiff |
FileName_OrigER | r01c01f01p01-ch4sk1fk1fl1.tiff |
FileName_OrigAGP | r01c01f01p01-ch2sk1fk1fl1.tiff |
FileName_OrigMito | r01c01f01p01-ch1sk1fk1fl1.tiff |
FileName_OrigBrightfield | r01c01f01p01-ch6sk1fk1fl1.tiff |
FileName_OrigDNA | r01c01f01p01-ch5sk1fk1fl1.tiff |
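The channel-to-stain mapping in the table can be written down as a small lookup, which also makes the file-name pattern explicit. This is a sketch only: the helper `image_name` is hypothetical, and it assumes the `sk1fk1fl1` suffix is constant, as it is in the six files listed above.

```r
# Stain name to channel number, as listed in the table above.
channel_map <- c(Mito = 1, AGP = 2, RNA = 3, ER = 4, DNA = 5, Brightfield = 6)

# Hypothetical helper: build an image file name for a given row, column,
# field, and channel. The "sk1fk1fl1" suffix is assumed to be constant.
image_name <- function(row, col, field, channel, plane = 1) {
  sprintf("r%02dc%02df%02dp%02d-ch%dsk1fk1fl1.tiff", row, col, field, plane, channel)
}

# File names for all six channels of row 1, column 1, field 1.
files <- vapply(channel_map, function(ch) image_name(1, 1, 1, ch), character(1))
print(files)
```

The resulting named vector is keyed by stain, so `files[["DNA"]]` gives the Hoechst image for that site.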
<Map>
<Entry ChannelID="1">
<FlatfieldProfile>{Background: {Character: NonFlat, Mean: 289.00625, NoiseConst: 5.897988, NonFlatness: {Corrected: 0.090252958, Original: 0.45296502, Random: 0.029437569}, Profile: {Coefficients: [[1.1471], [-0.0114, -0.0195], [-0.8175, 0.248, -0.8125], [0.0229, 0.4083, -0.1066, -0.0519], [-0.8331, -0.489, 0.6704, -0.2165, -0.4433]], Dims: [2160, 2160], Origin: [1079.5, 1079.5], Scale: [0.00046296296, 0.00046296296], Type: Polynomial}, Quality: 1.0}, Channel: 1, ChannelName: Alexa 647, Foreground: {Character: NonFlat, NonFlatness: {Original: 0.71276712, Random: 0.062127856}, Profile: {Coefficients: [[1.2259], [0.0932, -0.3422], [-1.1153, 0.4186, -2.0113], [0.4411, 1.4414, -0.7732, 0.82], [-0.2503, -0.342, 0.0903, -0.0531, 2.9682]], Dims: [2160, 2160], Origin: [1079.5, 1079.5], Scale: [0.00046296296, 0.00046296296], Type: Polynomial}, Quality: 1.0}, Version: Acapella:2013}</FlatfieldProfile>
</Entry>
<Entry ChannelID="2">
<FlatfieldProfile>{Background: {Character: NonFlat, Mean: 326.66138, NoiseConst: 7.6508897, NonFlatness: {Corrected: 0.19489469, Original: 0.63753444, Random: 0.022788157}, Profile: {Coefficients: [[1.1815], [-0.2002, -0.0244], [-1.1124, 0.3297, -1.2384], [0.4434, 0.7452, -0.2116, -0.0575], [0.2517, -0.3976, 0.6822, -0.0854, 0.5229]], Dims: [2160, 2160], Origin: [1079.5, 1079.5], Scale: [0.00046296296, 0.00046296296], Type: Polynomial}, Quality: 1.0}, Channel: 2, ChannelName: Alexa 568, Foreground: {Character: NonFlat, NonFlatness: {Original: 0.75699329, Random: 0.045633834}, Profile: {Coefficients: [[1.2525], [-0.0642, -0.2533], [-1.2709, 0.2196, -2.2142], [0.5299, 1.3018, -0.5135, 0.3411], [-0.6716, 0.497, 1.6567, 0.1297, 2.7825]], Dims: [2160, 2160], Origin: [1079.5, 1079.5], Scale: [0.00046296296, 0.00046296296], Type: Polynomial}, Quality: 1.0}, Version: Acapella:2013}</FlatfieldProfile>
</Entry>
<Entry ChannelID="3">
<FlatfieldProfile>{Background: {Character: Null, Mean: NaN, Profile: {Type: Identity}, Quality: 0.25}, Channel: 3, ChannelName: 488 long, Foreground: {Character: NonFlat, NonFlatness: {Original: 0.76508814, Random: 0.038975296}, Profile: {Coefficients: [[1.2777], [0.0452, -0.2261], [-1.542, 0.3696, -2.0432], [0.5051, 0.819, -0.6026, 0.3894], [-0.285, -0.5768, 0.7885, 0.0575, 1.5303]], Dims: [2160, 2160], Origin: [1079.5, 1079.5], Scale: [0.00046296296, 0.00046296296], Type: Polynomial}, Quality: 1.0}, Version: Acapella:2013}</FlatfieldProfile>
</Entry>
<Entry ChannelID="4">
<FlatfieldProfile>{Background: {Character: Null, Mean: NaN, Profile: {Type: Identity}, Quality: 0.25}, Channel: 4, ChannelName: Alexa 488, Foreground: {Character: NonFlat, NonFlatness: {Original: 0.79320383, Random: 0.038657013}, Profile: {Coefficients: [[1.279], [0.0522, -0.1535], [-1.7208, 0.2748, -1.8084], [0.2423, 0.6396, -0.5424, 0.3576], [0.4353, -0.567, 1.0237, 0.3775, 0.2043]], Dims: [2160, 2160], Origin: [1079.5, 1079.5], Scale: [0.00046296296, 0.00046296296], Type: Polynomial}, Quality: 1.0}, Version: Acapella:2013}</FlatfieldProfile>
</Entry>
<Entry ChannelID="5">
<FlatfieldProfile>{Background: {Character: NonFlat, Mean: 432.49347, NoiseConst: 1.3, NonFlatness: {Corrected: 0.13452815, Original: 0.71967578, Random: 0.018331587}, Profile: {Coefficients: [[1.2248], [-0.2387, 0.041], [-1.0813, 0.2014, -1.2981], [0.5677, 0.2221, 0.0283, -0.1704], [-1.7485, -0.5801, 0.651, 0.0083, -0.7362]], Dims: [2160, 2160], Origin: [1079.5, 1079.5], Scale: [0.00046296296, 0.00046296296], Type: Polynomial}, Quality: 1.0}, Channel: 5, ChannelName: HOECHST 33342, Foreground: {Character: NonFlat, NonFlatness: {Original: 1.0146352, Random: 0.059520878}, Profile: {Coefficients: [[1.3021], [-0.1311, -0.0537], [-0.7414, 1.1962, -1.3613], [1.2206, 0.9062, -0.9075, -0.4483], [-6.2633, -3.6864, 0.8118, -1.5641, -4.3353]], Dims: [2160, 2160], Origin: [1079.5, 1079.5], Scale: [0.00046296296, 0.00046296296], Type: Polynomial}, Quality: 1.0}, Version: Acapella:2013}</FlatfieldProfile>
</Entry>
<Entry ChannelID="6">
<FlatfieldProfile>{Background: {Character: Null, Profile: {Type: Identity}, Quality: 1}, Channel: 6, ChannelName: Brightfield CP, Foreground: {Character: Flat, Profile: {Type: Identity}, Quality: 1}, Version: Acapella:2013}</FlatfieldProfile>
</Entry>
</Map>
Beth can continue to add notes in this issue if she is able to inspect this data. But otherwise, nothing more to do here.
The cell counts for this plate are indeed very low by eye, just looking at the plate.
The screenshot below shows one randomly selected field (3) from each well, with all the wells laid out as they would be on the plate: black is background, and cells are a mix of red, green, and cyan.
You'll see that by eye more than half the wells are almost or completely black.
(for our notes, Beth used the workflow described here)
In https://github.com/broadinstitute/cmQTL/pull/40 (this notebook), I randomly sampled 5000 cells from this plate; these are the numbers of NA cells per feature, for the top few features
name | number_of_na |
---|---|
Nuclei_Correlation_Costes_AGP_Mito | 340 |
Cells_Correlation_Costes_ER_Mito | 328 |
Cytoplasm_Correlation_Costes_ER_Mito | 328 |
Cytoplasm_Correlation_Costes_AGP_Mito | 319 |
Nuclei_Correlation_Costes_Mito_AGP | 317 |
Nuclei_Correlation_Costes_RNA_Mito | 314 |
Nuclei_Correlation_Costes_ER_Mito | 313 |
Cells_Correlation_Costes_RNA_Mito | 303 |
Cells_Correlation_Costes_AGP_Mito | 302 |
All features with number_of_na > 10 were Correlation features.
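A table like the one above can be produced by counting NA values per column and keeping the features above a cutoff. The sketch below is base R on toy data; the function name `count_na_per_feature` and the toy values are assumptions, not the code from the notebook.

```r
# Hypothetical helper: number of NA cells per feature, keeping only features
# with more than `min_na` NA values, sorted from most to fewest.
count_na_per_feature <- function(cells, min_na = 10) {
  na_counts <- colSums(is.na(cells))
  out <- data.frame(
    name = names(na_counts),
    number_of_na = unname(na_counts),
    stringsAsFactors = FALSE
  )
  out <- out[out$number_of_na > min_na, , drop = FALSE]
  out[order(-out$number_of_na), , drop = FALSE]
}

# Toy data: 20 cells, three features with 12, 0, and 15 NA values.
toy <- data.frame(
  Nuclei_Correlation_Costes_AGP_Mito = rep(c(NA, 1), times = c(12, 8)),
  Cells_AreaShape_Area = rep(100, 20),
  Cells_Correlation_Costes_ER_Mito = rep(c(NA, 1), times = c(15, 5))
)

na_table <- count_na_per_feature(toy)
print(na_table)

# Check whether every feature that passed the cutoff is a Correlation feature.
all(grepl("Correlation", na_table$name, fixed = TRUE))
```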
These features had apparently nothing to do with cell size, so there's something else going on.
##
## Call:
## lm(formula = Nuclei_Correlation_Costes_AGP_Mito ~ ., data = data_matrix)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4335 -0.2968 -0.2689 -0.2354 3.9613
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0190576 0.0487076 0.391 0.6956
## Cells_AreaShape_Area -0.0009254 0.0003775 -2.452 0.0143 *
## Cytoplasm_AreaShape_Area 0.0009250 0.0003775 2.450 0.0143 *
## Nuclei_AreaShape_Area 0.0008028 0.0003518 2.282 0.0226 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.999 on 4990 degrees of freedom
## Multiple R-squared: 0.002636, Adjusted R-squared: 0.002036
## F-statistic: 4.395 on 3 and 4990 DF, p-value: 0.004283
Are they all Costes features, specifically? Because we typically throw those out downstream anyway.
Yes, all the most frequent ones are Costes.
I didn't know we throw them out; I thought it was only Manders and RWC as documented here
We've had multiple Slack discussions about throwing them out, but were waiting for a final decision from the profilers - but we-the-assay-devs have seen them be a problem repeatedly in other sets.
Tagged you there to remind you of context.
It also looks, from a search of my email, like Greg now typically removes them in pycytominer; see the excerpt below from the resistance mechanisms GH issue 40
I also removed costes (and other extreme outlier) features from all profiles. This made the profiles look much cleaner 🎉 We will continue dropping these types of features in future projects.
I inspected the features in this notebook and found that all the top most frequent NA-valued features were Costes features.
These features are poorly behaved (https://github.com/cytomining/profiling-handbook/pull/52) and we have decided to drop them going forward.
It's not clear why this plate had so many more features with NA values, but it's possible that, for whatever reason, this one just ended up with a long tail of NA features (only a few cells are NA, but for many features).
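For anyone who wants to drop the Costes features by hand, a minimal base R sketch follows. It matches on the `_Correlation_Costes_` substring seen in the feature names above; the helper `drop_costes` is hypothetical and is not how pycytominer implements its filtering.

```r
# Hypothetical helper: remove every column whose name contains
# "_Correlation_Costes_" and keep everything else untouched.
drop_costes <- function(profiles) {
  is_costes <- grepl("_Correlation_Costes_", names(profiles), fixed = TRUE)
  profiles[, !is_costes, drop = FALSE]
}

# Toy profile table with one metadata column and three features.
toy_profiles <- data.frame(
  Metadata_Well = c("A01", "A02"),
  Cells_AreaShape_Area = c(1500, 1700),
  Nuclei_Correlation_Costes_AGP_Mito = c(NA, 0.9),
  Cells_Correlation_Costes_ER_Mito = c(0.8, NA)
)

filtered <- drop_costes(toy_profiles)
print(names(filtered))
```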
Thanks again @bethac07 for digging into this! We are all set here.
Oops – not quite done yet :) @jatinarora-upmc have a look at this notebook and LMK if it makes sense (sorry, ran out of time to annotate it).
I did the QC (the same as for the other 6 plates) on plate 7. The attached screenshot shows that I start with 303612 cells and 4296 features. The number of features decreased to 3578 post-QC; this decrease includes the removal of blacklisted features, Costes/correlation features, etc. These feature counts are fine and look like those for the other plates. The point to note is that 1443 cells (303612 - 302169) had a missing measurement (NA) for one or more features, which is not the case on other plates. Overall, as you all have mentioned many times, although the total number of cells (302,169 post-QC) is low compared to other plates, it seems we have a good number of features (3578) measured for them.
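The arithmetic behind those numbers, written out as a quick check. All four input values are taken from the comment above; the variable names are just for illustration.

```r
# Values reported in the QC summary above.
cells_before <- 303612
cells_after <- 302169
features_before <- 4296
features_after <- 3578

# Cells dropped because they had NA for one or more features.
cells_with_na <- cells_before - cells_after

# Features removed during QC (blacklisted, Costes/correlation, etc.).
features_removed <- features_before - features_after

# The same quantities as percentages of the starting totals.
pct_cells_with_na <- 100 * cells_with_na / cells_before
pct_features_removed <- 100 * features_removed / features_before

print(c(cells_with_na = cells_with_na, features_removed = features_removed))
print(round(c(pct_cells = pct_cells_with_na, pct_features = pct_features_removed), 2))
```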
We are all set here, I think :)
Conclusion: We decided to repeat this plate. This comment https://github.com/broadinstitute/cmQTL/issues/30#issuecomment-618480494 made a strong argument for doing so.
Images should be copied to /imaging/analysis/2018_06_05_cmQTL/2020_03_05_Batch6 See https://broadinstitute.atlassian.net/wiki/spaces/IP/pages/800424256/Process+for+exporting+images+from+the+CDOT+microscopes+using+Harmony for instructions