Thanks @shntnu.
- When doing a rare variant burden test: for a given feature, would you recommend checking the correlation among replicates and using the feature only if that correlation is high?
I wouldn't filter based on that, but it's certainly good to report the replicate reproducibility of the features at the aggregate level. I haven't thought through whether dropping features would bias the analysis in any way, so for now it's best to keep them but definitely report them. I'll post here on how to do that.
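A minimal base-R sketch of "report, don't filter", assuming you already have one row per feature with a `median` replicate-correlation column (the same column name the Zernike plot uses further down). The function name `flag_reproducibility`, the 0.5 cutoff, and the toy values are all hypothetical; the only point is that every feature is kept and annotated rather than dropped.

```r
# Hypothetical toy input: one row per feature with its median replicate
# correlation (values are made up for illustration, not real results).
rep_cor <- data.frame(
  variable = c("Cells_AreaShape_Area",
               "Cells_AreaShape_Zernike_8_0",
               "Nuclei_Intensity_MeanIntensity_DNA"),
  median = c(0.82, 0.15, 0.64),
  stringsAsFactors = FALSE
)

# Annotate every feature instead of filtering any out.
# `threshold` is an assumed cutoff, not a recommended value.
flag_reproducibility <- function(df, threshold = 0.5) {
  df$reproducible <- df$median >= threshold
  df[order(-df$median), ]  # most reproducible features first
}

report <- flag_reproducibility(rep_cor)
print(report)
```

Because nothing is removed, the downstream burden test still sees all features, and the `reproducible` column can simply be reported next to each association.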
- There was another suggestion to pick the most interpretable feature when choosing one of several correlated features. Any suggestion on how to do that other than manually, e.g. some special feature class?
This is not straightforward to do in an automated fashion, but once you filter down to a handful of features that you are going to probe, you can follow a procedure like this:
This code snippet computes replicate correlations for each feature. The result is attached. Somewhat sparse documentation of the function is here.
```r
library(tidyverse)  # readr, purrr, stringr, dplyr
library(cytominer)

plates <-
  c(
    "BR00106708",
    "BR00106709",
    "BR00107338",
    "BR00107339",
    "cmqtlpl1.5-31-2019-mt",
    "cmqtlpl261-2019-mt"
  )

# one augmented profile file per plate
profile_files <-
  file.path("1.profile-cell-lines/profiles/",
            paste(plates, "augmented.csv", sep = "_"))

profiles <- profile_files %>% map_df(read_csv)

# per-feature correlation between replicates of each cell line
# (see ?cytominer::replicate_correlation)
replicate_correlation_values <-
  cytominer::replicate_correlation(
    profiles,
    names(profiles) %>% str_subset("Cells_|Cytoplasm_|Nuclei_"),
    strata = "Metadata_line_ID",
    replicates = 8,
    split_by = "Metadata_Plate",
    cores = 8)

# drop Manders, Costes, and features that measure the Z axis
# ("_Z$" instead of a bare "_Z", which would also drop all Zernike features)
replicate_correlation_values %>%
  filter(!str_detect(variable, "Costes|Manders|_Z_|_Z$")) %>%
  write_csv("replicate_correlation_values.csv")
```
replicate_correlation_values.txt
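To make the idea concrete, here is a simplified illustration of a per-feature replicate correlation in base R. It is an assumption about the general idea, not cytominer's implementation: it takes one feature's values as a cell-line × replicate matrix and reports the median Pearson correlation over all pairs of replicates. `median_replicate_correlation` and the toy matrices are hypothetical.

```r
# Hypothetical helper, NOT cytominer's algorithm: `x` holds one feature,
# rows = cell lines, columns = replicates.
median_replicate_correlation <- function(x) {
  r <- cor(x, method = "pearson")  # replicate-by-replicate correlations
  median(r[upper.tri(r)])          # count each replicate pair once
}

# Toy feature that ranks cell lines identically on every replicate
good <- cbind(rep1 = 1:5, rep2 = 2 * (1:5), rep3 = 1:5 + 10)

# Toy feature whose second replicate reverses the ranking
bad <- cbind(rep1 = 1:5, rep2 = 5:1, rep3 = 1:5)

median_replicate_correlation(good)  # 1
median_replicate_correlation(bad)   # -1
```

A feature whose replicates disagree ends up with a low (here even negative) value, which is what makes it suspicious if it later shows a strong genetic association.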
And here's a quick way to see how this could be useful when inspecting the Zernike features of the cell:
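```r
library(tidyverse)  # dplyr, stringr, tidyr, ggplot2

# e.g. "Cells_AreaShape_Zernike_8_0" is split into ... n = "8", m = "0"
replicate_correlation_values %>%
  filter(str_detect(variable, "Cells_AreaShape_Zernike")) %>%
  separate("variable", c("x1", "x2", "x3", "n", "m")) %>%
  ggplot(aes(n, m, size = median, label = sprintf("%.2f", median))) +
  geom_label()
```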
So, very roughly, I'd be worried if you had Zernike `8_0` or `8_4` showing up as having a strong genetic basis.
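On the earlier question about choosing among correlated features: picking the most interpretable one still has to be done by hand, but grouping the correlated features can be automated. A minimal base-R sketch, assuming a samples × features numeric matrix; the function name `group_correlated_features`, the 0.9 `cutoff`, and the simulated toy data are hypothetical. With complete linkage on the distance `1 - |r|`, cutting at height `1 - cutoff` yields groups in which every pair of features has an absolute correlation of at least about `cutoff`.

```r
# Hypothetical helper: cluster features by absolute Pearson correlation.
# `profiles_mat` is a samples x features numeric matrix.
group_correlated_features <- function(profiles_mat, cutoff = 0.9) {
  r <- cor(profiles_mat)
  d <- as.dist(1 - abs(r))            # strongly (anti)correlated -> close
  hc <- hclust(d, method = "complete")
  cutree(hc, h = 1 - cutoff)          # named integer vector of group ids
}

# Simulated toy data: two pairs of near-redundant features
set.seed(1)
a <- rnorm(50)
b <- rnorm(50)
m <- cbind(
  f1 = a,
  f2 = a + rnorm(50, sd = 0.05),   # near-duplicate of f1
  f3 = b,
  f4 = -b + rnorm(50, sd = 0.05)   # strongly anti-correlated with f3
)

groups <- group_correlated_features(m, cutoff = 0.9)
split(names(groups), groups)  # candidates to inspect, one group at a time
```

Each group can then be reviewed manually, together with the replicate correlations above, to keep the one feature that is easiest to interpret.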
Let's use this thread to discuss any questions from today @sasgari @jatinarora-upmc.