HelenaLC / CATALYST

Cytometry dATa anALYsis Tools
67 stars 30 forks source link

Compensating IMC data #240

Closed algebio closed 2 years ago

algebio commented 2 years ago

Hi Helena & team

I'm trying to compensate IMC data. I have a set of single colour text files and a set of imc tonsil samples that I would like to compensate. I have used readSCEfromTXT() and some code from imcRtools to create a SCE from the single cell colours and prepData to create a SCE from the tonsil samples.

I have plotted a spillover matrix which shows spillover signal from CD206(Gd157Di) to CD79a(Gd158Di). I have compensated it using nnls method and plotted a "before and after" compensation scatter plot of those two channels showing no differences.

I think I'm missing something but I can't find what it is. Maybe you can spot the mistake in no time. Find below a copy of the code.

Thank you for CATALYST and for your help!

get single-stained control samples

sce = readSCEfromTXT("C:/Users/njo47/OneDrive - Newcastle University/Documents/CompensationTextFiles/") Spotted channels: In113, In115, Pr141, Nd142, Nd143, Nd144, Nd145, Nd146, Sm147, Nd148, Sm149, Nd150, Eu151, Sm152, Eu153, Sm154, Gd155, Gd156, Gd157, Gd158, Tb159, Gd160, Dy161, Dy162, Dy163, Dy164, Ho165, Er166, Er167, Er168, Tm169, Er170, Yb171, Yb172, Yb173, Yb174, Lu175, Yb176, Ir193, Pt194, Pt198 Acquired channels: In113, In115, Pr141, Nd142, Nd143, Nd144, Nd145, Nd146, Sm147, Nd148, Sm149, Nd150, Eu151, Sm152, Eu153, Sm154, Gd155, Gd156, Gd157, Gd158, Tb159, Gd160, Dy161, Dy162, Dy163, Dy164, Ho165, Er166, Er167, Er168, Tm169, Er170, Yb171, Yb172, Yb173, Yb174, Lu175, Yb176, Ir193, Pt194, Pt198 Channels spotted but not acquired:
Channels acquired but not spotted:
assay(sce, "exprs") <- asinh(counts(sce)/5)

specify mass channels stained for & debarcode

bc_ms <- c(113,115,141:176,193,194,198) sce <- assignPrelim(sce, bc_ms, verbose = FALSE) sce <- applyCutoffs(estCutoffs(sce)) plotSpotHeatmap(sce)

compute & extract spillover matrix

sce <- computeSpillmat(sce) sm <- metadata(sce)$spillover_matrix

do some sanity checks

chs <- channels(sce) ss_chs <- chs[rowData(sce)$is_bc] all(diag(sm[ss_chs, ss_chs]) == 1) [1] TRUE all(sm >= 0 & sm <= 1) [1] TRUE plotSpillmat(sce) Warning messages: 1: In sprintf("%2.2f", colSums(sm) * 100 - 100, "%") : one argument not used by format '%2.2f' 2: It is deprecated to specify guide = FALSE to remove a guide. Please use guide = "none" instead.

Get Tonsil data to compensate

setwd("~/Covid19IMC/TONSIL/R/run5/Output_CSV-to-FCS_2021-11-03_12-20-34") md <- read_excel("metadata.xlsx")
head(data.frame(md)) file_name sample_id condition patient_id batch 1 T1.fcs BATCH003_TONSIL_START pre TMA Batch 3 2 T2.fcs BATCH003_TONSIL_END post TMA Batch 3 3 T3.fcs BATCH004_TONSIL_START pre TMA Batch 4 4 T4.fcs BATCH004_TONSIL_END post TMA Batch 4 5 T5.fcs BATCH005_TONSIL_START pre TMA Batch 5 6 T6.fcs BATCH005_TONSIL_END post TMA Batch 5 panel <- read_excel("panel2.xlsx")

head(data.frame(panel)) fcs_colname antigen marker_class 1 Sars_CoV2_Spike(In113Di) Sars_CoV2_Spike(In113Di) state 2 CD45RO(In115Di) CD45RO(In115Di) type 3 CD45RA(Pr141Di) CD45RA(Pr141Di) type 4 CD68(Nd142Di) CD68(Nd142Di) type 5 CD8a(Nd143Di) CD8a(Nd143Di) type 6 Ki67(Nd144Di) Ki67(Nd144Di) state fs <- read.flowSet(md$file_name, transformation = FALSE,truncate_max_range = F)

spot check that all panel columns are in the flowSet object

all(panel$fcs_colname %in% colnames(fs)) [1] TRUE

specify levels for conditions & sample IDs to assure desired ordering

md$condition <- factor(md$condition, levels = c("pre", "post")) md$batch <- factor(md$batch, levels = c("Batch 3", "Batch 4","Batch 5", "Batch 6","Batch 7", "Batch 8","Batch 9", "Batch 10","Batch 11", "Batch 12","Batch 13", "Batch 14")) md$sample_id <- factor(md$sample_id, levels = md$sample_id[order(md$condition)])

construct SingleCellExperiment

construct SCE of multiplexed cells

sce <- prepData(fs, panel, md, features = panel$fcs_colname,md_cols = list(file = "file_name", id = "sample_id", factors = c("condition", "patient_id","batch")))

sce <- compCytof(sce, sm, method = "nnls", overwrite = FALSE)

visualize data before & after compensation

chs <- c("CD206(Gd157Di)", "CD79a(Gd158Di)") as <- c("exprs", "compexprs") ps <- lapply(as, function(a) + plotScatter(sce, chs, assay = a)) plot_grid(plotlist = ps, nrow = 1)

head(colData(sce)) DataFrame with 6 rows and 4 columns sample_id condition patient_id batch

1 BATCH003_TONSIL_START pre TMA Batch 3 2 BATCH003_TONSIL_START pre TMA Batch 3 3 BATCH003_TONSIL_START pre TMA Batch 3 4 BATCH003_TONSIL_START pre TMA Batch 3 5 BATCH003_TONSIL_START pre TMA Batch 3 6 BATCH003_TONSIL_START pre TMA Batch 3 head(reducedDim(sce)) Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'head': no available entries for 'reducedDim(, ...)' sessionInfo() R version 4.1.1 (2021-08-10) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale: [1] LC_COLLATE=English_United Kingdom.1252 [2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 [4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages: [1] stats4 stats graphics grDevices utils datasets methods
[8] base

other attached packages: [1] ggplot2_3.3.5 cowplot_1.1.1
[3] pheatmap_1.0.12 CATALYST_1.17.3
[5] flowCore_2.5.0 readxl_1.3.1
[7] imcRtools_0.99.9 SpatialExperiment_1.3.4
[9] SingleCellExperiment_1.15.2 SummarizedExperiment_1.23.5 [11] Biobase_2.53.0 GenomicRanges_1.45.0
[13] GenomeInfoDb_1.29.8 IRanges_2.27.2
[15] S4Vectors_0.31.5 BiocGenerics_0.39.2
[17] MatrixGenerics_1.5.4 matrixStats_0.61.0

loaded via a namespace (and not attached): [1] scattermore_0.7 flowWorkspace_4.5.3
[3] R.methodsS3_1.8.1 tidyr_1.1.4
[5] bit64_4.0.5 irlba_2.3.3
[7] multcomp_1.4-17 DelayedArray_0.19.4
[9] R.utils_2.11.0 data.table_1.14.2
[11] RCurl_1.98-1.5 doParallel_1.0.16
[13] generics_0.1.1 ScaledMatrix_1.1.0
[15] terra_1.4-11 TH.data_1.1-0
[17] proxy_0.4-26 ggpointdensity_0.1.0
[19] bit_4.0.4 tzdb_0.1.2
[21] xml2_1.3.2 httpuv_1.6.3
[23] assertthat_0.2.1 viridis_0.6.2
[25] hms_1.1.1 promises_1.2.0.1
[27] fansi_0.5.0 Rgraphviz_2.37.2
[29] igraph_1.2.7 DBI_1.1.1
[31] htmlwidgets_1.5.4 purrr_0.3.4
[33] ellipsis_0.3.2 dplyr_1.0.7
[35] ggcyto_1.21.0 ggnewscale_0.4.5
[37] ggpubr_0.4.0 backports_1.2.1
[39] cytolib_2.5.3 svgPanZoom_0.3.4
[41] RcppParallel_5.1.4 sparseMatrixStats_1.5.3
[43] vctrs_0.3.8 abind_1.4-5
[45] withr_2.4.2 ggforce_0.3.3
[47] aws.signature_0.6.0 cytomapper_1.5.5
[49] vroom_1.5.5 svglite_2.0.0
[51] cluster_2.1.2 crayon_1.4.1
[53] drc_3.0-1 labeling_0.4.2
[55] edgeR_3.35.3 pkgconfig_2.0.3
[57] units_0.7-2 tweenr_1.0.2
[59] vipor_0.4.5 rlang_0.4.11
[61] lifecycle_1.0.1 sandwich_3.0-1
[63] rsvd_1.0.5 cellranger_1.1.0
[65] polyclip_1.10-0 graph_1.71.2
[67] tiff_0.1-8 Matrix_1.3-4
[69] raster_3.5-2 carData_3.0-4
[71] Rhdf5lib_1.15.2 zoo_1.8-9
[73] base64enc_0.1-3 beeswarm_0.4.0
[75] RTriangle_1.6-0.10 ggridges_0.5.3
[77] GlobalOptions_0.1.2 png_0.1-7
[79] viridisLite_0.4.0 rjson_0.2.20
[81] bitops_1.0-7 shinydashboard_0.7.2
[83] R.oo_1.24.0 ConsensusClusterPlus_1.57.0 [85] KernSmooth_2.23-20 rhdf5filters_1.5.0
[87] DelayedMatrixStats_1.15.4 shape_1.4.6
[89] classInt_0.4-3 stringr_1.4.0
[91] readr_2.0.2 jpeg_0.1-9
[93] rstatix_0.7.0 ggsignif_0.6.3
[95] aws.s3_0.3.21 beachmat_2.9.1
[97] scales_1.1.1 magrittr_2.0.1
[99] plyr_1.8.6 hexbin_1.28.2
[101] zlibbioc_1.39.0 compiler_4.1.1
[103] dqrng_0.3.0 concaveman_1.1.0
[105] RColorBrewer_1.1-2 plotrix_3.8-2
[107] clue_0.3-60 cli_3.0.1
[109] XVector_0.33.0 ncdfFlow_2.39.1
[111] FlowSOM_2.1.24 MASS_7.3-54
[113] tidyselect_1.1.1 stringi_1.7.5
[115] forcats_0.5.1 RProtoBufLib_2.5.1
[117] yaml_2.2.1 BiocSingular_1.9.1
[119] locfit_1.5-9.4 latticeExtra_0.6-29
[121] ggrepel_0.9.1 grid_4.1.1
[123] EBImage_4.35.2 tools_4.1.1
[125] parallel_4.1.1 rio_0.5.27
[127] CytoML_2.5.4 circlize_0.4.13
[129] rstudioapi_0.13 foreach_1.5.1
[131] foreign_0.8-81 gridExtra_2.3
[133] farver_2.1.0 Rtsne_0.15
[135] ggraph_2.0.5 DropletUtils_1.13.4
[137] digest_0.6.28 shiny_1.7.1
[139] Rcpp_1.0.7 car_3.0-11
[141] broom_0.7.9 scuttle_1.3.1
[143] later_1.3.0 httr_1.4.2
[145] sf_1.0-3 ComplexHeatmap_2.9.4
[147] colorspace_2.0-2 XML_3.99-0.8
[149] splines_4.1.1 RBGL_1.69.0
[151] scater_1.21.8 graphlayouts_0.7.1
[153] sp_1.4-5 systemfonts_1.0.3
[155] xtable_1.8-4 jsonlite_1.7.2
[157] tidygraph_1.2.0 R6_2.5.1
[159] pillar_1.6.4 htmltools_0.5.2
[161] mime_0.12 nnls_1.4
[163] glue_1.4.2 fastmap_1.1.0
[165] DT_0.19 BiocParallel_1.27.17
[167] BiocNeighbors_1.11.0 fftwtools_0.9-11
[169] class_7.3-19 codetools_0.2-18
[171] mvtnorm_1.1-3 utf8_1.2.2
[173] lattice_0.20-45 tibble_3.1.5
[175] curl_4.3.2 ggbeeswarm_0.6.0
[177] colorRamps_2.3 gtools_3.9.2
[179] magick_2.7.3 zip_2.2.0
[181] openxlsx_4.2.4 survival_3.2-13
[183] limma_3.49.5 munsell_0.5.0
[185] e1071_1.7-9 GetoptLong_1.0.5
[187] rhdf5_2.37.4 GenomeInfoDbData_1.2.7
[189] iterators_1.0.13 HDF5Array_1.21.0
[191] reshape2_1.4.4 haven_2.4.3
[193] gtable_0.3.0

Rplot3.pdf comp CD79a CD206.pdf

nilseling commented 2 years ago

Hi @algebio,

have you tried plotting the scatter plots individually instead of using lapply?

Could you try this:

plotScatter(sce, chs, assay = "exprs")
plotScatter(sce, chs, assay = "compexprs")

and see if they differ?

Cheers,

Nils

algebio commented 2 years ago

Hi Nils

Thank you for your help. Unfortunately, the results of both plotScatter are exactly the same. I'm doing something wrong but I can't see where.

Regards Juan

nilseling commented 2 years ago

The spillover matrix looks fine to me. Have you also tried to generate the scatter plot without using plotScatter? Just to make sure that it's not a visualisation issue.

algebio commented 2 years ago

Same.

I also checked the values of sce@assays@data@listData[["exprs"]] and sce@assays@data@listData[["compexprs"]]. They seem exactly the same values.

nilseling commented 2 years ago

Hmm, then I don't know where the issue is. Maybe @HelenaLC needs to have a look.

algebio commented 2 years ago

Hi Nils, firstly, thank you for trying to help me. I really appreciate it.

Second. After a long time banging my head against my desk, I found a mismatch in the number of channels because Ir193 is in the single metal spillover matrix but not in the tonsil csv files. I thought that was the problem but then I re-read CATALYST vignette and saw that "columns present in the SM but not in the input data will be removed from it", so that shouldn't be the problem.

Then I saw that the function assignPrelim calls another function .get_ms_from_chs(chs) which is a helper function to get mass and metal from a channel name. This is OK when a channel name is something like Ir193 but not when the names in the channels are of the sort "CD45RA(Pr141Di)" , "CD68(Nd142Di)" (like the ones I'm using) which gave me masses of 45141, 68142, etc.

After changing the names in sce@rowRanges@elementMetadata@listData$channel_name, I managed to compensate my sce object from my tonsil dataset. The problem now is that I am insecure about how reliable my results are because I got my first results (which were not compensated) without any error message. How can I be sure that this second one is correct? Is there any sanity check to be 100% sure that my compensated matrix is correct and I can trust this compensated sce?

Thanks again.

nilseling commented 2 years ago

Hi @algebio

I'm glad you figured this out! Getting masses like 45141 looks like a bug to me and should be addressed by the maintainers of CATALYST. Also just as a side note, you should call the channel_name via rowData(sce)$channel_name. The sanity check here would be to generate the scatter plots of markers that are affected by spillover (e.g. In113 and In115) and also check that other markers are not being massively corrected (e.g. Eu153 and Sm154). In general, with correct panel design, you will not notice a huge effect of the spillover correction. So just watch out for any obvious issues in the downstream analysis.

algebio commented 2 years ago

"...you will not notice a huge effect of the spillover correction." Well, after checking my spillover matrix after compensation, all red flags are up. I got values as high as 80! Back to square one.

beforeCompensation.pdf afterCompensation.pdf

nilseling commented 2 years ago

I'm not exactly sure what you mean with "spillover matrix after compensation". The spillover matrix that you generated (beforeCompensation) looks quite similar to the one of the original publication (displayed here). You then use the generated spillover matrix to compensate your actual dataset. The code that you posted originally looks alright to me. So you will need to compare the actual counts of your dataset (not the spillover matrix) before and after compensation.

algebio commented 2 years ago

Hi Nils. Thank you for your latest message, your help has been fundamental for me to finally get this done. I will acknowledge you if we get any publication using compensated imc data.

My last question is just a sanity check to confirm that I did the right thing to compare counts (maybe there is a better way as it happened with channel_name). I used write.csv(assay(sce, "exprs"), "countsBefCompensation.csv") and write.csv(assay(sce, "compexprs"), "countsAfterCompensation.csv"), I got the attached results. If you confirm that everything is OK, I will close this question.

Big Thanks Juan.

Screenshot 2021-11-15 124347
nilseling commented 2 years ago

Hi Juan, no need to acknowledge me - just wanting to make sure that everything works :) I'd recommend plotting the expression values next to each other as you have done in the original post. To me the counts before compensation don't seem to be asinh transformed. Values like 27 for CD45RA would be far too high for transformed counts.

algebio commented 2 years ago

Spot on, Nils. I had not transformed the data. This is what I get now. Do these plots seem legit to you? They seem too "messy" to me. image image

Regards Juan

nilseling commented 2 years ago

To me they look alright. The Yb173 --> Yb174 spillover is nicely corrected. Also check out Gd157 --> Gd158 and the Indiums.

algebio commented 2 years ago

Glad to hear. I don't have previous experience, so I was expecting something similar to the CD14 / CD16 plot from the CATALYST vignette.

These are the other plots that you mentioned

image image

nilseling commented 2 years ago

That looks alright. Maybe still check that there are numerical differences in In113 and In115 before and after compensation. But otherwise I guess this issue is now resolved, right?

algebio commented 2 years ago

It is indeed. Thank you so much for all the help and advice, Nils!