Hi Brendan,

I hope this inquiry finds you well, and thank you very much for kindly answering my questions. :)

I'm also trying to validate my cell type deconvolution results by integrating 3 Visium data before running the deconvolution function. However, when I followed the dataset integration instructions you've suggested (https://github.com/JEFworks-Lab/STdeconvolve/blob/devel/docs/process_bcl_data.md), ran the deconvolution functions (fitLDA, optimalModel, and getBetaTheta), and tried to visualize the deconvolution scene for each sample, I could not recall any spots. Here's the error message I've got: "Plotting scatterpies for 0 pixels with 20 cell-types...this could take a while if the dataset is large". I also looked into your dialogue with AteeqMKhaliq about this (https://github.com/JEFworks-Lab/STdeconvolve/issues/21) but I still couldn't make anything other than an empty plot. Could you give any pointers on how I may be able to fix this?

Here's the entire code I ran: ###########################################

1 Data integration steps

Read the data

se1 <- SpatialExperiment::read10xVisium("/Users/.../Matrix/PIC83", "PIC83", type = "sparse", data = "filtered") se2 <- SpatialExperiment::read10xVisium("/Users/.../Matrix/PIC84", "PIC84", type = "sparse", data = "filtered") se3 <- SpatialExperiment::read10xVisium("/Users/.../Matrix/PIC85-1", "PIC85-1", type = "sparse", data = "filtered")

Combine counts and positions for all samples

all_counts <- cbind(as.matrix(counts(se1)), as.matrix(counts(se2)), as.matrix(counts(se3))) all_pos <- rbind(spatialCoords(se1), spatialCoords(se2), spatialCoords(se3))

Assign slice identifiers

se1_slice <- rep(1, nrow(spatialCoords(se1))) se2_slice <- rep(2, nrow(spatialCoords(se2))) se3_slice <- rep(3, nrow(spatialCoords(se3))) all_slice <- c(se1_slice, se2_slice, se3_slice) names(all_slice) <- rownames(all_pos)

clean up poor spots and genes

allClean <- cleanCounts(counts = all_counts, min.reads = 10, min.lib.size = 10, verbose = TRUE)

merge the 3 sections together

all_paths <- list() all_paths[[1]] <- as.matrix(t(all_counts[, names(all_slice[all_slice == 1])])) all_paths[[2]] <- as.matrix(t(all_counts[, names(all_slice[all_slice == 2])])) all_paths[[3]] <- as.matrix(t(all_counts[, names(all_slice[all_slice == 3])]))

Selecting ODGenes across 3 samples

all_genes <- lapply(all_paths, function(p){ print(dim(p)) genes <- colnames(p) print(length(genes)) genes }) all_genes <- Reduce(intersect, all_genes) length(all_genes)

all_ODgenes <- lapply(all_paths, function(p) { dat <- preprocess(p, extractPos = FALSE, selected.genes = all_genes, nTopGenes = 5, genes.to.remove = NA, removeAbove = 0.95, removeBelow = NA, min.reads = 10, min.lib.size = 10, min.detected = 1, ODgenes = TRUE, od.genes.alpha = 0.05, gam.k = 5, nTopOD = 500) colnames(dat$corpus) }) unionAllGenes <- Reduce(union, all_ODgenes) length(unionAllGenes)

allCorpus <- preprocess(t(as.matrix(all_counts)), extractPos = FALSE, selected.genes = unionAllGenes, min.reads = 0, min.lib.size = 0) allCorpus$pos <- all_pos[rownames(allCorpus$corpus), ]

filter spots in all_slice based on those kept in the corpus

all_slice_filt <- all_slice[names(all_slice) %in% rownames(allCorpus$pos)] length(all_slice_filt) allCorpus$slice <- all_slice_filt dim(allCorpus$corpus) allCorpus$slm dim(allCorpus$pos) length(allCorpus$slice)

###########################################

2 Running the LDA model

fit LDA models to the data

ldas <- fitLDA(allCorpus$corpus, Ks = seq(20, 20, by = 1), perc.rare.thresh = 0.05, plot=TRUE, verbose=TRUE)

optLDAS <- optimalModel(models = ldas, opt = 20) results <- getBetaTheta(optLDAS, perc.filt = 0.05, betaScale = 1000)

Create a mapping between Ensembl IDs and gene symbols using the biomaRt package

ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl") ensembl_gene_mapping <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"), mart = ensembl)

Write a function to convert Ensembl IDs to gene symbols using the mapping

convert_ensembl_to_symbol <- function(ensembl_id, mapping) { gene_symbol <- mapping$external_gene_name[match(ensembl_id, mapping$ensembl_gene_id)] return(gene_symbol) }

Convert the Ensembl IDs in the colnames(results$beta) into gene symbols

colnames(results$beta) <- sapply(colnames(results$beta), convert_ensembl_to_symbol, mapping = ensembl_gene_mapping)

Check for any NA

any_na <- any(is.na(colnames(results$beta)))

no NA value

Top 10 marker genes for each cluster

topGenes(results$beta, n=10) ###########################################

3 Deconvolution Visualization

Extract the theta values and position information for the first sample

theta1 <- results$theta[rownames(allCorpus$pos)[allCorpus$slice == 1],] pos_int1 <- allCorpus$pos[rownames(allCorpus$pos)[allCorpus$slice == 1],] colnames(pos_int1) <- c("y", "x")

vizAllTopics(theta = theta1, pos = pos_int1, r = 5, lwd = 0, showLegend = TRUE, plotTitle = NA) + ggplot2::guides(fill = ggplot2::guide_legend(ncol = 1)) + ggplot2::geom_rect(data = data.frame(pos_int1), ggplot2::aes(xmin = min(x) - 90, xmax = max(x) + 90, ymin = min(y) - 90, ymax = max(y) + 90), fill = NA, color = "black", linetype = "solid", size = 0.5) + ggplot2::theme( plot.background = ggplot2::element_blank() ) + ggplot2::guides(colour = "none")

########################################### Here are the outputs I get from running "dim(theta1) and dim(pos_int1)": [1] 1644 20 [1] 1644 2 Also, here are few first-line overview of theta1 and pos_int1 when I run head functions: head(theta1): 1 2 3 4 AAACAAGTATCTCCCA-1 0.09521523 0.06524648 0.000000 0.0000000 AAACAGAGCGACTCCT-1 0.00000000 0.00000000 0.000000 0.0000000 AAACATTTCCCGGATT-1 0.00000000 0.00000000 0.000000 0.3612481 AAACCCGAACGAAATC-1 0.05972998 0.00000000 0.000000 0.0000000 AAACCTAAGCAGCCGG-1 0.00000000 0.00000000 0.000000 0.0000000 AAACCTCATGAAGTTG-1 0.00000000 0.00000000 0.254858 0.0000000 5 6 7 8 9 10 AAACAAGTATCTCCCA-1 0.0000000 0.00000000 0 0 0.0000000 0.2516840 AAACAGAGCGACTCCT-1 0.0000000 0.00000000 0 0 0.1006190 0.7903896 AAACATTTCCCGGATT-1 0.0000000 0.00000000 0 0 0.0000000 0.4040613 AAACCCGAACGAAATC-1 0.0000000 0.06014447 0 0 0.4313043 0.0000000 AAACCTAAGCAGCCGG-1 0.3908903 0.00000000 0 0 0.0000000 0.0000000 AAACCTCATGAAGTTG-1 0.0000000 0.00000000 0 0 0.0000000 0.0000000 11 12 13 14 15 16 AAACAAGTATCTCCCA-1 0.4222455 0 0 0.1656088 0.0000000 0.0000000 AAACAGAGCGACTCCT-1 0.0000000 0 0 0.0000000 0.0000000 0.0000000 AAACATTTCCCGGATT-1 0.0000000 0 0 0.0000000 0.2346906 0.0000000 AAACCCGAACGAAATC-1 0.2131955 0 0 0.0000000 0.0000000 0.0858006 AAACCTAAGCAGCCGG-1 0.1892510 0 0 0.0000000 0.0000000 0.2062755 AAACCTCATGAAGTTG-1 0.4748901 0 0 0.0000000 0.0000000 0.0000000 17 18 19 20 AAACAAGTATCTCCCA-1 0.0000000 0 0.0000000 0 AAACAGAGCGACTCCT-1 0.1089915 0 0.0000000 0 AAACATTTCCCGGATT-1 0.0000000 0 0.0000000 0 AAACCCGAACGAAATC-1 0.0000000 0 0.1498251 0 AAACCTAAGCAGCCGG-1 0.0000000 0 0.2135833 0 AAACCTCATGAAGTTG-1 0.0000000 0 0.2702520 0

head(pos_int1): y x AAACAAGTATCTCCCA-1 9041 14153 AAACAGAGCGACTCCT-1 16203 13154 AAACATTTCCCGGATT-1 6843 13606 AAACCCGAACGAAATC-1 10054 15630 AAACCTAAGCAGCCGG-1 6027 12012 AAACCTCATGAAGTTG-1 11521 4619

I know this is a wildly lengthy post, but I couldn't figure out what the issue is after spending almost the entire day debugging, so I would really appreciate your help!!

Thank you very much, Daniel

JEFworks-Lab / STdeconvolve

Integrated Object Visualization #37