vizAllTopics "groups" parameter not working correctly

Hi there, and thank you so much for developing such a great package! I have gone through the tutorial with my own data and everything is working well except for the groups parameter of the vizAllTopics() function. I have two different columns of meta.data that I would like to color code the scatterpies by: 1) cluster and 2) tissue region (as shown in this tutorial). However, when I attempt to color code my scatterpies by either of these meta.data values, the resulting plot is wrong and it looks like the annotations are being assigned to completely wrong spots (this is especially obvious when looking at my tissue region annotations, which should be grouped together throughout the plot but are instead scattered like confetti). The resulting Topics that I get back look correct, and seem to be identifying known cell types in the correct spots. It's just the groups annotation that is not working. If this is a user issue, do you have any thoughts or advice on where I might need to tweak the code? I have followed everything according to the tutorials thus far.

Hi @oligomyeggo,

Thanks so much for using STdeconvolve and thanks for reporting this issue!

It sounds like the cluster and tissue region meta.data assignments you are passing to the groups parameter are in a different order than the pixels in the input theta cell-type proportion matrix. Try ensuring that the order of the pixels is the same for both parameters by doing something like:

vizAllTopics(deconProp,
             pos, 
             groups = annot[rownames(deconProp)], 
             group_cols = rainbow(length(levels(annot))),
             r=0.4)

where annot is a factor of pixel meta.data assignments you would like to color them by. (Note that this could actually just be a vector as well, but you will need to ensure that the group_cols parameter is a character vector of colors with a color value for each unique meta.data assignment.)
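As a minimal sketch of the name-based reordering suggested above, here is a toy example. The pixel names, proportions, and labels are made up for illustration and do not come from the real data:

```r
# Toy cell-type proportion matrix; rows are pixels (hypothetical names)
deconProp <- matrix(c(0.2, 0.8,
                      0.5, 0.5,
                      0.9, 0.1),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(c("pixelA", "pixelB", "pixelC"),
                                    c("Topic.1", "Topic.2")))

# Named factor of group labels, deliberately stored in a different order
annot <- factor(c(pixelC = "Cortex", pixelA = "Thalamus", pixelB = "Thalamus"))

# Index by name so the labels follow the row order of deconProp
groups <- annot[rownames(deconProp)]

# One color per unique label, named so the label-to-color mapping is explicit
group_cols <- setNames(rainbow(nlevels(annot)), levels(annot))

print(groups)
print(group_cols)
```

Indexing by name only works if annot actually carries pixel names; an unnamed vector would return NA for every lookup.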

Let me know if this resolves the issue or if you are still having problems, and if you have any other questions! Brendan

Hi @bmill3r, thanks for your quick response! I am still not able to get the groups parameter to work. Maybe I am misunderstanding what format annot should be in? I pulled the relevant columns from my Seurat object meta.data, and have an annot data.frame that looks like this:

                       region seurat_clusters
CCGCCGGTCAACACAC-1_1 Thalamus               6
TAGTTTATTCTTGCTT-1_1 Thalamus               6
GATATCTCATGCAATA-1_1 Thalamus               4
CGTTTAAGCGGAGCAC-1_1 Thalamus               6
CATGCTGGCTCCAATT-1_1 Thalamus               6
GAAACAGCCATGCAGT-1_1 Thalamus               3

And when I check out the order of rownames(deconProp), it looks like everything is in the same order:

"CCGCCGGTCAACACAC-1_1" 
"TAGTTTATTCTTGCTT-1_1" 
"GATATCTCATGCAATA-1_1" 
"CGTTTAAGCGGAGCAC-1_1" 
"CATGCTGGCTCCAATT-1_1" 
"GAAACAGCCATGCAGT-1_1"

To color by region, I am setting groups = annot$region and group_cols to be a predefined region_colors which is a character vector of colors with each color value corresponding to a specific region:

      Thalamus   Hypothalamus             VL             FB Cortex RSP, IL          DG_mo 
     "#FFC312"      "#C4E538"      "#12CBC4"      "#FDA7DF"      "#ED4C67"      "#F79F1F"

Yet, I am still getting a vizAllTopics() plot that looks like confetti when colored by groups. I noticed that in the provided mOB data, the mOB$annot field is a different format than mine (i.e., not a data.frame) and I am wondering if I need to reformat my annot info somehow to match this?

Hi @oligomyeggo ,

You could try making annot a factor like it is in the mOB$annot by doing:

as.factor(annot$region)

but I'm not sure if this is the problem because in the function vizAllTopics() the group meta.data values are simply appended to an internal data.frame object within the function for plotting via:

theta_ordered_pos$Pixel.Groups <- as.character(groups)

and applying as.character(annot$region) to a column in the Seurat object should just result in a character vector of values. But it could be worth checking to see if it fixes the plot to be sure.
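A small sketch of why the factor versus character distinction should not matter here, using made-up pixel names. The point it illustrates is that the labels are attached purely by position, so only their order is significant:

```r
# Toy labels stored as a factor (hypothetical values)
region <- factor(c("Thalamus", "Cortex", "Thalamus"))

# as.character() returns the labels, not the underlying integer codes
labels_chr <- as.character(region)

# Stand-in for the internal plotting data.frame (pixel names are made up)
df <- data.frame(Row.names = c("p1", "p2", "p3"))

# Column assignment is positional: label i goes to row i, names are never consulted
df$Pixel.Groups <- labels_chr
print(df)
```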

One more thing to try is to make sure the pixels in the deconProp and the pos and the annot$region inputs are all in the same order. I wonder if in the function when the pixel positions and the pixel cell-type proportions are merged, it changes the order of pixels such that it no longer matches the order of the meta.data values. This is likely something I should address in the function so I definitely appreciate you bringing this up!

Let me know if this helps, Brendan

Hi @bmill3r, thanks again for your quick response!

Making annot a factor like in mOB$annot did not fix anything, as you expected. I also tried as.character(annot$region), which also didn't work.

I grabbed the rownames for the deconProp, pos, and annot inputs, stored them as vectors, and then compared them to each other using identical() (which, correct me if I'm wrong, should go through two vectors and return TRUE only if the elements at corresponding indices are exactly the same; otherwise it returns FALSE). They all match, so all the pixels should be in the same order unless I am missing something?
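For reference, a toy illustration of that check with made-up barcodes. identical() is order-sensitive, while setequal() only compares membership:

```r
# Hypothetical barcode vectors
a <- c("p1", "p2", "p3")
b <- c("p1", "p2", "p3")
shuffled <- c("p3", "p1", "p2")

same_order <- identical(a, b)            # same elements, same order
diff_order <- identical(a, shuffled)     # same elements, different order
same_members <- setequal(a, shuffled)    # ignores order entirely

print(c(same_order = same_order,
        diff_order = diff_order,
        same_members = same_members))
```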

Thanks again for all of your help Brendan!

Hi @oligomyeggo

Would you be able to show me what the data.frame of the plot looks like by doing something like:

plt <- vizAllTopics(deconProp, pos, groups = annot$region, group_cols = region_colors)

plt$layers[[1]]$data

I'm curious how the region meta.data labels are being assigned to the pixels. If they are being assigned to the correct pixels in this data.frame then the issue probably has something to do with how the group_cols are being used in the plot.

Thanks for your patience and sorry for this issue! Brendan

Hi @bmill3r,

Sure! Here is what the data.frame looks like running the code from above:

             Row.names       y      x Pixel.Groups  Topics value
1 AAACAAGTATCTCCCA-1_1  -490.7 1264.5     Thalamus Topic.1     0
2 AAACACCAATAACTGC-1_1 -1499.4 1454.8     Thalamus Topic.1     0
3 AAACAGCTTTCAGAAG-1_1 -1620.9 1116.3     Thalamus Topic.1     0
4 AAACAGGGTCTATATT-1_1 -1572.4 1200.9     Thalamus Topic.1     0
5 AAACCGGGTAGGTACC-1_1 -1390.0 1095.3     Thalamus Topic.1     0
6 AAACCGTTCGTCCAGG-1_1 -1219.9 1306.7     Thalamus Topic.1     0

I am also attaching the entire data.frame as a .csv file, in case that is helpful. I don't see the group_cols values being stored anywhere in the plt$layers[[1]]$data data.frame.

More importantly, the wrong regions are being assigned to the barcodes. Just looking at the first three barcodes listed here, the correct region assignments should be Cortex SSp, Cortex OLF, and Cortex OLF, and instead they have all received the Thalamus annotation. The regions themselves are in the same order as in the annot meta.data table (I grabbed the first length(annot$region) elements from plt$layers[[1]]$data$Pixel.Groups and confirmed they are the same using identical()). The barcodes, however, are not in the same order as in the deconProp, pos, and annot inputs. So I am guessing the barcodes (and just the barcodes) are being reordered somewhere along the way?

Attachment: STdeconvolve_vizAllTopics_output.csv
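The kind of check described above can also be done by barcode rather than by position. This is a rough sketch on toy data; plot_df is a hypothetical stand-in for plt$layers[[1]]$data, and all names and labels are invented:

```r
# Toy annotation table in the original input order (hypothetical values)
annot <- data.frame(region = c("Thalamus", "Cortex", "Cortex"),
                    row.names = c("p3", "p1", "p2"))

# Stand-in for the plot data: barcodes were reordered, labels were not
plot_df <- data.frame(Row.names = c("p1", "p2", "p3"),
                      Pixel.Groups = c("Thalamus", "Cortex", "Cortex"))

# Look up the expected label for each barcode by name
expected <- as.character(annot[plot_df$Row.names, "region"])

# TRUE wherever the plotted label disagrees with the annotation table
mismatch <- plot_df$Pixel.Groups != expected
print(cbind(plot_df, expected, mismatch))
```

A positional identical() comparison of the label vectors would pass on this toy data even though two of the three pixels carry the wrong label, which matches the behavior described in the thread.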

Hi @oligomyeggo,

Thanks so much for the output data.frame from the plot object! I've used the file to reconstruct the deconProp, pos, and groups = annot$region input parameters and essentially remade the theta_ordered_pos internal data.frame that is made within vizAllTopics(). I then used this with the main ggplot2 plotting functions to try to generate the plot and see if I could identify any issues.

What I noticed is that if I try to plot a subset of pixels separately, let's say just the pixels labeled as "Thalamus", they appear to be randomly positioned in the tissue. This is true for other regions, too. So maybe there is a mix-up of the "x" and "y" coordinates, pixel barcodes, and region names in your input data? See my code below:

library(dplyr)
m <- data.frame(read.csv2(file = "./STdeconvolve_vizAllTopics_output.csv", header = TRUE, sep = ",", row.names = 1))
m <- m %>% 
  dplyr::mutate_at(vars(x, y, value), as.numeric)

## remake the theta matrix based on pixel names as rows, topic names as columns, and values as cells
theta <- reshape2::dcast(m[,c("Row.names", "Topics", "value")], Row.names ~ Topics)
rownames(theta) <- theta$Row.names
## there are 20 topics, first column is "Row.names" so drop
theta <- theta[2:21]

## there are 2721 unique pixels
pos <- m[1:2721,c("x", "y")]
rownames(pos) <- rownames(theta)

## the groups for the pixels
groups <- m[1:2721, "Pixel.Groups"]

Make the theta_ordered_pos data.frame that is generated within vizAllTopics()

theta_ordered_pos <- merge(data.frame(theta),
                             data.frame(pos), by=0)
topicColumns <- colnames(theta_ordered_pos)[2:(dim(theta_ordered_pos)[2]-2)]
theta_ordered_pos$Pixel.Groups <- as.character(groups)
head(theta_ordered_pos)

             Row.names Topic.1   Topic.10   Topic.11   Topic.12 Topic.13 Topic.14  Topic.15
1 AAACAAGTATCTCCCA-1_1       0 0.06661723 0.00000000 0.00000000        0        0 0.1002632
2 AAACACCAATAACTGC-1_1       0 0.00000000 0.08441205 0.00000000        0        0 0.0000000
3 AAACAGCTTTCAGAAG-1_1       0 0.00000000 0.00000000 0.14573946        0        0 0.0000000
4 AAACAGGGTCTATATT-1_1       0 0.00000000 0.09596800 0.12289874        0        0 0.0000000
5 AAACCGGGTAGGTACC-1_1       0 0.00000000 0.00000000 0.20064300        0        0 0.0000000
6 AAACCGTTCGTCCAGG-1_1       0 0.00000000 0.00000000 0.08965665        0        0 0.1034867
    Topic.16 Topic.17 Topic.18 Topic.19    Topic.2   Topic.20 Topic.3   Topic.4    Topic.5
1 0.75374780        0        0        0 0.00000000 0.00000000       0 0.0000000 0.07937179
2 0.07390893        0        0        0 0.55609802 0.00000000       0 0.1236635 0.07265039
3 0.00000000        0        0        0 0.13497279 0.00000000       0 0.0000000 0.00000000
4 0.00000000        0        0        0 0.38459884 0.15910192       0 0.0000000 0.00000000
5 0.00000000        0        0        0 0.00000000 0.08328677       0 0.0000000 0.00000000
6 0.00000000        0        0        0 0.06237671 0.00000000       0 0.0000000 0.00000000
    Topic.6 Topic.7   Topic.8    Topic.9      x       y Pixel.Groups
1 0.0000000       0 0.0000000 0.00000000 1264.5  -490.7     Thalamus
2 0.0000000       0 0.0000000 0.08926711 1454.8 -1499.4     Thalamus
3 0.5869932       0 0.1322945 0.00000000 1116.3 -1620.9     Thalamus
4 0.2374325       0 0.0000000 0.00000000 1200.9 -1572.4     Thalamus
5 0.5822289       0 0.1338413 0.00000000 1095.3 -1390.0     Thalamus
6 0.3206065       0 0.4238735 0.00000000 1306.7 -1219.9     Thalamus

Get some additional plotting parameters

r <- max(0.4, max(pos)/nrow(pos)*4)
topicCols <- rainbow(20)
group_cols <- c(
  "Thalamus" = "#FFC312",
  "Hypothalamus" = "#C4E538",
  "VL" = "#12CBC4",
  "FB" = "#FDA7DF",
  "Cortex RSP, IL" = "#ED4C67",
  "DG_mo" = "#F79F1F"
)

Just select the Thalamus pixels

thalamus_pixels <- which(theta_ordered_pos$Pixel.Groups == "Thalamus")

Plotting (taken from within vizAllTopics())

p <- ggplot2::ggplot() +
      ggplot2::theme(
        panel.grid = ggplot2::element_blank(),
        axis.line = ggplot2::element_blank(),
        axis.text.x = ggplot2::element_blank(),
        axis.text.y = ggplot2::element_blank(),
        axis.ticks = ggplot2::element_blank(),
        axis.title.x = ggplot2::element_blank(),
        axis.title.y = ggplot2::element_blank(),
        panel.background = ggplot2::element_blank(),
        plot.background = ggplot2::element_blank(),
        legend.text = ggplot2::element_text(size = 12, colour = "black"),
        legend.title = ggplot2::element_text(size = 12, colour = "black")
        ) +
      scatterpie::geom_scatterpie(ggplot2::aes(x=x, y=y, group=Row.names, r=2.5, color = Pixel.Groups),
                                  lwd = 1,
                                  data = theta_ordered_pos[thalamus_pixels,],
                                  cols = topicColumns,
                                  legend_name = "Topics") +
      ggplot2::scale_fill_manual(values = as.vector(topicCols)) +
      ggplot2::scale_color_manual(values = group_cols)
p

[Image: thalamus_pixels]

FYI, if I increase the radius of the pixels (r=10) and reduce the line width (lwd=0.1), I do see that the deconvolved topics cluster in certain regions of the tissue, which is cool!

[Image: full_with_topics]

Let me know if this helps and thanks again for your patience, Brendan

Hi @bmill3r, thanks for all of your effort in this! I am still not quite sure where things are going wrong, though. I followed your code above to also recreate the theta_ordered_pos data.frame from the same .csv file I sent you. When I do this, and recreate the pos and groups input objects, they do not match my initial pos and groups objects that I used to create the .csv file (though, I am not sure if they should? The barcodes are in a different order). In case it is helpful, I am attaching the deconProp, pos, and annot input objects I used to generate the plt$layers[[1]]$data data.frame from above.

As for mixing up the "x" and "y" coordinates, this is possible, though I have rerun the analysis several times using different "x" and "y" options for the pos object, and I have never gotten the annotations to work correctly. My starting object is a Seurat object that I processed using the STUtility package. In order to access the coordinates, I retrieved the Staffli object from the Seurat object. The Staffli object contains meta.data like pixel coordinates:

                      x  y adj_x adj_y  pixel_x  pixel_y sample original_x original_y warped_x warped_y
CCGCCGGTCAACACAC-1_1 55 13    55    13 920.2098 461.9857      1   920.2098   461.9857    482.0   1061.8
TAGTTTATTCTTGCTT-1_1 57 13    57    13 944.6028 461.9857      1   944.6028   461.9857    482.0   1037.4
GATATCTCATGCAATA-1_1 58 12    58    12 956.6853 440.8982      1   956.6853   440.8982    460.9   1025.3
CGTTTAAGCGGAGCAC-1_1 59 13    59    13 968.8819 461.9857      1   968.8819   461.9857    482.0   1013.1
CATGCTGGCTCCAATT-1_1 60 12    60    12 981.0784 440.8982      1   981.0784   440.8982    460.9   1000.9
GAAACAGCCATGCAGT-1_1 61 13    61    13 993.1609 461.9857      1   993.1609   461.9857    482.0    988.9

In my original analysis I used the "pixel_x" and "pixel_y" columns, but this did not produce the desired figure. I tried switching the "x" and "y" assignment, which also did not help. I've also tried using the "x" and "y" columns from the Staffli meta.data, as well as the "warped_x" and "warped_y", and none of those produced the desired figure either. All that switching around the x and y info seemed to do was rotate and flip the image. However, no matter what orientation the figure was in, the Topics all looked biologically correct (which, I agree, it's awesome to see the deconvoluted topics clustering in a region-specific manner! That is what I would expect and hope to see). The annotations just always end up looking like confetti no matter which meta.data from the Staffli object I use (and I would expect that, if I mixed up x and y coordinates, wouldn't I still see region annotations clumping together or having some sort of structure?). I am happy to send you my original counts from the Seurat object and the Staffli meta.data if that would be helpful to look at the data from the beginning; the counts file is just too big to send in GitHub.

Thank you again for all of your help! If I can get these annotations to work it would be amazing for my analysis, so I really appreciate all of your time on this.

Attachments: annot.csv, deconProp.csv, pos.csv

Hi @oligomyeggo,

Thank you for sharing the input files - with these I believe I was able to figure out the problem.

Within vizAllTopics(), the pos and deconProp data.frames are merged via:

theta_ordered_pos <- merge(data.frame(theta_ordered),
                           data.frame(pos), by=0)

However, this merge appears to change the order of the pixel rows. For example, the pixels in the input files are in the same order initially:

head(deconProp)

                     X1 X2         X3 X4 X5        X6 X7 X8
CCGCCGGTCAACACAC-1_1  0  0 0.12382367  0  0 0.1576597  0  0
TAGTTTATTCTTGCTT-1_1  0  0 0.15001793  0  0 0.3353203  0  0
GATATCTCATGCAATA-1_1  0  0 0.12570119  0  0 0.1727980  0  0
CGTTTAAGCGGAGCAC-1_1  0  0 0.13034885  0  0 0.0000000  0  0
CATGCTGGCTCCAATT-1_1  0  0 0.09825906  0  0 0.0000000  0  0
GAAACAGCCATGCAGT-1_1  0  0 0.00000000  0  0 0.0000000  0  0
                             X9       X10 X11        X12 X13
CCGCCGGTCAACACAC-1_1 0.18925612 0.0000000   0 0.00000000   0
TAGTTTATTCTTGCTT-1_1 0.11122191 0.0000000   0 0.06063743   0
GATATCTCATGCAATA-1_1 0.06067111 0.1445152   0 0.32274893   0
CGTTTAAGCGGAGCAC-1_1 0.06081824 0.4093042   0 0.26545275   0
CATGCTGGCTCCAATT-1_1 0.00000000 0.4781016   0 0.22333077   0
GAAACAGCCATGCAGT-1_1 0.07720253 0.8553822   0 0.06741529   0
                           X14 X15       X16 X17 X18 X19       X20
CCGCCGGTCAACACAC-1_1 0.1420648   0 0.0000000   0   0   0 0.3871957
TAGTTTATTCTTGCTT-1_1 0.1225994   0 0.0000000   0   0   0 0.2202031
GATATCTCATGCAATA-1_1 0.1735656   0 0.0000000   0   0   0 0.0000000
CGTTTAAGCGGAGCAC-1_1 0.1340760   0 0.0000000   0   0   0 0.0000000
CATGCTGGCTCCAATT-1_1 0.1385907   0 0.0617179   0   0   0 0.0000000
GAAACAGCCATGCAGT-1_1 0.0000000   0 0.0000000   0   0   0 0.0000000

head(pos)

                           y     x
CCGCCGGTCAACACAC-1_1 -1061.8 482.0
TAGTTTATTCTTGCTT-1_1 -1037.4 482.0
GATATCTCATGCAATA-1_1 -1025.3 460.9
CGTTTAAGCGGAGCAC-1_1 -1013.1 482.0
CATGCTGGCTCCAATT-1_1 -1000.9 460.9
GAAACAGCCATGCAGT-1_1  -988.9 482.0

head(annot)

                       region seurat_clusters
CCGCCGGTCAACACAC-1_1 Thalamus               6
TAGTTTATTCTTGCTT-1_1 Thalamus               6
GATATCTCATGCAATA-1_1 Thalamus               4
CGTTTAAGCGGAGCAC-1_1 Thalamus               6
CATGCTGGCTCCAATT-1_1 Thalamus               6
GAAACAGCCATGCAGT-1_1 Thalamus               3

but after deconProp (the theta matrix) and pos (the pixel positions) are merged in the function, the order changes:

theta_ordered_pos <- merge(data.frame(deconProp),
                             data.frame(pos), by=0)

             Row.names X1         X2 X3        X4         X5
1 AAACAAGTATCTCCCA-1_1  0 0.00000000  0 0.0000000 0.07937179
2 AAACACCAATAACTGC-1_1  0 0.55609802  0 0.1236635 0.07265039
3 AAACAGCTTTCAGAAG-1_1  0 0.13497279  0 0.0000000 0.00000000
4 AAACAGGGTCTATATT-1_1  0 0.38459884  0 0.0000000 0.00000000
5 AAACCGGGTAGGTACC-1_1  0 0.00000000  0 0.0000000 0.00000000
6 AAACCGTTCGTCCAGG-1_1  0 0.06237671  0 0.0000000 0.00000000
         X6 X7        X8         X9        X10        X11        X12
1 0.0000000  0 0.0000000 0.00000000 0.06661723 0.00000000 0.00000000
2 0.0000000  0 0.0000000 0.08926711 0.00000000 0.08441205 0.00000000
3 0.5869932  0 0.1322945 0.00000000 0.00000000 0.00000000 0.14573946
4 0.2374325  0 0.0000000 0.00000000 0.00000000 0.09596800 0.12289874
5 0.5822289  0 0.1338413 0.00000000 0.00000000 0.00000000 0.20064300
6 0.3206065  0 0.4238735 0.00000000 0.00000000 0.00000000 0.08965665
  X13 X14       X15        X16 X17 X18 X19        X20       y      x
1   0   0 0.1002632 0.75374780   0   0   0 0.00000000  -490.7 1264.5
2   0   0 0.0000000 0.07390893   0   0   0 0.00000000 -1499.4 1454.8
3   0   0 0.0000000 0.00000000   0   0   0 0.00000000 -1620.9 1116.3
4   0   0 0.0000000 0.00000000   0   0   0 0.15910192 -1572.4 1200.9
5   0   0 0.0000000 0.00000000   0   0   0 0.08328677 -1390.0 1095.3
6   0   0 0.1034867 0.00000000   0   0   0 0.00000000 -1219.9 1306.7

and because the group meta.data region names stay in their original order, the labels become scrambled when plotting.
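The reordering can be reproduced on a few rows of toy data. This is only an illustrative sketch with made-up pixel names and values; it relies on the documented default of base R merge(), which sorts the result on the key columns (sort = TRUE), and note that sort = FALSE is documented as returning rows in an unspecified order rather than the input order:

```r
# Toy inputs, all in the same (unsorted) pixel order: p3, p1, p2
theta <- data.frame(Topic.1 = c(0.1, 0.7, 0.4),
                    row.names = c("p3", "p1", "p2"))
pos <- data.frame(x = c(30, 10, 20), y = c(3, 1, 2),
                  row.names = c("p3", "p1", "p2"))
groups <- c("Thalamus", "Cortex", "Cortex")  # labels for p3, p1, p2

# Merging on row names (by = 0) sorts the rows by the key
merged <- merge(theta, pos, by = 0)
merged_order <- as.character(merged$Row.names)

# Appending the labels by position now pairs them with the wrong pixels
merged$Pixel.Groups <- groups
print(merged)
```

In the printed result, p1 is labeled Thalamus even though the input labeled it Cortex.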

So to get around this, I ordered the pixels in all three initial files in the order they become after the merge:

df <- merge(data.frame(pos),
            data.frame(deconProp), by=0)

head(df)

             Row.names       y      x X1         X2 X3        X4
1 AAACAAGTATCTCCCA-1_1  -490.7 1264.5  0 0.00000000  0 0.0000000
2 AAACACCAATAACTGC-1_1 -1499.4 1454.8  0 0.55609802  0 0.1236635
3 AAACAGCTTTCAGAAG-1_1 -1620.9 1116.3  0 0.13497279  0 0.0000000
4 AAACAGGGTCTATATT-1_1 -1572.4 1200.9  0 0.38459884  0 0.0000000
5 AAACCGGGTAGGTACC-1_1 -1390.0 1095.3  0 0.00000000  0 0.0000000
6 AAACCGTTCGTCCAGG-1_1 -1219.9 1306.7  0 0.06237671  0 0.0000000
          X5        X6 X7        X8         X9        X10        X11
1 0.07937179 0.0000000  0 0.0000000 0.00000000 0.06661723 0.00000000
2 0.07265039 0.0000000  0 0.0000000 0.08926711 0.00000000 0.08441205
3 0.00000000 0.5869932  0 0.1322945 0.00000000 0.00000000 0.00000000
4 0.00000000 0.2374325  0 0.0000000 0.00000000 0.00000000 0.09596800
5 0.00000000 0.5822289  0 0.1338413 0.00000000 0.00000000 0.00000000
6 0.00000000 0.3206065  0 0.4238735 0.00000000 0.00000000 0.00000000
         X12 X13 X14       X15        X16 X17 X18 X19        X20
1 0.00000000   0   0 0.1002632 0.75374780   0   0   0 0.00000000
2 0.00000000   0   0 0.0000000 0.07390893   0   0   0 0.00000000
3 0.14573946   0   0 0.0000000 0.00000000   0   0   0 0.00000000
4 0.12289874   0   0 0.0000000 0.00000000   0   0   0 0.15910192
5 0.20064300   0   0 0.0000000 0.00000000   0   0   0 0.08328677
6 0.08965665   0   0 0.1034867 0.00000000   0   0   0 0.00000000

annot2 <- annot[df$Row.names,]
pos2 <- pos[df$Row.names,]
deconProp2 <- deconProp[df$Row.names,]

STdeconvolve::vizAllTopics(theta = deconProp2,
                          pos = pos2,
                          groups = annot2$region,
                          group_cols = group_cols,
                          r = 10,
                          lwd = 0.5)

[Image: regions_and_topics_fixed]
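The same workaround, sketched on toy data so the effect can be verified without the real files. All pixel names and values are invented:

```r
# Toy inputs sharing one unsorted pixel order: p3, p1, p2
deconProp <- data.frame(Topic.1 = c(0.1, 0.7, 0.4),
                        row.names = c("p3", "p1", "p2"))
pos <- data.frame(x = c(30, 10, 20), y = c(3, 1, 2),
                  row.names = c("p3", "p1", "p2"))
annot <- data.frame(region = c("Thalamus", "Cortex", "Cortex"),
                    row.names = c("p3", "p1", "p2"))

# Use the merge itself to discover the order the rows end up in
df <- merge(pos, deconProp, by = 0)
ord <- as.character(df$Row.names)

# Put all three inputs into that order (drop = FALSE keeps them data.frames)
deconProp2 <- deconProp[ord, , drop = FALSE]
pos2 <- pos[ord, , drop = FALSE]
annot2 <- annot[ord, , drop = FALSE]

# A second merge no longer changes the order, so positional labels line up
check <- merge(deconProp2, pos2, by = 0)
check$Pixel.Groups <- as.character(annot2$region)
print(check)
```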

In a future release I'll work on having checks to ensure that the order of pixel group labels still matches the pixels even after merging the theta matrix and pixel positions. So this was a really helpful exercise - thanks for bringing this to my attention!

Please reach out if you have any other questions, Brendan

FYI, the r and lwd parameters can be changed to adjust the size of the scatterpies and the thickness of the lines to improve visualization of the pixel cell-type proportions and their group labels. Additionally, I have some vignettes where you can plot each deconvolved cell-type separately and color the regions of interest as well to provide another way to visualize the information.

Thank you so much @bmill3r!! This is awesome, and a plot like that is exactly what I was hoping to get! I really appreciate you taking the time to help me dig into the annotations issue and get everything working. And thank you for the tips about plotting aesthetics, and for the different plotting options in your vignettes (I did see those and am looking forward to trying them out).

I've updated the vizAllTopics() and vizTopic() functions to ensure that the pixel rownames stay in the same order even after merging the theta and pos matrices. It works now, assuming that the groups vector contains meta.data ids that correspond to the original order of the pixels in theta and pos. The fix will be available in the next version push (0.99.10).
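One way such an order-preserving merge could look is sketched below. This is not the code that went into the package; merge_keep_order is a hypothetical helper written for illustration, and the data are made up:

```r
# Hypothetical helper: merge theta and pos on row names, then restore theta's row order
merge_keep_order <- function(theta, pos) {
  # Guard: both inputs must describe exactly the same set of pixels
  stopifnot(setequal(rownames(theta), rownames(pos)))
  merged <- merge(data.frame(theta), data.frame(pos), by = 0)
  # match() gives, for each original pixel, its row position in the sorted merge
  idx <- match(rownames(theta), as.character(merged$Row.names))
  merged <- merged[idx, , drop = FALSE]
  rownames(merged) <- NULL
  merged
}

# Toy inputs in an unsorted pixel order
theta <- matrix(c(0.1, 0.7, 0.4), ncol = 1,
                dimnames = list(c("p3", "p1", "p2"), "Topic.1"))
pos <- data.frame(x = c(30, 10, 20), y = c(3, 1, 2),
                  row.names = c("p3", "p1", "p2"))
groups <- c("Thalamus", "Cortex", "Cortex")  # labels for p3, p1, p2

theta_ordered_pos <- merge_keep_order(theta, pos)
theta_ordered_pos$Pixel.Groups <- groups  # positional append is now safe
print(theta_ordered_pos)
```

The setequal() guard is a design choice for the sketch: without it, a pixel missing from pos would silently produce an NA row after match().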

Awesome, thanks so much @bmill3r! I'll look for version 0.99.10 and update my workflows when it's available.

JEFworks-Lab / STdeconvolve, issue #12