Closed oligomyeggo closed 2 years ago
Hi @oligomyeggo,
Thanks so much for using STdeconvolve and thanks for reporting this issue!
It sounds like the cluster and tissue region meta.data assignments that you pass to the groups parameter to color the pixels are in a different order than the pixels in the input theta cell-type proportion matrix. Try ensuring that the order of the pixels is the same for both parameters by doing something like:
vizAllTopics(deconProp,
             pos,
             groups = annot[rownames(deconProp)],
             group_cols = rainbow(length(levels(annot))),
             r = 0.4)
where annot is a factor of pixel meta.data assignments you would like to color them by. (Note that this could actually just be a vector as well, but you will need to ensure that the group_cols parameter is a character vector of colors with a color value for each unique meta.data assignment.)
Let me know if this resolves the issue or if you are still having problems, and if you have any other questions! Brendan
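A minimal sketch of the reordering idea above, using made-up barcodes and regions. It assumes annot is either a named factor (names are pixel barcodes) or, as a second hypothetical case, a data.frame with barcodes as row names and a region column; the two cases need different indexing.

```r
# Toy theta matrix: rows are pixels (hypothetical barcodes), columns are topics
deconProp <- matrix(c(0.2, 0.8,
                      0.5, 0.5,
                      0.9, 0.1),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(c("px3", "px1", "px2"),
                                    c("Topic.1", "Topic.2")))

# Case 1: annot is a named factor stored in a different order than deconProp
annot <- factor(c(px1 = "Cortex", px2 = "Thalamus", px3 = "Thalamus"))
groups_from_factor <- annot[rownames(deconProp)]

# Case 2: annot is a data.frame with barcodes as row names
annot_df <- data.frame(region = c("Cortex", "Thalamus", "Thalamus"),
                       row.names = c("px1", "px2", "px3"))
groups_from_df <- annot_df[rownames(deconProp), "region"]

# Both label vectors now follow the row order of deconProp
as.character(groups_from_factor)
as.character(groups_from_df)
```

Note that for a data.frame, annot[rownames(deconProp)] without the comma would try to select columns rather than rows, so the row-and-column form is the one to use there.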
Hi @bmill3r, thanks for your quick response! I am still not able to get the groups parameter to work. Maybe I am misunderstanding what format annot should be in? I pulled the relevant columns from my seurat object meta.data, and have an annot data.frame that looks like this:
region seurat_clusters
CCGCCGGTCAACACAC-1_1 Thalamus 6
TAGTTTATTCTTGCTT-1_1 Thalamus 6
GATATCTCATGCAATA-1_1 Thalamus 4
CGTTTAAGCGGAGCAC-1_1 Thalamus 6
CATGCTGGCTCCAATT-1_1 Thalamus 6
GAAACAGCCATGCAGT-1_1 Thalamus 3
And when I check the order of rownames(deconProp), it looks like everything is in the same order:
"CCGCCGGTCAACACAC-1_1"
"TAGTTTATTCTTGCTT-1_1"
"GATATCTCATGCAATA-1_1"
"CGTTTAAGCGGAGCAC-1_1"
"CATGCTGGCTCCAATT-1_1"
"GAAACAGCCATGCAGT-1_1"
To color by region, I am setting groups = annot$region and group_cols to be a predefined region_colors, which is a character vector of colors with each color value corresponding to a specific region:
Thalamus Hypothalamus VL FB Cortex RSP, IL DG_mo
"#FFC312" "#C4E538" "#12CBC4" "#FDA7DF" "#ED4C67" "#F79F1F"
Yet, I am still getting a vizAllTopics() plot that looks like confetti when colored by groups. I noticed that in the provided mOB data, the mOB$annot field is a different format than mine (i.e., not a data.frame) and I am wondering if I need to reformat my annot info somehow to match this?
Hi @oligomyeggo ,
You could try making annot a factor like it is in mOB$annot by doing:
as.factor(annot$region)
but I'm not sure if this is the problem, because in the function vizAllTopics() the group meta.data values are simply appended to an internal data.frame object within the function for plotting via:
theta_ordered_pos$Pixel.Groups <- as.character(groups)
and applying as.character(annot$region) to a column in the Seurat object should just result in a character vector of values. But it could be worth checking to see if it fixes the plot to be sure.
One more thing to try is to make sure the pixels in the deconProp, pos, and annot$region inputs are all in the same order. I wonder if in the function when the pixel positions and the pixel cell-type proportions are merged, it changes the order of pixels such that it no longer matches the order of the meta.data values. This is likely something I should address in the function so I definitely appreciate you bringing this up!
Let me know if this helps, Brendan
Hi @bmill3r, thanks again for your quick response!
Making annot a factor like in mOB$annot did not fix anything, as you expected. I also tried as.character(annot$region), which also didn't work.
I grabbed the rownames for the deconProp, pos, and annot inputs, stored them as vectors, and then compared them to each other using identical() (which, correct me if I'm wrong, should go through two vectors and return TRUE only if the elements at each corresponding index are exactly the same; otherwise it returns FALSE). They all match, so all the pixels should be in the same order unless I am missing something?
Thanks again for all of your help Brendan!
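That reading of identical() is right for plain character vectors. A small sketch with made-up barcodes shows that it is order-sensitive, in contrast to setequal(), which only compares membership:

```r
a <- c("px1", "px2", "px3")
b <- c("px1", "px2", "px3")
shuffled <- c("px3", "px1", "px2")

identical(a, b)         # TRUE: same elements in the same order
identical(a, shuffled)  # FALSE: same elements, different order
setequal(a, shuffled)   # TRUE: membership only, order ignored
```

Passing this check only says the inputs agree with each other; it cannot detect a reordering that happens later inside a plotting function.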
Hi @oligomyeggo
Would you be able to show me what the data.frame of the plot looks like by doing something like:
plt <- vizAllTopics(deconProp, pos, groups = annot$region, group_cols = region_colors)
plt$layers[[1]]$data
I'm curious how the region meta.data labels are being assigned to the pixels. If they are being assigned to the correct pixels in this data.frame then the issue probably has something to do with how the group_cols are being used in the plot.
Thanks for your patience and sorry for this issue! Brendan
Hi @bmill3r,
Sure! Here is what the data.frame looks like running the code from above:
Row.names y x Pixel.Groups Topics value
1 AAACAAGTATCTCCCA-1_1 -490.7 1264.5 Thalamus Topic.1 0
2 AAACACCAATAACTGC-1_1 -1499.4 1454.8 Thalamus Topic.1 0
3 AAACAGCTTTCAGAAG-1_1 -1620.9 1116.3 Thalamus Topic.1 0
4 AAACAGGGTCTATATT-1_1 -1572.4 1200.9 Thalamus Topic.1 0
5 AAACCGGGTAGGTACC-1_1 -1390.0 1095.3 Thalamus Topic.1 0
6 AAACCGTTCGTCCAGG-1_1 -1219.9 1306.7 Thalamus Topic.1 0
I am also attaching the entire data.frame as a .csv file, in case that is helpful. I don't see the group_cols values being stored anywhere in the plt$layers[[1]]$data data.frame.
More importantly, the wrong regions are being assigned to the barcodes. Just looking at the first three barcodes listed here, the correct region assignments should be Cortex SSp, Cortex OLF, and Cortex OLF, and instead they have all received the Thalamus annotation. The regions are in the same order as in the annot meta.data table (I grabbed the first length(annot$region) elements from plt$layers[[1]]$data$Pixel.Groups and confirmed they are the same using identical()). The barcodes, however, are not in the same order as in the deconProp, pos, and annot inputs. So I am guessing the barcodes (and just the barcodes) are being reordered somewhere along the way?
Hi @oligomyeggo,
Thanks so much for the output data.frame from the plot object! I've used the file to reconstruct the deconProp, pos, and groups = annot$region input parameters and essentially remade the theta_ordered_pos internal data.frame that is made within vizAllTopics(). I then used this with the main ggplot2 plotting functions to try to generate the plot and see if I could identify any issues.
What I noticed is that if I try to plot a subset of pixels separately, let's say just the pixels labeled as "Thalamus", they appear to be randomly positioned in the tissue. This is true for other regions, too. So maybe there is a mix up of the "x" and "y" coordinates, pixel barcodes, and region names in your input data? See my code below:
library(dplyr)
m <- data.frame(read.csv2(file = "./STdeconvolve_vizAllTopics_output.csv", header = TRUE, sep = ",", row.names = 1))
m <- m %>%
dplyr::mutate_at(vars(x, y, value), as.numeric)
## remake the theta matrix based on pixel names as rows, topic names as columns, and values as cells
theta <- reshape2::dcast(m[,c("Row.names", "Topics", "value")], Row.names ~ Topics)
rownames(theta) <- theta$Row.names
## there are 20 topics, first column is "Row.names" so drop
theta <- theta[2:21]
## there are 2721 unique pixels
pos <- m[1:2721,c("x", "y")]
rownames(pos) <- rownames(theta)
## the groups for the pixels
groups <- m[1:2721, "Pixel.Groups"]
Make the theta_ordered_pos data.frame that is generated within vizAllTopics():
theta_ordered_pos <- merge(data.frame(theta),
data.frame(pos), by=0)
topicColumns <- colnames(theta_ordered_pos)[2:(dim(theta_ordered_pos)[2]-2)]
theta_ordered_pos$Pixel.Groups <- as.character(groups)
head(theta_ordered_pos)
Row.names Topic.1 Topic.10 Topic.11 Topic.12 Topic.13 Topic.14 Topic.15
1 AAACAAGTATCTCCCA-1_1 0 0.06661723 0.00000000 0.00000000 0 0 0.1002632
2 AAACACCAATAACTGC-1_1 0 0.00000000 0.08441205 0.00000000 0 0 0.0000000
3 AAACAGCTTTCAGAAG-1_1 0 0.00000000 0.00000000 0.14573946 0 0 0.0000000
4 AAACAGGGTCTATATT-1_1 0 0.00000000 0.09596800 0.12289874 0 0 0.0000000
5 AAACCGGGTAGGTACC-1_1 0 0.00000000 0.00000000 0.20064300 0 0 0.0000000
6 AAACCGTTCGTCCAGG-1_1 0 0.00000000 0.00000000 0.08965665 0 0 0.1034867
Topic.16 Topic.17 Topic.18 Topic.19 Topic.2 Topic.20 Topic.3 Topic.4 Topic.5
1 0.75374780 0 0 0 0.00000000 0.00000000 0 0.0000000 0.07937179
2 0.07390893 0 0 0 0.55609802 0.00000000 0 0.1236635 0.07265039
3 0.00000000 0 0 0 0.13497279 0.00000000 0 0.0000000 0.00000000
4 0.00000000 0 0 0 0.38459884 0.15910192 0 0.0000000 0.00000000
5 0.00000000 0 0 0 0.00000000 0.08328677 0 0.0000000 0.00000000
6 0.00000000 0 0 0 0.06237671 0.00000000 0 0.0000000 0.00000000
Topic.6 Topic.7 Topic.8 Topic.9 x y Pixel.Groups
1 0.0000000 0 0.0000000 0.00000000 1264.5 -490.7 Thalamus
2 0.0000000 0 0.0000000 0.08926711 1454.8 -1499.4 Thalamus
3 0.5869932 0 0.1322945 0.00000000 1116.3 -1620.9 Thalamus
4 0.2374325 0 0.0000000 0.00000000 1200.9 -1572.4 Thalamus
5 0.5822289 0 0.1338413 0.00000000 1095.3 -1390.0 Thalamus
6 0.3206065 0 0.4238735 0.00000000 1306.7 -1219.9 Thalamus
Get some additional plotting parameters
r <- max(0.4, max(pos)/nrow(pos)*4)
topicCols <- rainbow(20)
group_cols <- c(
"Thalamus" = "#FFC312",
"Hypothalamus" = "#C4E538",
"VL" = "#12CBC4",
"FB" = "#FDA7DF",
"Cortex RSP, IL" = "#ED4C67",
"DG_mo" = "#F79F1F"
)
Just select the Thalamus pixels
thalamus_pixels <- which(theta_ordered_pos$Pixel.Groups == "Thalamus")
Plotting (taken from within vizAllTopics()):
p <- ggplot2::ggplot() +
  ggplot2::theme(
    panel.grid = ggplot2::element_blank(),
    axis.line = ggplot2::element_blank(),
    axis.text.x = ggplot2::element_blank(),
    axis.text.y = ggplot2::element_blank(),
    axis.ticks = ggplot2::element_blank(),
    axis.title.x = ggplot2::element_blank(),
    axis.title.y = ggplot2::element_blank(),
    panel.background = ggplot2::element_blank(),
    plot.background = ggplot2::element_blank(),
    legend.text = ggplot2::element_text(size = 12, colour = "black"),
    legend.title = ggplot2::element_text(size = 12, colour = "black")
  ) +
  scatterpie::geom_scatterpie(ggplot2::aes(x = x, y = y, group = Row.names, r = 2.5, color = Pixel.Groups),
                              lwd = 1,
                              data = theta_ordered_pos[thalamus_pixels, ],
                              cols = topicColumns,
                              legend_name = "Topics") +
  ggplot2::scale_fill_manual(values = as.vector(topicCols)) +
  ggplot2::scale_color_manual(values = group_cols)
p
FYI if I increase the radius of the pixels (r=10) and reduce the line width (lwd=0.1), I do see that the deconvolved topics cluster in certain regions of the tissue, which is cool!
Let me know if this helps and thanks again for your patience, Brendan
Hi @bmill3r, thanks for all of your effort in this! I am still not quite sure where things are going wrong, though. I followed your code above to also recreate the theta_ordered_pos data.frame from the same .csv file I sent you. When I do this, and recreate the pos and groups input objects, they do not match my initial pos and groups objects that I used to create the .csv file (though I am not sure if they should; the barcodes are in a different order). In case it is helpful, I am attaching the deconProp, pos, and annot input objects I used to generate the plt$layers[[1]]$data data.frame from above.
As for mixing up the "x" and "y" coordinates, this is possible, though I have rerun the analysis several times using different "x" and "y" options for the pos object, and I have never gotten the annotations to work correctly. My starting object is a seurat object that I processed using the STUtility package. In order to access the coordinates, I retrieved the Staffli object from the seurat object. The Staffli object contains meta.data like pixel coordinates:
x y adj_x adj_y pixel_x pixel_y sample original_x original_y warped_x warped_y
CCGCCGGTCAACACAC-1_1 55 13 55 13 920.2098 461.9857 1 920.2098 461.9857 482.0 1061.8
TAGTTTATTCTTGCTT-1_1 57 13 57 13 944.6028 461.9857 1 944.6028 461.9857 482.0 1037.4
GATATCTCATGCAATA-1_1 58 12 58 12 956.6853 440.8982 1 956.6853 440.8982 460.9 1025.3
CGTTTAAGCGGAGCAC-1_1 59 13 59 13 968.8819 461.9857 1 968.8819 461.9857 482.0 1013.1
CATGCTGGCTCCAATT-1_1 60 12 60 12 981.0784 440.8982 1 981.0784 440.8982 460.9 1000.9
GAAACAGCCATGCAGT-1_1 61 13 61 13 993.1609 461.9857 1 993.1609 461.9857 482.0 988.9
In my original analysis I used the "pixel_x" and "pixel_y" columns, but this did not produce the desired figure. I tried switching the "x" and "y" assignment, which also did not help. I've also tried using the "x" and "y" columns from the Staffli meta.data, as well as the "warped_x" and "warped_y", and none of those produced the desired figure either. All that switching around the x and y info seemed to do was rotate and flip the image. However, no matter what orientation the figure was in, the Topics all looked biologically correct (and I agree it's awesome to see the deconvolved topics clustering in a region-specific manner! That is what I would expect and hope to see). The annotations just always end up looking like confetti no matter which meta.data from the Staffli object I use (and I would expect that, if I mixed up x and y coordinates, I would still see region annotations clumping together or having some sort of structure). I am happy to send you my original counts from the seurat object and the Staffli meta.data if it would be helpful to look at the data from the beginning; the counts file is just too big to send in GitHub.
Thank you again for all of your help! If I can get these annotations to work it would be amazing for my analysis, so I really appreciate all of your time on this.
Hi @oligomyeggo,
Thank you for sharing the input files - with these I believe I was able to figure out the problem.
Within vizAllTopics(), the pos and deconProp data.frames are merged via:
theta_ordered_pos <- merge(data.frame(theta_ordered),
data.frame(pos), by=0)
However, a consequence of this merge appears to be a change in the row order of the pixels. For example, the pixels in the input files are in the same order initially:
head(deconProp)
X1 X2 X3 X4 X5 X6 X7 X8
CCGCCGGTCAACACAC-1_1 0 0 0.12382367 0 0 0.1576597 0 0
TAGTTTATTCTTGCTT-1_1 0 0 0.15001793 0 0 0.3353203 0 0
GATATCTCATGCAATA-1_1 0 0 0.12570119 0 0 0.1727980 0 0
CGTTTAAGCGGAGCAC-1_1 0 0 0.13034885 0 0 0.0000000 0 0
CATGCTGGCTCCAATT-1_1 0 0 0.09825906 0 0 0.0000000 0 0
GAAACAGCCATGCAGT-1_1 0 0 0.00000000 0 0 0.0000000 0 0
X9 X10 X11 X12 X13
CCGCCGGTCAACACAC-1_1 0.18925612 0.0000000 0 0.00000000 0
TAGTTTATTCTTGCTT-1_1 0.11122191 0.0000000 0 0.06063743 0
GATATCTCATGCAATA-1_1 0.06067111 0.1445152 0 0.32274893 0
CGTTTAAGCGGAGCAC-1_1 0.06081824 0.4093042 0 0.26545275 0
CATGCTGGCTCCAATT-1_1 0.00000000 0.4781016 0 0.22333077 0
GAAACAGCCATGCAGT-1_1 0.07720253 0.8553822 0 0.06741529 0
X14 X15 X16 X17 X18 X19 X20
CCGCCGGTCAACACAC-1_1 0.1420648 0 0.0000000 0 0 0 0.3871957
TAGTTTATTCTTGCTT-1_1 0.1225994 0 0.0000000 0 0 0 0.2202031
GATATCTCATGCAATA-1_1 0.1735656 0 0.0000000 0 0 0 0.0000000
CGTTTAAGCGGAGCAC-1_1 0.1340760 0 0.0000000 0 0 0 0.0000000
CATGCTGGCTCCAATT-1_1 0.1385907 0 0.0617179 0 0 0 0.0000000
GAAACAGCCATGCAGT-1_1 0.0000000 0 0.0000000 0 0 0 0.0000000
head(pos)
y x
CCGCCGGTCAACACAC-1_1 -1061.8 482.0
TAGTTTATTCTTGCTT-1_1 -1037.4 482.0
GATATCTCATGCAATA-1_1 -1025.3 460.9
CGTTTAAGCGGAGCAC-1_1 -1013.1 482.0
CATGCTGGCTCCAATT-1_1 -1000.9 460.9
GAAACAGCCATGCAGT-1_1 -988.9 482.0
head(annot)
region seurat_clusters
CCGCCGGTCAACACAC-1_1 Thalamus 6
TAGTTTATTCTTGCTT-1_1 Thalamus 6
GATATCTCATGCAATA-1_1 Thalamus 4
CGTTTAAGCGGAGCAC-1_1 Thalamus 6
CATGCTGGCTCCAATT-1_1 Thalamus 6
GAAACAGCCATGCAGT-1_1 Thalamus 3
but after deconProp (the theta matrix) and pos (the pixel positions) are merged in the function, the order changes:
theta_ordered_pos <- merge(data.frame(deconProp),
data.frame(pos), by=0)
Row.names X1 X2 X3 X4 X5
1 AAACAAGTATCTCCCA-1_1 0 0.00000000 0 0.0000000 0.07937179
2 AAACACCAATAACTGC-1_1 0 0.55609802 0 0.1236635 0.07265039
3 AAACAGCTTTCAGAAG-1_1 0 0.13497279 0 0.0000000 0.00000000
4 AAACAGGGTCTATATT-1_1 0 0.38459884 0 0.0000000 0.00000000
5 AAACCGGGTAGGTACC-1_1 0 0.00000000 0 0.0000000 0.00000000
6 AAACCGTTCGTCCAGG-1_1 0 0.06237671 0 0.0000000 0.00000000
X6 X7 X8 X9 X10 X11 X12
1 0.0000000 0 0.0000000 0.00000000 0.06661723 0.00000000 0.00000000
2 0.0000000 0 0.0000000 0.08926711 0.00000000 0.08441205 0.00000000
3 0.5869932 0 0.1322945 0.00000000 0.00000000 0.00000000 0.14573946
4 0.2374325 0 0.0000000 0.00000000 0.00000000 0.09596800 0.12289874
5 0.5822289 0 0.1338413 0.00000000 0.00000000 0.00000000 0.20064300
6 0.3206065 0 0.4238735 0.00000000 0.00000000 0.00000000 0.08965665
X13 X14 X15 X16 X17 X18 X19 X20 y x
1 0 0 0.1002632 0.75374780 0 0 0 0.00000000 -490.7 1264.5
2 0 0 0.0000000 0.07390893 0 0 0 0.00000000 -1499.4 1454.8
3 0 0 0.0000000 0.00000000 0 0 0 0.00000000 -1620.9 1116.3
4 0 0 0.0000000 0.00000000 0 0 0 0.15910192 -1572.4 1200.9
5 0 0 0.0000000 0.00000000 0 0 0 0.08328677 -1390.0 1095.3
6 0 0 0.1034867 0.00000000 0 0 0 0.00000000 -1219.9 1306.7
and because the group meta.data region names stay in their original order, the labels become scrambled when plotting.
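The effect can be reproduced on a toy example. The sketch below uses made-up barcodes and values; it relies only on base R's merge(), whose sort argument defaults to TRUE, so the result is sorted on the merge key (here the row names, via by = 0).

```r
# Toy inputs in a shared, non-alphabetical row order (hypothetical barcodes)
theta <- data.frame(Topic.1 = c(0.1, 0.7, 0.4),
                    Topic.2 = c(0.9, 0.3, 0.6),
                    row.names = c("px3", "px1", "px2"))
pos <- data.frame(x = c(30, 10, 20),
                  y = c(3, 1, 2),
                  row.names = c("px3", "px1", "px2"))
groups <- c("Thalamus", "Cortex", "Hypothalamus")  # follows the input row order

# by = 0 merges on row names; the result comes back sorted by barcode
merged <- merge(theta, pos, by = 0)
as.character(merged$Row.names)  # no longer the input order

# Appending labels positionally now pairs them with the wrong pixels:
# px1 (a "Cortex" pixel in the input) ends up labeled "Thalamus"
merged$Pixel.Groups <- groups
merged[, c("Row.names", "x", "Pixel.Groups")]
```

Setting sort = FALSE is not a dependable fix, since the merge() documentation describes the resulting row order in that case as unspecified.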
So to get around this, I ordered the pixels in all three initial files in the order they become after the merge:
df <- merge(data.frame(pos),
data.frame(deconProp), by=0)
head(df)
Row.names y x X1 X2 X3 X4
1 AAACAAGTATCTCCCA-1_1 -490.7 1264.5 0 0.00000000 0 0.0000000
2 AAACACCAATAACTGC-1_1 -1499.4 1454.8 0 0.55609802 0 0.1236635
3 AAACAGCTTTCAGAAG-1_1 -1620.9 1116.3 0 0.13497279 0 0.0000000
4 AAACAGGGTCTATATT-1_1 -1572.4 1200.9 0 0.38459884 0 0.0000000
5 AAACCGGGTAGGTACC-1_1 -1390.0 1095.3 0 0.00000000 0 0.0000000
6 AAACCGTTCGTCCAGG-1_1 -1219.9 1306.7 0 0.06237671 0 0.0000000
X5 X6 X7 X8 X9 X10 X11
1 0.07937179 0.0000000 0 0.0000000 0.00000000 0.06661723 0.00000000
2 0.07265039 0.0000000 0 0.0000000 0.08926711 0.00000000 0.08441205
3 0.00000000 0.5869932 0 0.1322945 0.00000000 0.00000000 0.00000000
4 0.00000000 0.2374325 0 0.0000000 0.00000000 0.00000000 0.09596800
5 0.00000000 0.5822289 0 0.1338413 0.00000000 0.00000000 0.00000000
6 0.00000000 0.3206065 0 0.4238735 0.00000000 0.00000000 0.00000000
X12 X13 X14 X15 X16 X17 X18 X19 X20
1 0.00000000 0 0 0.1002632 0.75374780 0 0 0 0.00000000
2 0.00000000 0 0 0.0000000 0.07390893 0 0 0 0.00000000
3 0.14573946 0 0 0.0000000 0.00000000 0 0 0 0.00000000
4 0.12289874 0 0 0.0000000 0.00000000 0 0 0 0.15910192
5 0.20064300 0 0 0.0000000 0.00000000 0 0 0 0.08328677
6 0.08965665 0 0 0.1034867 0.00000000 0 0 0 0.00000000
annot2 <- annot[df$Row.names,]
pos2 <- pos[df$Row.names,]
deconProp2 <- deconProp[df$Row.names,]
STdeconvolve::vizAllTopics(theta = deconProp2,
                           pos = pos2,
                           groups = annot2$region,
                           group_cols = group_cols,
                           r = 10,
                           lwd = 0.5)
In a future release I'll work on having checks to ensure that the order of pixel group labels still matches the pixels even after merging the theta matrix and pixel positions. So this was a really helpful exercise - thanks for bringing this to my attention!
Please reach out if you have any other questions, Brendan
FYI the lwd and r parameters can be changed to adjust the thickness of the lines and the size of the scatterpies, respectively, to improve visualization of the pixel cell-type proportions and their group labels. Additionally, I have some vignettes where you can plot each deconvolved cell-type separately and color the regions of interest as well to provide another way to visualize the information.
Thank you so much @bmill3r!! This is awesome, and a plot like that is exactly what I was hoping to get! I really appreciate you taking the time to help me dig into the annotations issue and get everything working. And thank you for the tips about plotting aesthetics, and for the different plotting options in your vignettes (I did see those and am looking forward to trying them out).
I've updated the vizAllTopics() and vizTopic() functions to ensure that the pixel rownames stay in the same order even after merging the theta and pos matrices. It works now assuming that the groups vector contains meta.data ids that correspond to the original order of the pixels in theta and pos. This will be available in the next version push (0.99.10).
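The actual change made in STdeconvolve is not shown in this thread. As an illustration only, one way an order-preserving merge could look is sketched below; merge_keep_order() is a hypothetical helper name, not a function from the package, and the data are made up.

```r
# Hypothetical helper (not part of STdeconvolve): merge on row names,
# then put the rows back in the order of the first input
merge_keep_order <- function(theta, pos) {
  merged <- merge(data.frame(theta), data.frame(pos), by = 0)
  idx <- match(rownames(theta), as.character(merged$Row.names))
  merged <- merged[idx[!is.na(idx)], , drop = FALSE]  # skip pixels missing from pos
  rownames(merged) <- NULL
  merged
}

theta <- data.frame(Topic.1 = c(0.1, 0.7, 0.4),
                    row.names = c("px3", "px1", "px2"))
pos <- data.frame(x = c(30, 10, 20), y = c(3, 1, 2),
                  row.names = c("px3", "px1", "px2"))

restored <- merge_keep_order(theta, pos)
as.character(restored$Row.names)  # same order as rownames(theta)
```

With the rows restored, a groups vector given in the original pixel order lines up positionally again. If some pixels were absent from pos, the groups vector would need the same filtering before being appended.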
Awesome, thanks so much @bmill3r! I'll look for version 0.99.10 and update my workflows when it's available.
Hi there, and thank you so much for developing such a great package! I have gone through the tutorial with my own data and everything is working well except for the groups parameter of the vizAllTopics() function. I have two different columns of meta.data that I would like to color code the scatterpies by: 1) cluster and 2) tissue region (like shown in this tutorial). However, when I attempt to color code my scatterpies by either of these meta.data values, the resulting plot is wrong and it looks like the annotations are being assigned to the completely wrong spots (this is especially obvious when looking at my tissue region annotations, which should be grouped together throughout the plot but are instead scattered like confetti). The resulting Topics that I get back look correct, and seem to be identifying known cell types in the correct spots. It's just the groups annotation that is not working. If this is a user issue, do you have any thoughts or advice on where I might need to tweak the code? I have followed everything according to the tutorials thus far.