JEFworks-Lab / STdeconvolve

Reference-free cell-type deconvolution of multi-cellular spatially resolved transcriptomics data
http://jef.works/STdeconvolve/

Suggestions to improve user experience #1

Open JEFworks opened 3 years ago

JEFworks commented 3 years ago

As I'm using STdeconvolve on new datasets, here are some enhancements that I believe will help improve the user experience. This is a running list. Please feel free to add and check off as needed.

JEFworks commented 2 years ago
bmill3r commented 2 years ago

Potential speed ups?

UPDATE: the rare cell-type computations, perplexity, etc. take a while if they are computed using a new corpus. But because we are interested in the same corpus we fit the LDA model to, we don't need to indicate a new corpus, which speeds things up a little bit.

JPingLin commented 2 years ago

Maybe a progress bar when running vizAllTopics? I was plotting ~3500 spots (per Visium square), and it takes more than 5 min for the image to show. At some point I was wondering whether my R session had frozen or whether this is the normal behavior.

bmill3r commented 2 years ago

Hi JPingLin,

Thanks for the suggestion! Yes - I have noticed that vizAllTopics can take a while especially if there are a lot of pixels and cell-types to plot. Most likely it is because of all the individual scatterpie charts ggplot2 has to make. A progress bar would be useful for this purpose. I will see if I can incorporate something. Perhaps a suggestion for now could be to plot sections of the entire square separately and make sure that colors for each of the deconvolved cell-types in the theta proportion matrix are explicitly stated in the topicCols parameter just in case a given cell-type is not present in the section of pixels being plotted.

Let me know if you have any other questions or suggestions. Brendan
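The workaround above (plotting sections separately with fixed colors) could be sketched roughly as follows. This is a minimal base-R illustration on toy data, not part of STdeconvolve: the `theta` and `pos` objects are made-up stand-ins, the left/right split is arbitrary, and the commented-out `vizAllTopics()` call assumes the argument names `theta`, `pos`, and `topicCols`. The progress bar uses `txtProgressBar()` from the `utils` package.

```r
# Toy stand-ins (assumed shapes): theta is pixels x cell-types, pos is pixels x (x, y)
set.seed(1)
n_pixels <- 12
pos <- data.frame(x = rep(1:4, times = 3), y = rep(1:3, each = 4))
rownames(pos) <- paste0("pixel", seq_len(n_pixels))
theta <- matrix(runif(n_pixels * 3), nrow = n_pixels,
                dimnames = list(rownames(pos), c("1", "2", "3")))
theta <- theta / rowSums(theta)  # each pixel's proportions sum to 1

# Fix one color per cell-type up front, in the column order of theta,
# so every section is drawn with the same colors
topicCols <- c("red", "blue", "green")

# Split the pixels into two sections (left and right half) by x-coordinate
section <- ifelse(pos$x <= median(pos$x), "left", "right")
sections <- split(rownames(pos), section)

# Loop over sections with a text progress bar
pb <- txtProgressBar(min = 0, max = length(sections), style = 3)
plots <- vector("list", length(sections))
names(plots) <- names(sections)
for (i in seq_along(sections)) {
  ids <- sections[[i]]
  theta_sub <- theta[ids, , drop = FALSE]
  pos_sub <- pos[ids, , drop = FALSE]
  # With STdeconvolve loaded, the plotting call would go here (assumed arguments):
  # plots[[i]] <- vizAllTopics(theta = theta_sub, pos = pos_sub, topicCols = topicCols)
  plots[[i]] <- list(theta = theta_sub, pos = pos_sub)  # placeholder for the plot
  setTxtProgressBar(pb, i)
}
close(pb)
```

Because each section keeps all columns of `theta`, the color vector stays aligned with the cell-types even when one of them is absent from a section.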

JPingLin commented 2 years ago

Hi Brendan, thanks for the great tool; the installation was smooth and error-free right off the bat! I have one question and one suggestion. For the step "remove genes present in 5% or less of pixels", will this remove genes that are highly specific to a population known to make up less than 5% of brain cells? For example, some genes are unique to ependyma/vasculature and are detected in fewer than 3500 * 0.05 = 175 spots of the sampled brain. Will they be excluded completely? Or maybe my understanding of this step is not correct.

One suggestion: I think it would be useful to have a function to easily flip the coordinates in plots. I know this might be related to whether (0, 0) sits at the upper-left or lower-left corner of the axes in different programs, and to how the pixel/spot data was initially prepared by the specific platform. Right now the plots are always upside down for me when using Visium output.

bmill3r commented 2 years ago

Hi JPingLin,

Your understanding is correct - by default, genes present in less than 5% of the total pixels in a given dataset will be removed and not included in the final corpus used as input into STdeconvolve. The motivation behind this filtering step is to remove genes that were poorly captured across pixels in the ST experiment, and may not be accurately assigned to clusters of tightly occurring and non-overlapping expressed genes. Depending on the dataset, however, 5% can actually represent a large number of pixels, and so perhaps a lower threshold can also be appropriate, especially if the goal is to identify and include overdispersed genes that may be marking rare cell-types.
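To make the threshold arithmetic concrete, here is a small base-R sketch on a made-up genes x pixels matrix. It only illustrates the idea of filtering by detection fraction; the gene names are hypothetical, and treating the cutoff as "keep if fraction >= removeBelow" is an assumption about the boundary case, not a statement of how STdeconvolve implements it.

```r
# Toy genes x pixels count matrix with 40 pixels (all values are made up)
counts <- matrix(0, nrow = 3, ncol = 40,
                 dimnames = list(c("geneCommon", "geneRare", "geneUbiquitous"),
                                 paste0("pixel", 1:40)))
counts["geneCommon", 1:20] <- 5   # detected in 50% of pixels
counts["geneRare", 1] <- 3        # detected in 1 of 40 pixels (2.5%)
counts["geneUbiquitous", ] <- 2   # detected in every pixel

# Fraction of pixels in which each gene is detected
detected_frac <- rowSums(counts > 0) / ncol(counts)

# Genes kept at the default 5% cutoff versus a lower 1% cutoff
keep_default <- detected_frac >= 0.05
keep_lower <- detected_frac >= 0.01

# The same cutoffs expressed as pixel counts for a 3500-spot dataset
n_spots <- 3500
min_pixels_default <- n_spots * 0.05  # 175 spots
min_pixels_lower <- n_spots * 0.01    # 35 spots

print(detected_frac)
print(keep_default)
print(keep_lower)
```

In this toy case the rare gene is dropped at 5% but retained at 1%, which is the trade-off to weigh when rare cell-types are of interest.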

Using restrictCorpus() to filter the counts matrix into the final input corpus, the thresholds for the number of pixels can be selected by changing the parameters removeAbove and removeBelow. For example:

inputCorpus <- restrictCorpus(counts,
                              removeAbove = 1.0,
                              removeBelow = 0.05
                             )

where removeBelow in this case removes genes present in less than 5% of pixels.

Alternatively, you can also use preprocess() to filter the starting counts matrix into the input corpus:

preprocess(dat,
          selected.genes = NA,
          nTopGenes = NA,
          genes.to.remove = NA,
          removeAbove = NA,
          removeBelow = NA,
          min.reads = 1,
          min.lib.size = 1,
          min.detected = 1,
          ODgenes = TRUE,
          nTopOD = 1000,
          verbose = TRUE
          )

If there is a list of cell marker genes you would like to include in addition to the overdispersed genes, you could first feature select for the overdispersed genes using restrictCorpus(), and then apply preprocess() to the original counts matrix, passing the overdispersed genes found via restrictCorpus() plus the additional marker genes as selected.genes. For example:

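# Step 1: feature select for the overdispersed genes
inputCorpus <- restrictCorpus(counts,
                              removeAbove = 1.0,
                              removeBelow = 0.05
                              )

# Step 2: filter the original data again, keeping the overdispersed genes
# plus the marker genes; union() avoids duplicated gene names
inputCorpus <- preprocess(dat,
                          selected.genes = union(rownames(inputCorpus), markerGenes),
                          nTopGenes = NA,
                          genes.to.remove = NA,
                          removeAbove = NA,
                          removeBelow = NA,
                          min.reads = 1,
                          min.lib.size = 1,
                          min.detected = 1,
                          ODgenes = FALSE,
                          nTopOD = 1000,
                          verbose = TRUE)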
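Before passing a combined gene list as selected.genes, it may help to check that every marker gene is actually present in the data. The following is a small base-R sketch of that check; all gene names and the `odGenes`, `markerGenes`, and `allGenes` vectors are hypothetical placeholders (in practice `odGenes` would come from `rownames()` of the restrictCorpus() output and `allGenes` from the gene names of the original counts matrix).

```r
# Hypothetical gene lists for illustration only
odGenes <- c("Gene1", "Gene2", "Gene3")          # overdispersed genes
markerGenes <- c("Gene3", "Foxj1", "NotInData")  # user-supplied markers
allGenes <- c("Gene1", "Gene2", "Gene3", "Gene4", "Foxj1")  # genes in the dataset

# Markers that are absent from the dataset and cannot be selected
missingMarkers <- setdiff(markerGenes, allGenes)
if (length(missingMarkers) > 0) {
  message("Markers not found in the data: ",
          paste(missingMarkers, collapse = ", "))
}

# Combine both lists without duplicates, keeping only genes present in the data
selectedGenes <- intersect(union(odGenes, markerGenes), allGenes)
print(selectedGenes)
```

The resulting `selectedGenes` vector could then be used in place of the combined list in the preprocess() call.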

This is a lot of information, so let me know if any of this doesn't make sense or you have additional questions.

bmill3r commented 2 years ago

One suggestion: I think it would be useful to have a function to easily flip the coordinates in plots. I know this might be related to whether (0, 0) sits at the upper-left or lower-left corner of the axes in different programs, and to how the pixel/spot data was initially prepared by the specific platform. Right now the plots are always upside down for me when using Visium output.

This is definitely a good idea and it is most likely an issue with the relative placement of (0,0) with respect to the original image and the plotting coordinate system used in R. Will see if I can come up with a simple function to transform. In the meantime, one could do something like this:

pos[, "y"] <- pos[, "y"] * -1

to essentially flip the plotted pixels upside down. Similarly, you could do the same thing with the x-coordinates of the pixels to mirror the plot left to right.
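As a possible starting point for such a helper, here is a minimal base-R sketch. The name `flipCoords` is hypothetical and not part of STdeconvolve. Unlike multiplying by -1, this variant mirrors the coordinates within their original range, so the values stay positive; it assumes `pos` is a matrix or data frame with columns named "x" and "y".

```r
# Hypothetical helper: mirror pixel coordinates along one axis,
# keeping them inside their original min/max range
flipCoords <- function(pos, axis = c("y", "x")) {
  axis <- match.arg(axis)
  rng <- range(pos[, axis])
  pos[, axis] <- rng[1] + rng[2] - pos[, axis]
  pos
}

# Toy positions (made up)
pos <- cbind(x = c(10, 20, 30), y = c(1, 2, 5))

flippedY <- flipCoords(pos, axis = "y")  # flip upside down
flippedX <- flipCoords(pos, axis = "x")  # mirror left to right
print(flippedY)
print(flippedX)
```

Applying the function twice along the same axis returns the original coordinates, which makes it easy to undo a flip.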