BioAmelie closed this issue 4 years ago.
Hi @BioAmelie,
great to hear that you are exploring stereoscope! Judging from the command you posted, it seems that you are running with the default number of epochs, which is set to 20000 in both steps. I would recommend increasing this to at least 50000 if you are working with Visium data, and maybe even more. My advice would be to check whether your system has converged (see Section "3.2 Monitoring progress" in the README) or whether it needs to run for longer. If you don't want to restart the whole process, you can use the `-scm` option (see `stereoscope run -h` for more info) to continue fitting an already existing model.
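As a rough complement to the progress monitoring described in the README, one simple heuristic for "has it converged?" is to compare the average loss over the most recent block of epochs with the block before it. This is a minimal sketch assuming you have already collected the per-epoch loss values into a Python list yourself; `has_converged`, `window` and `rel_tol` are hypothetical names and values, not stereoscope options or defaults.

```python
def has_converged(losses, window=1000, rel_tol=1e-3):
    """Heuristic convergence check on a loss trace (hypothetical helper).

    Compares the mean loss of the most recent `window` epochs with the
    mean of the `window` epochs before that. If the relative change is
    below `rel_tol`, the loss is treated as having plateaued.
    """
    if len(losses) < 2 * window:
        # Not enough history to compare two full windows.
        return False
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    rel_change = abs(previous - recent) / max(abs(previous), 1e-12)
    return rel_change < rel_tol


flat = [500.0] * 3000                            # loss has levelled off
still_dropping = [10000.0 - i for i in range(3000)]  # loss still falling
print(has_converged(flat), has_converged(still_dropping))  # True False
```

If the check still returns `False` after the default number of epochs, that is a hint to continue fitting the existing model rather than to trust the current output.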
The `logits.tag.tsv` and `R.tag.tsv` files are not really useful for assessing the state of your system or whether your results are correct. They hold the rates (R) and log odds (logits, a different way of describing the success probabilities) of the negative binomial distribution, which is the underlying statistical model used during inference. We use exactly the same parametrization as the PyTorch implementation of the negative binomial. For more information on the rates and success probabilities and what they represent in the model, I would refer you to the bioRxiv pre-print, where this is thoroughly described in the Methods section.
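To make the parametrization concrete: in PyTorch's `NegativeBinomial(total_count, logits=...)`, the success probability is p = sigmoid(logits), the mean is total_count · exp(logits), and the variance is mean / (1 − p). The sketch below reproduces these formulas with the standard library only, treating a value from the R file as `total_count` and a value from the logits file as `logits`. Which entries of the two files belong together (per gene, per cell type) is an assumption here; check it against the Methods section of the pre-print.

```python
import math


def nb_summary(r, logit):
    """Mean and variance of a negative binomial in PyTorch's parametrization.

    r     -- rate / total_count (assumed: a value from the R file)
    logit -- log odds of the success probability (assumed: from the logits file)
    """
    p = 1.0 / (1.0 + math.exp(-logit))  # success probability, sigmoid(logit)
    mean = r * math.exp(logit)          # equals r * p / (1 - p)
    var = mean / (1.0 - p)              # always >= mean (overdispersion)
    return p, mean, var


def nb_log_pmf(k, r, logit):
    """log P(X = k), the same formula as torch.distributions.NegativeBinomial.log_prob."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(1.0 - p) + k * math.log(p))


p, mean, var = nb_summary(2.0, 0.0)
print(p, mean, var)  # 0.5 2.0 4.0
```

A logit of 0 means p = 0.5, so the expected count equals the rate itself; positive logits scale the mean up and negative logits scale it down.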
The "top n most highly expressed genes" are taken as those with the highest total sum across all cells in the single-cell data. You can definitely "spike" your analysis by specifying a custom set of genes to be analyzed; this has shown good results in other studies, e.g. this one. However, "only" using the marker genes is something I haven't tried, and I would be slightly reluctant to do so unless the list is fairly large. To specify a custom gene list, create a `txt` file listing all the genes you want to include in the analysis, one per row, then add `-gl GENELIST.txt` to your command and `stereoscope` will use these genes.
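The selection rule described above (rank genes by their total count summed over all cells and keep the n largest) and the one-gene-per-row file expected by `-gl` can be sketched in plain Python. This is a hypothetical illustration on a toy count table, not stereoscope's own code; `top_n_genes` and `write_gene_list` are made-up names.

```python
def top_n_genes(counts, genes, n):
    """Return the n genes with the highest total count across all cells.

    counts -- list of rows, one per cell, each a list of per-gene counts
    genes  -- gene names, in the same order as the columns of `counts`
    """
    totals = [sum(row[j] for row in counts) for j in range(len(genes))]
    # Sort gene indices by total count, largest first.
    order = sorted(range(len(genes)), key=lambda j: totals[j], reverse=True)
    return [genes[j] for j in order[:n]]


def write_gene_list(genes, path):
    """Write one gene name per line, the format described for -gl."""
    with open(path, "w") as fh:
        for g in genes:
            fh.write(g + "\n")


genes = ["GeneA", "GeneB", "GeneC", "GeneD"]
counts = [          # toy data: rows = cells, columns = genes
    [50, 1, 0, 10],
    [40, 0, 3, 12],
    [0, 5, 30, 9],
]
print(top_n_genes(counts, genes, 2))  # ['GeneA', 'GeneC']
```

Calling `write_gene_list(top_n_genes(counts, genes, 2), "GENELIST.txt")` would then produce a file suitable for passing as `-gl GENELIST.txt`.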
Good luck with the continued analysis! Alma
Hi @almaan,
Sorry for my late reply. I will follow your suggestion.
minfang
Hello @BioAmelie,
I hope things work out for you. If you feel that your questions have been answered, I would ask you to close this issue. Of course, if you want to continue the discussion, you may leave it open.
Best Alma
Hi @almaan,
My ST data is from mouse lung. Can you tell me what I should keep in mind when selecting a custom set of genes to be analyzed? I want to combine cell type marker genes and highly expressed genes, excluding ribosomal and mitochondrial genes. Do you think this is feasible?
Sounds like a great start. As alluded to above, we constructed a similar custom list when analyzing breast cancer data, with some really promising results. Also make sure the system converges, otherwise the mapping will not be optimal!
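A custom list along the lines discussed here (marker genes plus highly expressed genes, with ribosomal and mitochondrial genes left out) could be assembled as in the sketch below. The prefixes are an assumption based on the usual mouse gene naming (mitochondrially encoded genes start with `mt-`, ribosomal protein genes with `Rps`/`Rpl`); the match is crude, e.g. `Rps6ka1` is a kinase, not a ribosomal protein, so check the excluded genes against your own matrix. `build_gene_list` and the example gene names are hypothetical, not part of stereoscope.

```python
# Assumed mouse naming conventions; adjust to the identifiers in your data.
EXCLUDED_PREFIXES = ("mt-", "Rps", "Rpl")


def build_gene_list(markers, ranked_genes, n_extra):
    """Combine marker genes with the top `n_extra` non-ribosomal,
    non-mitochondrial genes from `ranked_genes` (ordered by expression,
    highest first). Duplicates are dropped and order is preserved.
    """
    selected, seen = [], set()

    def add(g):
        # Keep a gene only once, and never if it matches an excluded prefix.
        if g not in seen and not g.startswith(EXCLUDED_PREFIXES):
            selected.append(g)
            seen.add(g)
            return True
        return False

    for g in markers:
        add(g)
    added = 0
    for g in ranked_genes:
        if added == n_extra:
            break
        if add(g):
            added += 1
    return selected


markers = ["Sftpc", "Ager", "Cd74"]
ranked = ["mt-Co1", "Rpl13", "Sftpc", "Actb", "Rps27", "Malat1", "Tmsb4x"]
print(build_gene_list(markers, ranked, 2))
# ['Sftpc', 'Ager', 'Cd74', 'Actb', 'Malat1']
```

The result can be written one gene per line and passed to stereoscope with `-gl`.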
Best of luck Alma
Hi @almaan,
I have successfully run stereoscope with this command:
`stereoscope run --sc_cnt ../All_ko3_celltype_cnt.tsv --sc_labels ../All_ko3_celltype_meta.tsv -o 3170_v2 --st_cnt ../st_cnt.tsv --gpu -mc 10 -stb 2048 -scb 2048 -n 5000`
but the output `W.2020-07-17072307.309946.tsv` looks odd; for example, some cell types have higher proportions than they should. I therefore want to understand these output files to evaluate whether the output is right. Can you tell me the meaning of `logits.2020-07-17072307.309946.tsv` and `R.2020-07-17072307.309946.tsv`, and which parameters you recommend tuning to get a more reliable result? What's more, I intend to filter out ribosomal and mitochondrial genes and only input marker genes for the single-cell cell types; do you think that is feasible? I am also confused about how you define the "top n most highly expressed genes": does it mean the genes with the highest mean value across cells?
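On that last point: the top n genes are ranked by their total sum across all cells (as stated earlier in the thread), and each gene's mean over all cells is that sum divided by the same number of cells, so ranking by mean gives exactly the same order. A quick check on made-up toy counts; `rank_by` is a hypothetical helper.

```python
def rank_by(values):
    """Indices of `values` sorted from largest to smallest."""
    return sorted(range(len(values)), key=lambda j: values[j], reverse=True)


counts = [          # toy data: rows = cells, columns = genes
    [5, 0, 2],
    [3, 8, 2],
    [4, 1, 2],
    [0, 0, 9],
]
n_cells = len(counts)
sums = [sum(col) for col in zip(*counts)]  # total per gene
means = [s / n_cells for s in sums]        # mean per gene over all cells
print(sums)                                # [12, 9, 15]
print(rank_by(sums) == rank_by(means))     # True
```

The two rankings would only differ if genes were averaged over different numbers of cells, for example if zeros were excluded from the mean.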