Closed KevinMarinus closed 7 months ago
Hey!
You can compare your input gene list to the specificity matrix for the reference dataset to see the specificity value of each gene in each cell type. This should give you an idea of which genes drive a result. Note that EWCE can also control for gene length and GC content (via the geneSizeControl argument of EWCE::bootstrap_enrichment_test()), and you won't get insight into that from the specificity matrix alone.
Here is how to get the specificity matrix:
ctd <- ewceData::ctd()
ctd[[1]]$specificity
You could then sort and see where your genes fall.
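A minimal sketch of that sort-and-look-up step, using a small made-up specificity matrix in place of ctd[[1]]$specificity so that it runs without downloading the reference data. The gene names, cell type names and values are invented for illustration; with real data you would substitute the matrix from the lines above and your own gene list.

```r
## Toy stand-in for ctd[[1]]$specificity (genes x cell types); values invented
spec <- matrix(
    c(0.70, 0.20, 0.10,
      0.05, 0.90, 0.05,
      0.30, 0.30, 0.40,
      0.10, 0.10, 0.80),
    nrow = 4, byrow = TRUE,
    dimnames = list(
        c("GeneA", "GeneB", "GeneC", "GeneD"),
        c("Astrocytes", "Microglia", "Neurons")
    )
)

## Hypothetical input gene list; GeneX is absent from the reference on purpose
my_genes <- c("GeneB", "GeneD", "GeneX")

## Keep only the genes that are present in the matrix
found <- intersect(my_genes, rownames(spec))

## Specificity of the input genes in every cell type
spec_hits <- spec[found, , drop = FALSE]

## Rank all reference genes within one cell type, most specific first,
## and flag where the input genes fall in that ranking
cell_type <- "Microglia"
ranked <- sort(spec[, cell_type], decreasing = TRUE)
ranking <- data.frame(
    gene = names(ranked),
    specificity = unname(ranked),
    rank = seq_along(ranked),
    in_list = names(ranked) %in% my_genes
)
print(ranking[ranking$in_list, ])
```

Genes from the list that sit near the top of the ranking for a cell type are the likeliest contributors to an enrichment in that cell type, subject to the gene length and GC content caveat above.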
Thanks, Alan.
@KevinMarinus I think your question should be answered here:
Thanks for your quick and helpful replies. Indeed, the qqplot with gene annotation from "generate_bootstrap_plots" contains the information that I'm searching for. The problem is that the annotations are too big and overlap each other making them unfortunately unreadable (see attachment). I tried to extract the annotations using "sjlabelled" and their "get_labels" function but that didn't work either. Is there a way to adjust the labels in the figure to make them readable and/or to extract the labels (with the criteria for which gene is labelled)?
We don't have that functionality coded yet, but you can get at the ggplot object for any of the plots. See below:
## Load the single cell data
sct_data <- ewceData::ctd()
## Set the parameters for the analysis
## Use 5 bootstrap lists for speed, for publishable analysis use >10000
reps <- 5
## Load the gene list and get human orthologs
hits <- ewceData::example_genelist()[1:100]
## Bootstrap significance test,
## no control for transcript length or GC content
## Use pre-computed results to speed up example
full_results <- EWCE::example_bootstrap_results()
output <- EWCE::generate_bootstrap_plots(
    sct_data = sct_data,
    hits = hits,
    reps = reps,
    full_results = full_results,
    listFileName = "Example",
    sctSpecies = "mouse",
    genelistSpecies = "human",
    annotLevel = 1,
    save_dir = tempdir()
)
#get ggplot object for plot 2
ggplot_obj <- output$plots$plot2
Then you can get the data for the plot (for example) to see the values for all annotated genes:
output$plots$plot2$data
You can then remove labels for all but the top genes and then replot if you wanted.
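The label-thinning step could look roughly like the sketch below. It uses a made-up data frame in place of output$plots$plot2$data, and the column names (gene, value, label) are assumptions for illustration, so check names(output$plots$plot2$data) for the real ones first.

```r
## Toy stand-in for output$plots$plot2$data; columns and values are invented
plot_data <- data.frame(
    gene  = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
    value = c(0.12, 0.85, 0.40, 0.91, 0.05),
    stringsAsFactors = FALSE
)
plot_data$label <- plot_data$gene

## Number of labels to keep (hypothetical choice)
top_n <- 2

## Row indices of the top_n largest values
keep <- order(plot_data$value, decreasing = TRUE)[seq_len(top_n)]

## Blank every label outside the top_n so it is not drawn
plot_data$label[-keep] <- ""

print(plot_data)

## With ggplot2 loaded, a plot's data can be swapped with the %+% operator,
## e.g. ggplot_obj %+% plot_data, provided the columns match what the
## plot's layers expect (not verified here for EWCE's plots).
```

Blanking the label rather than dropping the row keeps every point on the plot while leaving only the top genes annotated.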
See the function code here which references the functions for each plot: https://github.com/NathanSkene/EWCE/blob/master/R/generate_bootstrap_plots_for_transcriptome.r
Thanks again! This helped me a lot already.
I have two other issues with the bootstrap plots:
1) I'm using human data and a human CTD but, although I specify sctSpecies_="human" in the bootstrap plots call, it gives a warning that the species is automatically set to mouse: "Warning: sctSpecies not provided. Setting to 'mouse' by default. Warning: sctSpecies_origin not provided. Setting to 'mouse' by default.". Changing sctSpecies to sctSpecies_origin doesn't solve the issue. Everything else seems to work, so perhaps it's a false warning(?).
2) Regardless of this warning, everything works fine for level 1 but not for level 2. For level 2 it gives, among other errors, the following: "ERROR: No cell types in full_results are found in sct_data. Perhaps the wrong annotLevel was used?". This error only occurs when one CTD contains both level 1 and level 2: separating the CTD (i.e., making one CTD for level 1 and another for level 2, where in the latter level 2 is indicated as level 1) solves the issue. However, this is quite a cumbersome approach for bigger scripts with multiple input data files.
Is there a way to run the bootstrap plots on level 2 when one human CTD contains both levels?
My colleague had exactly the same issue with human data, so I assume it pops up with any human dataset that combines levels 1 and 2 in one CTD. I've attached the console input and output to illustrate the issue, numbered as above (issues 1 and 2).
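For reference, the workaround described above (a level-2-only CTD that is then passed with annotLevel = 1) can be sketched as follows. The ctd object here is a made-up two-level list standing in for a real CTD, and the cell type names are invented; this only illustrates the reshaping and is not a fix for the underlying error.

```r
## Toy stand-in for a CTD with two annotation levels; contents are invented
make_level <- function(cell_types) {
    m <- matrix(
        1 / length(cell_types),
        nrow = 2, ncol = length(cell_types),
        dimnames = list(c("GeneA", "GeneB"), cell_types)
    )
    list(specificity = m)
}
ctd <- list(
    make_level(c("Neurons", "Glia")),
    make_level(c("Excitatory", "Inhibitory", "Astrocytes", "Microglia"))
)

## Wrap level 2 in its own one-level list so it becomes level 1
ctd_level2_only <- list(ctd[[2]])

## The cell types now reachable at annotLevel = 1 are the original level 2 ones
print(colnames(ctd_level2_only[[1]]$specificity))
```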
@KevinMarinus I'd recommend submitting this as a separate Issue with a reprex (and filling out the full bug report template) as it's outside the scope of this Issue.
Hi!
I'm using the EWCE tool and I nicely get all the plots for cellular enrichments. However, I'm also interested in which genes drive my significant results (and which genes drive the fold change for non-significant results). Is there an easy way to extract the genes that drive the enrichments?
Thanks! Kevin