Closed ashrma314 closed 8 months ago
Hi,
I'm not sure where you got this function name from; do you have a link? I think perhaps you mean the function generate_celltype_data? Have a look at the function documentation to see how to use it, and also at the vignette: https://nathanskene.github.io/EWCE/articles/extended.html#generate-celltypedataset
Thanks, Alan.
I think this is probably caused by use of the old version of EWCE at NathanSkene/EWCE instead of the Neurogenomics/EWCE version. I assume you're using the manual from NathanSkene/EWCE?
Ah yes, I see it now. @ashrma314, just to note, the EWCE version at Neurogenomics/EWCE is deprecated and shouldn't be used. Instead, please take the version from this GitHub repo or just install with Bioconductor.
Thanks so much for your quick response @Al-Murphy and @NathanSkene! I was indeed using the older version of EWCE. I downloaded the new version from this repo, and it seems to run seamlessly with the example data! I do run into an issue with my original list of DEGs. This might be a silly mistake or oversight on my part (I'm new to coding). I would greatly appreciate it if you could share your thoughts. I get the following error:
At least 4 genes which are present in the single cell dataset & background gene set are required to test for enrichment. Only 0 provided.
I read a .csv file containing MGI symbols for mouse genes into the environment, then edited the ortholog part of the code to output mouse IDs. Below is the complete readout from the console.
> library(EWCE)
> library(ggplot2)
> set.seed(1234)
> # Load merged cortex and hypothalamus dataset generated by Karolinska institute
> ctd <- ewceData::ctd() # i.e. ctd_MergedKI
see ?ewceData and browseVignettes('ewceData') for documentation
loading from cache
> try({
+   plt <- EWCE::plot_ctd(ctd = ctd,
+                         level = 1,
+                         genes = c("Apoe","Gfap","Gapdh"),
+                         metric = "mean_exp")
+ })
> DEG_APP_MGIs <- read.table("~/Downloads/EWCE-master-2/DEG_APP_MGIs.csv", quote="\"", comment.char="")
> View(DEG_APP_MGIs)
> hits <- DEG_APP_MGIs
> print(hits)
> exp <- ctd[[1]]$mean_exp
> # Old conversion method
> # if running offline pass localhub = TRUE
> m2h <- ewceData::mouse_to_human_homologs()
see ?ewceData and browseVignettes('ewceData') for documentation
loading from cache
> exp_old <- exp[rownames(exp) %in% m2h$MGI.symbol,]
> # New conversion method (used by EWCE internally)
> exp_new <- orthogene::convert_orthologs(gene_df = exp,
+                                         input_species = "mouse",
+                                         output_species = "mouse",
+                                         method = "homologene")
Preparing gene_df.
Dense matrix format detected.
Extracting genes from rownames.
15,259 genes extracted.
Converting mouse ==> mouse orthologs using: homologene
Retrieving all organisms available in homologene.
Mapping species name: mouse
Common name mapping found for mouse
1 organism identified from search: 10090
Retrieving all organisms available in homologene.
Mapping species name: mouse
Common name mapping found for mouse
1 organism identified from search: 10090
input_species already formatted as output species. Returning input data directly.
> # Report
> message("The new method retains ",
+         formatC(nrow(exp_new) - nrow(exp_old), big.mark = ","),
+         " more genes than the old method.")
The new method retains 2,934 more genes than the old method.
> # Use 100 bootstrap lists for speed, for publishable analysis use >=10000
> reps <- 100
> # Use level 1 annotations (i.e. Interneurons)
> annotLevel <- 1
> # Bootstrap significance test, no control for transcript length and GC content
> full_results <- EWCE::bootstrap_enrichment_test(sct_data = ctd,
+                                                 sctSpecies = "mouse",
+                                                 genelistSpecies = "mouse",
+                                                 hits = hits,
+                                                 reps = reps,
+                                                 annotLevel = annotLevel)
1 core(s) assigned as workers (7 reserved).
Generating gene background for mouse x mouse ==> human
Gathering ortholog reports.
Retrieving all genes using: homologene.
Retrieving all organisms available in homologene.
Mapping species name: human
Common name mapping found for human
1 organism identified from search: 9606
Gene table with 19,129 rows retrieved.
Returning all 19,129 genes from human.
--
Preparing gene_df.
data.frame format detected.
Extracting genes from Gene.Symbol.
21,207 genes extracted.
Converting mouse ==> human orthologs using: homologene
Retrieving all organisms available in homologene.
Mapping species name: mouse
Common name mapping found for mouse
1 organism identified from search: 10090
Retrieving all organisms available in homologene.
Mapping species name: human
Common name mapping found for human
1 organism identified from search: 9606
Checking for genes without orthologs in human.
Extracting genes from input_gene.
17,355 genes extracted.
Extracting genes from ortholog_gene.
17,355 genes extracted.
Checking for genes without 1:1 orthologs.
Dropping 131 genes that have multiple input_gene per ortholog_gene (many:1).
Dropping 498 genes that have multiple ortholog_gene per input_gene (1:many).
Filtering gene_df with gene_map
Adding input_gene col to gene_df.
Adding ortholog_gene col to gene_df.
=========== REPORT SUMMARY ===========
=========== REPORT SUMMARY ===========
16,482 / 21,207 (77.72%) target_species genes remain after ortholog conversion.
16,482 / 19,129 (86.16%) reference_species genes remain after ortholog conversion.
16,482 intersect background genes used.
Standardising CellTypeDataset
Checking gene list inputs.
Converting gene list input to standardised human genes.
Error in check_ewce_genelist_inputs(sct_data = sct_data, hits = hits, :
  At least 4 genes which are present in the single cell dataset & background gene set are required to test for enrichment. Only 0 provided.
Hi, I think I might have confused things here.
It's neurogenomics/EWCE that has been deprecated, while nathanskene/EWCE is the up-to-date one.
It looks like you're following the deprecated version of the tutorial. The up-to-date tutorials are here: https://nathanskene.github.io/EWCE/articles/EWCE https://nathanskene.github.io/EWCE/articles/extended.html
@Al-Murphy can you think of any reason we shouldn't pull the latest version of EWCE onto neurogenomics/EWCE to minimise confusion for future users? There are probably numerous people doing this for every one who writes an issue.
@NathanSkene yeah, the version in neurogenomics is still being used in scflow, as the code hasn't been restructured to use the new Bioconductor version (at NathanSkene/EWCE). Until scflow is sorted we can't really do anything.
I do have a message on the neurogenomics EWCE that prints when you install or load it, saying it is deprecated and pointing to the other version.
Thanks @NathanSkene and @Al-Murphy for your quick response again. Just to make sure we're all on the same page, I am using the up-to-date tutorial that you linked above. It installs EWCE v1.10, which is confusing as the devel version is 1.11.0; perhaps that's why I can't run it? Nonetheless, the issue remains and I get the same error:
Error in check_ewce_genelist_inputs(sct_data = sct_data, hits = hits, : At least 4 genes which are present in the single cell dataset & background gene set are required to test for enrichment. Only 0 provided.
I get the same error even if I read a csv file with all the genes from example_genelist(). Perhaps it's a formatting issue? What is the correct format for reading in an original list of DEGs? I read a csv file with MGI gene names in the rows and no title for the column.
Can you share with us the commands that you have run so far in the session, and what the output was?
@NathanSkene
The following are my input commands: (Also attached as a .txt file) EWCE_AS.txt
if (!require("BiocManager")){install.packages("BiocManager")}
BiocManager::install("EWCE")
library(EWCE)
set.seed(1234)
pkg <- tolower("EWCE")
docker_user <- "neurogenomicslab"
ctd <- ewceData::ctd()
plt_exp <- EWCE::plot_ctd(ctd = ctd, level = 1, genes = c("Apoe","Gfap","Gapdh"), metric = "mean_exp")
plt_spec <- EWCE::plot_ctd(ctd = ctd, level = 2, genes = c("Apoe","Gfap","Gapdh"), metric = "specificity")
hits <- DEG_APP_MGIs
head(hits, 5)
reps <- 100
annotLevel <- 1
full_results <- EWCE::bootstrap_enrichment_test(sct_data = ctd, sctSpecies = "mouse", genelistSpecies = "mouse", hits = hits, reps = reps, annotLevel = annotLevel)
knitr::kable(full_results$results)
plot_list <- EWCE::ewce_plot(total_res = full_results$results, mtc_method = "BH", ctd = ctd)
print(plot_list$withDendro)
Following is the output in my console:
> library(EWCE)
> set.seed(1234)
> # Package name
> pkg <- tolower("EWCE")
> # Username of DockerHub account
> docker_user <- "neurogenomicslab"
> ctd <- ewceData::ctd()
see ?ewceData and browseVignettes('ewceData') for documentation
loading from cache
> plt_exp <- EWCE::plot_ctd(ctd = ctd,
+                           level = 1,
+                           genes = c("Apoe","Gfap","Gapdh"),
+                           metric = "mean_exp")
> plt_spec <- EWCE::plot_ctd(ctd = ctd,
+                            level = 2,
+                            genes = c("Apoe","Gfap","Gapdh"),
+                            metric = "specificity")
> hits <- DEG_APP_MGIs
> head(hits, 5)
  example_genelist
1 Oas1c
2 Hmgcr
3 Hmgcs2
4 Hacd3
5 Haus8
> reps <- 100
> annotLevel <- 1
> full_results <- EWCE::bootstrap_enrichment_test(sct_data = ctd,
+                                                 sctSpecies = "mouse",
+                                                 genelistSpecies = "mouse",
+                                                 hits = hits,
+                                                 reps = reps,
+                                                 annotLevel = annotLevel)
1 core(s) assigned as workers (7 reserved).
Generating gene background for mouse x mouse ==> human
Gathering ortholog reports.
Retrieving all genes using: homologene.
Retrieving all organisms available in homologene.
Mapping species name: human
Common name mapping found for human
1 organism identified from search: 9606
Gene table with 19,129 rows retrieved.
Returning all 19,129 genes from human.
--
Preparing gene_df.
data.frame format detected.
Extracting genes from Gene.Symbol.
21,207 genes extracted.
Converting mouse ==> human orthologs using: homologene
Retrieving all organisms available in homologene.
Mapping species name: mouse
Common name mapping found for mouse
1 organism identified from search: 10090
Retrieving all organisms available in homologene.
Mapping species name: human
Common name mapping found for human
1 organism identified from search: 9606
Checking for genes without orthologs in human.
Extracting genes from input_gene.
17,355 genes extracted.
Extracting genes from ortholog_gene.
17,355 genes extracted.
Checking for genes without 1:1 orthologs.
Dropping 131 genes that have multiple input_gene per ortholog_gene (many:1).
Dropping 498 genes that have multiple ortholog_gene per input_gene (1:many).
Filtering gene_df with gene_map
Adding input_gene col to gene_df.
Adding ortholog_gene col to gene_df.
=========== REPORT SUMMARY ===========
=========== REPORT SUMMARY ===========
16,482 / 21,207 (77.72%) target_species genes remain after ortholog conversion.
16,482 / 19,129 (86.16%) reference_species genes remain after ortholog conversion.
16,482 intersect background genes used.
Standardising CellTypeDataset
Checking gene list inputs.
Converting gene list input to standardised human genes.
Error in check_ewce_genelist_inputs(sct_data = sct_data, hits = hits, :
  At least 4 genes which are present in the single cell dataset & background gene set are required to test for enrichment. Only 0 provided.
I am reading a .csv file in which each row is an MGI gene symbol for a DEG (1,096 in total).
I guess the issue is the list of genes you are trying to test (hits), since this is the only thing you have changed from the documentation. Can you send the csv?
DEG_APP_MGIs <- read.table("~/Downloads/EWCE-master-2/DEG_APP_MGIs.csv", quote="\"", comment.char="")
View(DEG_APP_MGIs)
hits <- DEG_APP_MGIs
The genes may not be appropriately formatted, so 0 remain after the validation step, causing the error:
At least 4 genes which are present in the single cell dataset & background gene set are required to test for enrichment. Only 0 provided.
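One quick way to narrow this down is to count how many of the hits actually appear among the gene names of the single-cell dataset before running the test. The snippet below is only a minimal sketch in base R: the small matrix stands in for ctd[[1]]$mean_exp, and its values, the cell-type names and the "NotAGene" symbol are made up for illustration. It also does not reproduce the ortholog conversion and background filtering that EWCE performs internally.

```r
# Toy expression matrix standing in for ctd[[1]]$mean_exp (genes in rows).
# Values and cell-type names are hypothetical.
exp <- matrix(1:8, nrow = 4,
              dimnames = list(c("Apoe", "Gfap", "Gapdh", "Hmgcr"),
                              c("celltype_A", "celltype_B")))

# Hits as a plain character vector of gene symbols
hits <- c("Apoe", "Gfap", "Hmgcr", "NotAGene")

# Which hits are present in the dataset, and which are not?
present   <- hits %in% rownames(exp)
n_present <- sum(present)
missing   <- hits[!present]

n_present
missing
```

If the count is zero or far lower than expected, the problem is likely in how the gene list was read or formatted rather than in the enrichment test itself.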
That might be. I was wondering the same thing, as I've tried to format it in different ways. I've attached the .csv file here.
Thanks in advance!!
The issue is that you aren't passing a vector of hits to EWCE, which causes it to fail. The code below will work:
DEG_APP_MGIs <- read.table("~/Downloads/DEG_APP_MGIs.csv")
hits <- DEG_APP_MGIs$V1
ctd <- ewceData::ctd()
reps <- 100
annotLevel <- 1
full_results <- EWCE::bootstrap_enrichment_test(sct_data = ctd,
                                                sctSpecies = "mouse",
                                                genelistSpecies = "mouse",
                                                hits = hits,
                                                reps = reps,
                                                annotLevel = annotLevel)
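To see why the $V1 step matters, here is a small base R sketch that runs without EWCE. It reads a headerless one-column gene list from in-memory text rather than from a file (an assumption made so the example is self-contained); the five symbols are the ones shown in the head(hits, 5) output earlier in the thread.

```r
# Stand-in for a headerless, one-column CSV of gene symbols
csv_text <- c("Oas1c", "Hmgcr", "Hmgcs2", "Hacd3", "Haus8")

# read.table() returns a data.frame, even when there is only one column;
# with no header row, the column is auto-named V1.
genes_df <- read.table(text = csv_text, stringsAsFactors = FALSE)
class(genes_df)   # "data.frame"
names(genes_df)   # "V1"

# Extracting the column gives the plain character vector to pass as hits
hits <- genes_df$V1
is.character(hits)
length(hits)
```

If the file has a header row, the column will not be called V1, so it is worth checking names() on the data.frame first and extracting whichever column holds the gene symbols.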
It worked!! It will make a great addition to our paper! Thanks so much!
Installed EWCE v1.10.0 using Bioconductor version 3.18 (BiocManager 1.30.22), R 4.3.1 (2023-06-16)
Downloaded expression_mRNA_17-Aug-2014.txt to my PC, but every time I try to load the data using the function:
> celltype_data = read_celltype_data("expression_mRNA_17-Aug-2014.txt")
I get the error:
could not find function "read_celltype_data"
Tried the search function:
and it returned:
Please help