NathanSkene / EWCE

Expression Weighted Celltype Enrichment. See the package website for up-to-date instructions on usage.
https://nathanskene.github.io/EWCE/index.html

could not find function "read_celltype_data" #87

Closed: ashrma314 closed this issue 8 months ago

ashrma314 commented 9 months ago

Installed EWCE v1.10.0 using Bioconductor version 3.18 (BiocManager 1.30.22), R 4.3.1 (2023-06-16)

Downloaded expression_mRNA_17-Aug-2014.txt to my PC, but every time I try to load the data using the function: >celltype_data = read_celltype_data("expression_mRNA_17-Aug-2014.txt")

I get the error:

Error in read_celltype_data("expression_mRNA_17-Aug-2014.txt") : could not find function "read_celltype_data"

Tried the search function:

getAnywhere("read_celltype_data")

and it returned:

getAnywhere("read_celltype_data") no object named ‘read_celltype_data’ was found

Please help
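
One quick way to tell whether a function name belongs to the installed version of a package is to look it up in the package namespace. The following is a minimal sketch in base R; `has_function()` is a hypothetical helper name (not part of EWCE), demonstrated on the `stats` package so that it runs without EWCE installed:

```r
# Hypothetical helper: TRUE if `fn` is a function defined in the namespace of `pkg`.
has_function <- function(pkg, fn) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)  # package is not installed
  }
  exists(fn, envir = asNamespace(pkg), mode = "function", inherits = FALSE)
}

has_function("stats", "median")              # TRUE
has_function("stats", "read_celltype_data")  # FALSE
```

With EWCE installed, `has_function("EWCE", "read_celltype_data")` would answer the same question for this package, and `ls("package:EWCE")` after `library(EWCE)` lists everything the installed version exports.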

Al-Murphy commented 9 months ago

Hi,

I'm not sure where you got this function name from; do you have a link? I think perhaps you mean the function generate_celltype_data? Have a look at its function documentation to see how to use it, and also at the vignette.

Thanks, Alan.

NathanSkene commented 9 months ago

I think this is probably caused by use of the old version of EWCE at nathanskene/EWCE instead of the Neurogenomics/EWCE version. I assume you're using the manual from nathanskene/EWCE?

Al-Murphy commented 9 months ago

Ah yes, I see it now. @ashrma314, just to note, the EWCE version at Neurogenomics/EWCE is deprecated and shouldn't be used. Instead, please take the version from this GitHub repo or just install with Bioconductor.

ashrma314 commented 9 months ago

Thanks so much for your quick response @Al-Murphy and @NathanSkene! I was indeed using the older version of EWCE. I downloaded the new version from this repo, and it seems to run seamlessly with the example data! I do run into an issue with my original list of DEGs. This might be a silly mistake/oversight on my part (I'm new to coding). I would greatly appreciate it if you could share your thoughts. I get the following error:

At least 4 genes which are present in the single cell dataset & background gene set are required to test for enrichment. Only 0 provided.

I read a .csv file containing MGI symbols for mouse genes into the environment, then edited the ortholog part of the code to output mouse IDs. Below is the complete readout from the console.

> library(EWCE)
> library(ggplot2)
> set.seed(1234)
> # Load merged cortex and hypothalamus dataset generated by Karolinska institute
> ctd <- ewceData::ctd() # i.e. ctd_MergedKI
see ?ewceData and browseVignettes('ewceData') for documentation
loading from cache
> try({
+ plt <- EWCE::plot_ctd(ctd = ctd,
+ level = 1,
+ genes = c("Apoe","Gfap","Gapdh"),
+ metric = "mean_exp")
+ })
> DEG_APP_MGIs <- read.table("~/Downloads/EWCE-master-2/DEG_APP_MGIs.csv", quote="\"", comment.char="")
> View(DEG_APP_MGIs)
> hits <- DEG_APP_MGIs
> print(hits)
> exp <- ctd[[1]]$mean_exp
> # Old conversion method
> # if running offline pass localhub = TRUE
> m2h <- ewceData::mouse_to_human_homologs()
see ?ewceData and browseVignettes('ewceData') for documentation
loading from cache
> exp_old <- exp[rownames(exp) %in% m2h$MGI.symbol,]
> # New conversion method (used by EWCE internally)
> exp_new <- orthogene::convert_orthologs(gene_df = exp,
+ input_species = "mouse",
+ output_species = "mouse",
+ method = "homologene")
Preparing gene_df.
Dense matrix format detected.
Extracting genes from rownames.
15,259 genes extracted.
Converting mouse ==> mouse orthologs using: homologene
Retrieving all organisms available in homologene.
Mapping species name: mouse
Common name mapping found for mouse
1 organism identified from search: 10090
Retrieving all organisms available in homologene.
Mapping species name: mouse
Common name mapping found for mouse
1 organism identified from search: 10090
input_species already formatted as output species. Returning input data directly.
> # Report
> message("The new method retains ",
+ formatC(nrow(exp_new) - nrow(exp_old), big.mark = ","),
+ " more genes than the old method.")
The new method retains 2,934 more genes than the old method.
> # Use 100 bootstrap lists for speed, for publishable analysis use >=10000
> reps <- 100
> # Use level 1 annotations (i.e. Interneurons)
> annotLevel <- 1
> # Bootstrap significance test, no control for transcript length and GC content
> full_results <- EWCE::bootstrap_enrichment_test(sct_data = ctd,
+ sctSpecies = "mouse",
+ genelistSpecies = "mouse",
+ hits = hits,
+ reps = reps,
+ annotLevel = annotLevel)
1 core(s) assigned as workers (7 reserved).
Generating gene background for mouse x mouse ==> human
Gathering ortholog reports.
Retrieving all genes using: homologene.
Retrieving all organisms available in homologene.
Mapping species name: human
Common name mapping found for human
1 organism identified from search: 9606
Gene table with 19,129 rows retrieved.
Returning all 19,129 genes from human.

-- mouse
Retrieving all genes using: homologene.
Retrieving all organisms available in homologene.
Mapping species name: mouse
Common name mapping found for mouse
1 organism identified from search: 10090
Gene table with 21,207 rows retrieved.
Returning all 21,207 genes from mouse.

--
Preparing gene_df.
data.frame format detected.
Extracting genes from Gene.Symbol.
21,207 genes extracted.
Converting mouse ==> human orthologs using: homologene
Retrieving all organisms available in homologene.
Mapping species name: mouse
Common name mapping found for mouse
1 organism identified from search: 10090
Retrieving all organisms available in homologene.
Mapping species name: human
Common name mapping found for human
1 organism identified from search: 9606
Checking for genes without orthologs in human.
Extracting genes from input_gene.
17,355 genes extracted.
Extracting genes from ortholog_gene.
17,355 genes extracted.
Checking for genes without 1:1 orthologs.
Dropping 131 genes that have multiple input_gene per ortholog_gene (many:1).
Dropping 498 genes that have multiple ortholog_gene per input_gene (1:many).
Filtering gene_df with gene_map
Adding input_gene col to gene_df.
Adding ortholog_gene col to gene_df.

=========== REPORT SUMMARY ===========

Total genes dropped after convert_orthologs : 4,725 / 21,207 (22%)
Total genes remaining after convert_orthologs : 16,482 / 21,207 (78%)

=========== REPORT SUMMARY ===========

16,482 / 21,207 (77.72%) target_species genes remain after ortholog conversion.
16,482 / 19,129 (86.16%) reference_species genes remain after ortholog conversion.
16,482 intersect background genes used.
Standardising CellTypeDataset
Checking gene list inputs.
Converting gene list input to standardised human genes.
Error in check_ewce_genelist_inputs(sct_data = sct_data, hits = hits, :
  At least 4 genes which are present in the single cell dataset & background gene set are required to test for enrichment. Only 0 provided.

NathanSkene commented 9 months ago

Hi, think I might have confused things here.

It's neurogenomics/EWCE that has been deprecated, while nathanskene/EWCE is the up-to-date one.

It looks like you're following the deprecated version of the tutorial. The up-to-date tutorials are here: https://nathanskene.github.io/EWCE/articles/EWCE and https://nathanskene.github.io/EWCE/articles/extended.html

@Al-Murphy can you think of any reason we shouldn't pull the latest version of EWCE onto neurogenomics/EWCE to minimise confusion for future users? There are probably numerous people doing this for every one that writes an issue.

Al-Murphy commented 9 months ago

@NathanSkene yeah, the version in neurogenomics is still being used in scflow as the code hasn't been restructured to use the new bioconductor version (at NathanSkene/EWCE). Until scflow is sorted we can't really do anything.

I do have a message on the neurogenomics ewce that prints when you install/library it saying it is deprecated and pointing to the other version.

ashrma314 commented 9 months ago

Thanks @NathanSkene and @Al-Murphy for your quick response again. Just to make sure we're all on the same page, I am using the up-to-date tutorial that you linked above. It installs EWCE v1.10, which is confusing as the devel version is 1.11.0; perhaps that's why I can't run it? Nonetheless, the issue remains, and I get the same error:

Error in check_ewce_genelist_inputs(sct_data = sct_data, hits = hits, : At least 4 genes which are present in the single cell dataset & background gene set are required to test for enrichment. Only 0 provided.

I get the same error even if I read a csv file with all the genes from example_genelist(). Perhaps it's a formatting issue? What would be the correct format for reading in an original list of DEGs? I read a csv file with MGI gene names in the rows and no title for the column.
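
On the version question: Bioconductor gives release packages an even minor version and devel packages an odd one, so installing v1.10.0 from the release branch while devel sits at v1.11.0 is the expected pairing rather than a sign of a broken install. A small sketch of that rule, where `bioc_branch()` is a hypothetical helper name used only for illustration:

```r
# Hypothetical helper: classify a Bioconductor package version string.
# Bioconductor uses an even minor version (the y in x.y.z) for release
# packages and an odd one for devel packages.
bioc_branch <- function(version) {
  minor <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]][2])
  if (minor %% 2 == 0) "release" else "devel"
}

bioc_branch("1.10.0")  # "release"
bioc_branch("1.11.0")  # "devel"

# Version of an installed package, shown here for a base package:
as.character(utils::packageVersion("stats"))
```

With EWCE installed, `utils::packageVersion("EWCE")` reports which of the two is actually loaded.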

NathanSkene commented 9 months ago

Can you share with us the commands that you have run so far in the session, and what the output was?

ashrma314 commented 9 months ago

@NathanSkene

The following are my input commands (also attached as a .txt file): EWCE_AS.txt

if (!require("BiocManager")){install.packages("BiocManager")}
BiocManager::install("EWCE")

library(EWCE)

set.seed(1234)

# Package name
pkg <- tolower("EWCE")

# Username of DockerHub account
docker_user <- "neurogenomicslab"

ctd <- ewceData::ctd()

plt_exp <- EWCE::plot_ctd(ctd = ctd, level = 1, genes = c("Apoe","Gfap","Gapdh"), metric = "mean_exp")

plt_spec <- EWCE::plot_ctd(ctd = ctd, level = 2, genes = c("Apoe","Gfap","Gapdh"), metric = "specificity")

# Original code loads the list from ewceData using the following: hits <- ewceData::example_genelist()
# I read my data into the environment using the From Text (base) option from the Import Dataset dropdown in the 'Environment' pane

hits <- DEG_APP_MGIs
head(hits, 5)
# or print(hits) shows full list

reps <- 100
annotLevel <- 1

# original code uses genelistSpecies = human, but I have a list of MGI gene symbols, so I use genelistSpecies = mouse

full_results <- EWCE::bootstrap_enrichment_test(sct_data = ctd, sctSpecies = "mouse", genelistSpecies = "mouse", hits = hits, reps = reps, annotLevel = annotLevel)

knitr::kable(full_results$results)

plot_list <- EWCE::ewce_plot(total_res = full_results$results, mtc_method = "BH", ctd = ctd)
print(plot_list$withDendro)

Following is the output in my console:

> library(EWCE)
> set.seed(1234)
> # Package name
> pkg <- tolower("EWCE")
> # Username of DockerHub account
> docker_user <- "neurogenomicslab"
> ctd <- ewceData::ctd()
see ?ewceData and browseVignettes('ewceData') for documentation
loading from cache
> plt_exp <- EWCE::plot_ctd(ctd = ctd,
+ level = 1,
+ genes = c("Apoe","Gfap","Gapdh"),
+ metric = "mean_exp")
> plt_spec <- EWCE::plot_ctd(ctd = ctd,
+ level = 2,
+ genes = c("Apoe","Gfap","Gapdh"),
+ metric = "specificity")
> hits <- DEG_APP_MGIs
> head(hits, 5)
  example_genelist
1 Oas1c
2 Hmgcr
3 Hmgcs2
4 Hacd3
5 Haus8
> reps <- 100
> annotLevel <- 1
> full_results <- EWCE::bootstrap_enrichment_test(sct_data = ctd,
+ sctSpecies = "mouse",
+ genelistSpecies = "mouse",
+ hits = hits,
+ reps = reps,
+ annotLevel = annotLevel)
1 core(s) assigned as workers (7 reserved).
Generating gene background for mouse x mouse ==> human
Gathering ortholog reports.
Retrieving all genes using: homologene.
Retrieving all organisms available in homologene.
Mapping species name: human
Common name mapping found for human
1 organism identified from search: 9606
Gene table with 19,129 rows retrieved.
Returning all 19,129 genes from human.

-- mouse
Retrieving all genes using: homologene.
Retrieving all organisms available in homologene.
Mapping species name: mouse
Common name mapping found for mouse
1 organism identified from search: 10090
Gene table with 21,207 rows retrieved.
Returning all 21,207 genes from mouse.

--
Preparing gene_df.
data.frame format detected.
Extracting genes from Gene.Symbol.
21,207 genes extracted.
Converting mouse ==> human orthologs using: homologene
Retrieving all organisms available in homologene.
Mapping species name: mouse
Common name mapping found for mouse
1 organism identified from search: 10090
Retrieving all organisms available in homologene.
Mapping species name: human
Common name mapping found for human
1 organism identified from search: 9606
Checking for genes without orthologs in human.
Extracting genes from input_gene.
17,355 genes extracted.
Extracting genes from ortholog_gene.
17,355 genes extracted.
Checking for genes without 1:1 orthologs.
Dropping 131 genes that have multiple input_gene per ortholog_gene (many:1).
Dropping 498 genes that have multiple ortholog_gene per input_gene (1:many).
Filtering gene_df with gene_map
Adding input_gene col to gene_df.
Adding ortholog_gene col to gene_df.

=========== REPORT SUMMARY ===========

Total genes dropped after convert_orthologs : 4,725 / 21,207 (22%)
Total genes remaining after convert_orthologs : 16,482 / 21,207 (78%)

=========== REPORT SUMMARY ===========

16,482 / 21,207 (77.72%) target_species genes remain after ortholog conversion.
16,482 / 19,129 (86.16%) reference_species genes remain after ortholog conversion.
16,482 intersect background genes used.
Standardising CellTypeDataset
Checking gene list inputs.
Converting gene list input to standardised human genes.
Error in check_ewce_genelist_inputs(sct_data = sct_data, hits = hits, :
  At least 4 genes which are present in the single cell dataset & background gene set are required to test for enrichment. Only 0 provided.

ashrma314 commented 9 months ago

I am reading a .csv file in which each row is an MGI gene symbol for a DEG; total: 1096.

Al-Murphy commented 8 months ago

I guess the issue is the list of genes you are trying to test (hits) since this is the only thing you have changed from the documentation. Can you send the csv?:

DEG_APP_MGIs <- read.table("~/Downloads/EWCE-master-2/DEG_APP_MGIs.csv", quote="\"", comment.char="")
View(DEG_APP_MGIs)
hits <- DEG_APP_MGIs

They may not be appropriately formatted, so 0 remain after the validation step, causing the error:

At least 4 genes which are present in the single cell dataset & background gene set are required to test for enrichment. Only 0 provided.
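
A quick pre-flight check along these lines can narrow down whether the problem is the type of `hits` or the gene names themselves. This is only a sketch: `check_hits()` is a hypothetical helper (not an EWCE function and not how EWCE validates input internally), the toy `known_genes` vector stands in for the gene names of the real dataset, and the threshold of 4 is taken from the error message above.

```r
# Hypothetical pre-flight check (not an EWCE function): confirm that `hits`
# is a character vector and count how many entries occur in a reference set.
check_hits <- function(hits, known_genes, min_genes = 4) {
  if (!is.character(hits)) {
    stop("hits must be a character vector, got: ", class(hits)[1])
  }
  n_found <- sum(unique(hits) %in% known_genes)
  if (n_found < min_genes) {
    stop("only ", n_found, " of the supplied genes were found")
  }
  n_found
}

# Toy reference set standing in for rownames(ctd[[1]]$mean_exp)
known_genes <- c("Apoe", "Gfap", "Gapdh", "Hmgcr", "Hmgcs2", "Oas1c")

hits_vec <- c("Oas1c", "Hmgcr", "Hmgcs2", "Hacd3", "Haus8", "Apoe")
check_hits(hits_vec, known_genes)  # 4

# The same genes wrapped in a data.frame are rejected by the type check
hits_df <- data.frame(V1 = hits_vec)
res <- tryCatch(check_hits(hits_df, known_genes),
                error = function(e) conditionMessage(e))
res  # "hits must be a character vector, got: data.frame"
```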

ashrma314 commented 8 months ago

That might be. I was wondering the same thing, as I've tried to format it in different ways. I've attached the .csv file here.

Thanks in advance!!

DEG_APP_MGIs.csv

Al-Murphy commented 8 months ago

The issue is that you aren't passing a vector of hits to EWCE, which causes it to fail. The code below will work:

DEG_APP_MGIs <- read.table("~/Downloads/DEG_APP_MGIs.csv")
hits <- DEG_APP_MGIs$V1
ctd <- ewceData::ctd()
reps <- 100
annotLevel <- 1

full_results <- EWCE::bootstrap_enrichment_test(sct_data = ctd,
                                                sctSpecies = "mouse",
                                                genelistSpecies = "mouse",
                                                hits = hits,
                                                reps = reps,
                                                annotLevel = annotLevel)
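
The difference between the two forms can be seen without EWCE. In the sketch below, a temporary file stands in for DEG_APP_MGIs.csv (one gene symbol per row, no header, as described earlier in the thread); the file contents are illustrative only.

```r
# Write a small one-column file like the one described in this thread.
# The temporary file is a stand-in, not the real DEG list.
path <- tempfile(fileext = ".csv")
writeLines(c("Oas1c", "Hmgcr", "Hmgcs2", "Hacd3", "Haus8"), path)

deg <- read.table(path)
class(deg)   # "data.frame"
names(deg)   # "V1" (default name given to an unnamed first column)

# Extract the column as a plain character vector; as.character() also
# covers older R versions where strings were read in as factors.
hits <- as.character(deg$V1)
class(hits)   # "character"
length(hits)  # 5

# Alternative that skips the data.frame entirely
hits2 <- readLines(path)
identical(hits, hits2)  # TRUE
```

Either `hits` or `hits2` is the kind of object the `hits` argument expects here, whereas `deg` itself is a one-column table.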

ashrma314 commented 8 months ago

It worked!! It will make a great addition to our paper! Thanks so much!