How to deal with NA data for non model species?

cagenet commented 3 years ago

Hi Federico, Thank you for this really nice and helpful package. It is really useful. I used it for agronomical species (pig and bovine). I encountered a problem when trying to use enhance_table function. I have the following error :

p <- enhance_table(res_enrich_pig,

res,

n_gs = 30,

gtl<-gtl_pig,

annotation_obj = resannot_Pig,

chars_limit = 60) Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : les noms de lignes contiennent des valeurs manquantes I guess the trouble came from the fact that I use ENSEMBLID, convert it in HGNC symbol but as not all genes are identified , I have some missing data (valeurs manquantes in french). Could it be possible to allow missing values in this function ? Thank you in advance Regards Carine

federicomarini commented 3 years ago

Hi Carine! Quick note on the way you call the function - it should be enough to use the gtl object, as the res/res_enrich/annotation are all picked from that. Re: the missing values problem - does the error persist if you

keep the ids as ensembl in the result
have the conversion handled internally with the annotation object?

cagenet commented 3 years ago

Hi Federico, Thank you so much for your quick reply. First, I didn't catch what you mean by "the way I call the function ". I follow your user guide (step 5.1). For the missing values problem : I think that there is two issues : the first one is that for the pig specie there is no mapping between Ensembl and symbol in org.Ss.eg.db (see https://support.bioconductor.org/p/9137574/)... So I used bovine species (I have 2 datasets I applied a treatment in bovine and in pig and want to compare the results obtained in the two species). Now with the bovine species, the enhance_table work with symbol but not with ENSEMBL ids. I do have a conversion handled internally in the annotation object (a two table file with gene_id and gene_symbol (call HGNC in my code) and I have in the RowData in dds object an annotation with Ensemblid and gene symbol (call HGNC in my code see below). What I don't understand is that I have an conversion table in the annotation object and also in the RowData (dds).

Thank you for your help. Cheers

Here is my code :

library(GeneTonic) load(file="DESEQ2_dds1_Bov.rda")#dds2

dds2 class: DESeqDataSet dim: 15779 20 metadata(1): version assays(4): counts mu H cooks rownames(15779): ENSBTAG00000000005 ENSBTAG00000000009 ... ENSBTAG00000055312 ENSBTAG00000055315 rowData names(60): ENSEMBLID HGNC ... deviance maxCooks colnames(20): BE1_CTRL BE2_CTRL ... BE14_BMP15 BE16_BMP15 colData names(3): Cow Treatment sizeFactor

load(file="resBta_04juin.rda")

summary(resBta) out of 15779 with nonzero total read count adjusted p-value < 0.05 LFC > 0 (up) : 2323, 15% LFC < 0 (down) : 2460, 16% outliers [1] : 0, 0% low counts [2] : 306, 1.9% (mean count < 2) [1] see 'cooksCutoff' argument of ?results [2] see 'independentFiltering' argument of ?results

resannot_Bov<-read.table(file="annot_bov.txt", header = TRUE, sep="\t")

head(resannot_Bov) gene_id gene_name 1 ENSBTAG00000000005 GRK3 2 ENSBTAG00000000008 KCNJ1 3 ENSBTAG00000000009 FOXF1 4 ENSBTAG00000000010 UBL7 5 ENSBTAG00000000011 TDH 6 ENSBTAG00000000012 TTC33

de_symbol_bov <- deseqresult2df(resBta, FDR = 0.05)$SYMBOL bg_bov_sym <- rowData(dds2)$HGNC[rowSums(counts(dds2)) > 0]

resBta$ensemblid<-(row.names(resBta)) bg_ensembl_bov <- rowData(dds2)$ENSEMBLID[rowSums(counts(dds2)) > 0] de_ensembl_bov <- deseqresult2df(resBta, FDR = 0.05)$ensemblid library("topGO") topgoBP_bov <- pcaExplorer::topGOtable(de_symbol_bov, bg_bov_sym, ontology = "BP", mapping = "org.Bt.eg.db", geneID = "symbol", topTablerows = 500)

topgoBPensembl_bov <- pcaExplorer::topGOtable(de_ensembl_bov, bg_ensembl_bov, ontology = "BP", mapping = "org.Bt.eg.db", geneID = "ENSEMBL", topTablerows = 500) res_enrich_bov<-shake_topGOtableResult(topgoBP_bov) res_enrich_bov_ensembl<-shake_topGOtableResult(topgoBPensembl_bov) res_enrich_bov <- get_aggrscores(res_enrich = res_enrich_bov, res_de = resBta, annotation_obj = resannot_Bov, aggrfun = mean) res_enrich_bov_ensembl<-get_aggrscores(res_enrich = res_enrich_bov_ensembl, res_de = resBta,annotation_obj = resannot_Bov,aggrfun = mean) gtl_bov <- GeneTonic_list( dds = dds2, res_de = resBta, res_enrich = res_enrich_bov, annotation_obj = resannot_Bov )

gtl_bov_ensembl<-GeneTonic_list(dds=dds2, res_de = resBta, res_enrich = res_enrich_bov_ensembl, annotation_obj = resannot_Bov) p <- enhance_table(res_enrich_bov, resBta, n_gs = 30, gtl<-gtl_bov, annotation_obj = resannot_Bov, chars_limit = 60) p1<-enhance_table(res_enrich_bov_ensembl, resBta, n_gs=30, gtl = gtl_bov_ensembl, annotation_obj = resannot_Bov, chars_limit = 60) Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : les noms de lignes contiennent des valeurs manquantes

federicomarini commented 3 years ago

Hi Federico, Thank you so much for your quick reply. First, I didn't catch what you mean by "the way I call the function ". I follow your user guide (step 5.1).

Hi Carine, in theory you can call

enhance_table(gtl = gtl_bov_ensembl, n_gs=30, chars_limit = 60)

as gtl should have all it needs 😉

For the missing values problem : I think that there is two issues : the first one is that for the pig specie there is no mapping between Ensembl and symbol in org.Ss.eg.db (see https://support.bioconductor.org/p/9137574/)... So I used bovine species (I have 2 datasets I applied a treatment in bovine and in pig and want to compare the results obtained in the two species). Now with the bovine species, the enhance_table work with symbol but not with ENSEMBL ids. I do have a conversion handled internally in the annotation object (a two table file with gene_id and gene_symbol (call HGNC in my code) and I have in the RowData in dds object an annotation with Ensemblid and gene symbol (call HGNC in my code see below). What I don't understand is that I have an conversion table in the annotation object and also in the RowData (dds).

Thank you for your help. Cheers

Here is my code :

library(GeneTonic) load(file="DESEQ2_dds1_Bov.rda")#dds2

dds2 class: DESeqDataSet dim: 15779 20 metadata(1): version assays(4): counts mu H cooks rownames(15779): ENSBTAG00000000005 ENSBTAG00000000009 ... ENSBTAG00000055312 ENSBTAG00000055315 rowData names(60): ENSEMBLID HGNC ... deviance maxCooks colnames(20): BE1_CTRL BE2_CTRL ... BE14_BMP15 BE16_BMP15 colData names(3): Cow Treatment sizeFactor

load(file="resBta_04juin.rda")

summary(resBta) out of 15779 with nonzero total read count adjusted p-value < 0.05 LFC > 0 (up) : 2323, 15% LFC < 0 (down) : 2460, 16% outliers [1] : 0, 0% low counts [2] : 306, 1.9% (mean count < 2) [1] see 'cooksCutoff' argument of ?results [2] see 'independentFiltering' argument of ?results

resannot_Bov<-read.table(file="annot_bov.txt", header = TRUE, sep="\t")

head(resannot_Bov) gene_id gene_name 1 ENSBTAG00000000005 GRK3 2 ENSBTAG00000000008 KCNJ1 3 ENSBTAG00000000009 FOXF1 4 ENSBTAG00000000010 UBL7 5 ENSBTAG00000000011 TDH 6 ENSBTAG00000000012 TTC33

de_symbol_bov <- deseqresult2df(resBta, FDR = 0.05)$SYMBOL bg_bov_sym <- rowData(dds2)$HGNC[rowSums(counts(dds2)) > 0]

resBta$ensemblid<-(row.names(resBta)) bg_ensembl_bov <- rowData(dds2)$ENSEMBLID[rowSums(counts(dds2)) > 0] de_ensembl_bov <- deseqresult2df(resBta, FDR = 0.05)$ensemblid library("topGO") topgoBP_bov <- pcaExplorer::topGOtable(de_symbol_bov, bg_bov_sym, ontology = "BP", mapping = "org.Bt.eg.db", geneID = "symbol", topTablerows = 500)

Ok, now that you elaborate a little more on this (and since there's no ensembl->symbol conversion) it can be that the gs_genes column in the res_enrich...

topgoBPensembl_bov <- pcaExplorer::topGOtable(de_ensembl_bov, bg_ensembl_bov, ontology = "BP", mapping = "org.Bt.eg.db", geneID = "ENSEMBL", topTablerows = 500) res_enrich_bov<-shake_topGOtableResult(topgoBP_bov) res_enrich_bov_ensembl<-shake_topGOtableResult(topgoBPensembl_bov)

... here ☝️ does not have the gene symbols "as expected".

I am not so familiar with species other than human, mouse & fly, as these are the almost entirety of the catalogue our cooperation partners work with.

If you want to share with me the gtl object, I can have a look (will of course be handled confidentially).

Federico

res_enrich_bov <- get_aggrscores(res_enrich = res_enrich_bov, res_de = resBta, annotation_obj = resannot_Bov, aggrfun = mean) res_enrich_bov_ensembl<-get_aggrscores(res_enrich = res_enrich_bov_ensembl, res_de = resBta,annotation_obj = resannot_Bov,aggrfun = mean) gtl_bov <- GeneTonic_list( dds = dds2, res_de = resBta, res_enrich = res_enrich_bov, annotation_obj = resannot_Bov )

gtl_bov_ensembl<-GeneTonic_list(dds=dds2, res_de = resBta, res_enrich = res_enrich_bov_ensembl, annotation_obj = resannot_Bov) p <- enhance_table(res_enrich_bov, resBta, n_gs = 30, gtl<-gtl_bov, annotation_obj = resannot_Bov, chars_limit = 60) p1<-enhance_table(res_enrich_bov_ensembl, resBta, n_gs=30, gtl = gtl_bov_ensembl, annotation_obj = resannot_Bov, chars_limit = 60) Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : les noms de lignes contiennent des valeurs manquantes

cagenet commented 3 years ago

Hi ;-) First issue: I guess that it was something like that BUT I still obtained the same error ... p<-enhance_table(gtl=gtl_bov_ensembl, n_gs = 30, chars_limit = 60) Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : les noms de lignes contiennent des valeurs manquantes

Second issue : Thank you so much for your help ... Of course I can send you my gtl object ... Will try to import and give you access to gtf object in my repository (GitHub newbie and never used it !) Thank you

federicomarini commented 3 years ago

Sure, no prob. You can reach me at the email address in the maintainer field of the package (trying to make bots' work to scrape my email a little harder, sorry for the extra step 😛 )

federicomarini / GeneTonic

How to deal with NA data for non model species? #21