Annotation not available

mgalland commented 5 years ago

Dear Federico, the ideal application is really awesome. I am using it for an upcoming short practical course on RNA-seq.

I'd like to have the annotation available for Arabidopsis, but it is not pre-installed. I understand that this can be solved locally, but could you have a look at the public instance of ideal?

This is my error message: The package org.At.tair.db is not installed/available. Try installing it with BiocManager::install('org.At.tair.db').

Thank you in advance, Marc

federicomarini commented 5 years ago

Hi Marc, thank you for the feedback! I assume you are using the public instance of ideal at the IMBEI? If so, this issue is solved: I just installed the missing package (in my university medical center we do not have people working with plants as a model system). Federico

mgalland commented 5 years ago

Waouw that's a quick answer 👍 thanks for that.

Yes, I am using the public instance at the IMBEI. Thanks for adding the Arabidopsis annotation. I still have problems with my counts.final.tsv. I thought the gene id was an ENSEMBL one, but the "Create the annotation data frame for your dataset" step keeps failing for a reason I cannot identify. Any clue?

The count file is here.

The design file is here.

Thanks Marc

federicomarini commented 5 years ago

Hm, not being familiar with Arabidopsis and all its related wealth of annotation & co I do not have an answer right away. Still, some things to check:

the gene ids should be the row names (i.e. you should ideally remove the first name in the header)
check locally what keytypes() returns when using org.At.tair.db: I am not seeing ENSEMBL among the possible identifier types, so probably a different name is used?
if that is the case: as of now ideal expects some fixed id types. I thought these would cover all the use cases, but that is probably not so for At
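
The keytype check from the list above could look roughly like this. It is a minimal sketch that assumes org.At.tair.db and AnnotationDbi are installed locally; the guard simply skips the lookup otherwise, and the variable name available_keytypes is only illustrative.

```r
# Sketch: list the identifier types the Arabidopsis annotation package offers.
# Assumes org.At.tair.db and AnnotationDbi are installed; skipped otherwise.
available_keytypes <- character(0)
if (requireNamespace("org.At.tair.db", quietly = TRUE)) {
  available_keytypes <- AnnotationDbi::keytypes(org.At.tair.db::org.At.tair.db)
  print(available_keytypes)
  # FALSE here means ENSEMBL cannot be used as a keytype for this organism
  print("ENSEMBL" %in% available_keytypes)
  # The workaround in this thread relies on TAIR being a valid keytype
  print("TAIR" %in% available_keytypes)
}
```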

I can think of a way to pick these dynamically, but as a workaround you could construct the annotation object outside of ideal, e.g. with a call to mapIds(), along the lines of

mytable <- read.delim("https://raw.githubusercontent.com/ScienceParkStudyGroup/2019-03-07-rnaseq-workshop/gh-pages/files/counts.final.tsv",sep="\t")
library(org.At.tair.db)
mapIds(org.At.tair.db,column ="SYMBOL",keytype="TAIR",keys = as.character(mytable$Geneid))
# and then combine them into a data.frame with the required column names
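
The final "combine" step might be sketched as below. This assumes the annotation object needs the gene ids as row names plus the columns gene_id and gene_name (the layout that get_annotation_orgdb produces, as far as I know). The vectors gene_ids and symbols are stand-ins for mytable$Geneid and the mapIds() result, and the symbol values are placeholders, not real database entries.

```r
# Stand-ins for the real objects: `gene_ids` plays the role of mytable$Geneid,
# `symbols` the named character vector that mapIds() returns.
# The symbol values are illustrative placeholders, not looked up in org.At.tair.db.
gene_ids <- c("AT1G01010", "AT1G01020", "AT1G01030")
symbols <- c(AT1G01010 = "SYMBOL_A", AT1G01020 = NA, AT1G01030 = "SYMBOL_C")

# Assumed layout: row names = gene ids, columns gene_id and gene_name
anno_df <- data.frame(
  gene_id = gene_ids,
  gene_name = unname(symbols[gene_ids]),
  stringsAsFactors = FALSE,
  row.names = gene_ids
)

# Fall back to the gene id wherever no symbol was found
no_symbol <- is.na(anno_df$gene_name)
anno_df$gene_name[no_symbol] <- anno_df$gene_id[no_symbol]
anno_df
```

With the real data, gene_ids would be as.character(mytable$Geneid) and symbols the output of the mapIds() call above.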

Hope that helps for the time being, I'll try to solve the organism-dependency soon. Federico

federicomarini commented 5 years ago

A somewhat more comfortable solution, starting from scratch with your example:

# Read the count table and the experimental design
mytable <- read.delim("https://raw.githubusercontent.com/ScienceParkStudyGroup/2019-03-07-rnaseq-workshop/gh-pages/files/counts.final.tsv",sep="\t")
mydesign <- read.delim("https://raw.githubusercontent.com/ScienceParkStudyGroup/2019-03-07-rnaseq-workshop/gh-pages/files/design.tsv",sep="\t")
mydesign
# Use the sample names and the gene ids as row names
rownames(mydesign) <- mydesign$SampleName
rownames(mytable) <- mytable$Geneid
head(mytable)
# Drop the Geneid column, which is now stored in the row names
mytable <- mytable[,-1]
head(mytable)
# Build the DESeqDataSet (intercept-only design, just to get started)
dds <- DESeq2::DESeqDataSetFromMatrix(mytable,mydesign,~1)
dds
library(org.At.tair.db)
library(pcaExplorer)
library(ideal)
?get_annotation_orgdb
rownames(dds)
# Create the annotation from the TAIR ids and pass it to ideal
myanno <- pcaExplorer::get_annotation_orgdb(dds, "org.At.tair.db","TAIR")
ideal(dds,annotation_obj = myanno)

This one leverages the wrapper get_annotation_orgdb from pcaExplorer.
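
Before building the dds object it can help to confirm that the count columns and the design rows refer to the same samples in the same order. The sketch below uses small made-up tables (counts, design, and the condition column are hypothetical and not taken from the workshop files).

```r
# Toy stand-ins for mytable and mydesign; all values are made up
counts <- data.frame(
  sampleB = c(5L, 0L),
  sampleA = c(3L, 12L),
  row.names = c("AT1G01010", "AT1G01020")
)
design <- data.frame(
  SampleName = c("sampleA", "sampleB"),
  condition = c("control", "drought"),
  row.names = c("sampleA", "sampleB")
)

# Every sample must appear in both tables
stopifnot(setequal(colnames(counts), rownames(design)))

# Reorder the count columns to follow the design rows
counts <- counts[, rownames(design), drop = FALSE]
all(colnames(counts) == rownames(design))
```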

federicomarini commented 5 years ago

Hi Marc, locally you can test it with the version available after commit 45ac23904a97b15e0199d8b5d100a5251b0a0169. This should make the id types specific to each package appear. Feel free to try it and get back to me :wink:

mgalland commented 5 years ago

Hello, thanks for your help and your script. I've made a slightly modified one and am running it locally.

dds object generation still works
annotation object also works
DE genes also works

The functional analysis fails with the message:
Warning in library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, : there is no package called ‘org..eg.db’ Warning: Error in eval: The package org..eg.db is not installed/available. Try installing it with BiocManager::install() ?

The "apply id conversion between data and signatures" step also does not work. The error message is: Warning: Error in testForValidKeytype: Invalid keytype: ENSEMBL. Please use the keytypes method to see a listing of valid arguments.

My script is here.
I've also attached my session info here.

Hope I'm helping and not demotivating you. Cheers Marc

federicomarini commented 5 years ago

Hey Marc, no risk of demotivating me ;) Thank you for tackling a case I was not addressing until now.

For the first error: you might need to specify the species, even if you provided the annotation object.
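
One plausible reading of the name org..eg.db in that message is that the package name gets assembled from a species abbreviation that was still empty. This is an assumption drawn from the error text, not something confirmed from the ideal source code, and build_orgdb_name below is a hypothetical helper, not a function from ideal.

```r
# Hypothetical helper, NOT part of ideal: assembles an annotation package
# name from a species abbreviation, following the org.<Xx>.eg.db pattern.
build_orgdb_name <- function(species_abbrev) {
  paste0("org.", species_abbrev, ".eg.db")
}

build_orgdb_name("Hs")  # "org.Hs.eg.db"
build_orgdb_name("")    # "org..eg.db", the name seen in the error message
```

Note that the Arabidopsis package used in this thread, org.At.tair.db, does not follow the eg.db pattern at all.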

For the second one: I think it is due to the current hard-coding of ENSEMBL as the id type. This could also be fixed with a strategy similar to the one for the annotation generation, making the id types depend on what is available in the org.db package.

Hold on, I'll be on it soon 😊

federicomarini commented 5 years ago

Update on using the Functional analysis tab:

the limma::goana-based analysis (1st button of the three) does not work with Arabidopsis (its annotation package is not an eg.db package)
not so bad still: the topGO-based analysis (which IMHO always delivers more precise terms instead of too generic ones) is now working correctly. Remember to specify the species in the setup and reload the annotation, just to be safe. You can see this in action here:

i.e. if you click on the enriched terms, you can see a small popup of the heatmap of the signature for the genes annotated to each term, detected as DE. I think it is a pretty neat way to check "what's going on". And I am not at all familiar with your setting, but seeing "water" show up in a drought vs control experiment is quite reassuring 👍

In summary: For the time being, please stick to the topGO-based analysis. I'll look at the other methods as well.

federicomarini commented 5 years ago

-> all this is implemented from 44881f460552d4519ddcbf5c34dccab6ed2473ca onwards

mgalland commented 5 years ago

Hey Federico, I haven't got the functional annotation to work, to be honest. But maybe I just installed the wrong version. I used devtools::install_github("federicomarini/ideal", ref = "44881f460552d4519ddcbf5c34dccab6ed2473ca") to install the version you mentioned.
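
To rule out a wrong version, the installed commit can be inspected. The sketch below assumes the package was installed from GitHub with devtools/remotes, which record a RemoteSha field in the installed DESCRIPTION. The helper name installed_sha is hypothetical.

```r
# Hypothetical helper: returns the GitHub commit a package was installed from,
# or NA if the package is missing or was not installed via devtools/remotes.
installed_sha <- function(pkg) {
  if (!nzchar(system.file(package = pkg))) {
    return(NA_character_)
  }
  sha <- utils::packageDescription(pkg)$RemoteSha
  if (is.null(sha)) NA_character_ else sha
}

wanted <- "44881f460552d4519ddcbf5c34dccab6ed2473ca"
# TRUE only if ideal was installed from exactly this commit
identical(installed_sha("ideal"), wanted)
```

A FALSE result does not necessarily mean an older version: a later commit would also contain the fix, so printing installed_sha("ideal") directly is more informative in that case.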

I'll let Max, a colleague of mine who gave me the subsetted dataset, try it himself, as he will have to analyse RNA-Seq datasets in the future. Cheers, Marc (you can close this issue for now, I think!) Thanks for all the help so far.

federicomarini commented 5 years ago

Hm, strange. Sorry it did not work as smoothly as I hoped. Did you use the topGO-based enrichment function (as in the example, here with up & down genes)?

mgalland commented 5 years ago

No worries! I'll try again soon, as I love your app and how it helps with post-sequencing analysis.
And yes, I've used the topGO enrichment.

federicomarini commented 5 years ago

Well, thanks then for the kind words, and for working with me on a case I had never encountered to date. If you or Max want to follow up on this (also via email), please do so. Federico

federicomarini / ideal

Annotation not available #1