aertslab / SCENIC

SCENIC is an R package to infer Gene Regulatory Networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

Error in executing the motifs_AUC step #4

DivyanshAgarwal closed this issue 6 years ago

DivyanshAgarwal commented 7 years ago

Hi, I'm trying to implement SCENIC, but regardless of which dataset I use, I keep encountering an error with the following command in Step 2:

motifs_AUC <- lapply(motifRankings, function(ranking) calcAUC(tfModules, ranking, aucMaxRank=0.01*nrow(ranking@rankings), verbose=FALSE))

The error message I get is:

Error in calcAUC(tfModules, ranking, aucMaxRank = 0.01 * nrow(ranking@rankings), : Fewer than 80% of the genes in the gene sets are included in the rankings. Check wether the gene IDs in the 'rankings' and 'geneSets' match.

Do you have any suggestion or advice on overcoming this problem? Thank you!

daniel-wells commented 6 years ago

Did you try checking your IDs and the ranking IDs? The IDs in the 'rankings' can be seen using head(motifRankings$`500bp`@rankings$rn) - are your IDs (head(tfModules[[1]])) in the same format? Running this: tfModules[[1]][!tfModules[[1]] %in% motifRankings$`500bp`@rankings$rn] will show you which of your geneSet IDs are not in the rankings IDs. Is it just some of the genes or do none of them overlap?

Perhaps your IDs are from a different (newer?) annotation of the genome (which organism are you using, and where are your gene IDs from?), so you might have to map them back to an older version, for example using library("biomaRt").
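
To make the check above concrete, here is a minimal sketch in base R. The two vectors are hypothetical toy data standing in for tfModules[[1]] and the ranking IDs (they are not taken from a real SCENIC run), and the 80% figure is simply the threshold quoted in the error message, not a reimplementation of what calcAUC does internally.

```r
# Hypothetical toy data: stand-ins for tfModules[[1]] and the ranking IDs.
gene_set    <- c("Junb", "Fos", "ENSMUSG00000052837", "Atf3")
ranking_ids <- c("Junb", "Fos", "Atf3", "Egr1", "Jun")

# Which gene-set IDs are present in the rankings?
in_rankings <- gene_set %in% ranking_ids

# Fraction of the gene set found in the rankings (compare with the 80%
# mentioned in the error message).
fraction_found <- mean(in_rankings)

# IDs that would need to be remapped or investigated.
missing_ids <- gene_set[!in_rankings]

print(fraction_found)
print(missing_ids)
```

If nearly all IDs are missing, the two sides probably use different ID types (or the gene names have been lost altogether); if only a few are missing, an annotation version mismatch is the more likely explanation.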

gaelcge commented 6 years ago

Hi guys

I have a similar issue which is due to the previous command:

# Add TF to the gene set (used in the following steps, careful if editing)
tfModules <- setNames(lapply(names(tfModules), function(gsn) {
    tf <- strsplit(gsn, "_")[[1]][1]
    unique(c(tf, tfModules[[gsn]]))
}), names(tfModules))

This command does add the TF to each gene set, but it also turns all the gene names in the gene set into numbers, which are then not recognized when calculating motifs_AUC afterwards.

example:

Junb_top5perTarget
 [1] "Junb" "4560" "150"  "8422" "4601" "4632" "4635" "4704" "323"  "7672"
[11] "4825" "4876" "2899" "479"  "5053" "5128" "3056" "5174" "5231" "5318"
[21] "5337" "5378" "7773" "8827" "871"  "8529" "8839" "5658" "5681" "3432"
[31] "1160" "7849" "3464" "5873" "1214" "8882" "1298" "6024" "6072" "8284"
[41] "1459" "6294" "3780" "6443" "6475" "6511" "3860" "6732" "6967" "4060"
[51] "4074" "2122" "7153" "7203" "8102" "9194" "7522" "7534" "2569"

I would think it is related to the warning "careful if editing" but I am not sure what to do next.

Thanks for the support, it would be very appreciated.

gaelcge commented 6 years ago

I had forgotten to set options(stringsAsFactors=FALSE) before running the GENIE3 part of SCENIC (as mentioned in the tutorial). Setting it fixed the error.

s-aibar commented 6 years ago

Yes, this error typically happens when the results from GENIE3 are converted to factors. It is mentioned in the tutorial, but multiple people got similar errors so I'll try to say it more explicitly...
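
For anyone wondering why factors produce numbers here, a small base R sketch of the mechanism may help. The gene names are hypothetical toy values and the variable names are invented for illustration; the only part taken from this thread is the unique(c(tf, ...)) pattern. When a character string is combined with a factor using c(), the factor contributes its internal integer codes rather than its labels.

```r
# Hypothetical target genes, once as character and once as factor.
targets_chr <- c("Fos", "Jun", "Atf3")
targets_fct <- factor(targets_chr)

# Same pattern as the "Add TF to the gene set" step in this thread.
with_tf_chr <- unique(c("Junb", targets_chr))  # gene names are kept
with_tf_fct <- unique(c("Junb", targets_fct))  # factor codes leak in as strings

# A possible workaround (an assumption, not the tutorial's fix):
# convert the factor back to character before combining.
with_tf_fixed <- unique(c("Junb", as.character(targets_fct)))

print(with_tf_chr)
print(with_tf_fct)
print(with_tf_fixed)
```

Note that since R 4.0.0 the default of stringsAsFactors is already FALSE, so this mainly affects older R versions or code that creates factors explicitly.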