Open joel-tuberosa opened 2 years ago
I encountered the same problem, and after reviewing the source code I found that the cause was the file "motif_rankings-mm9-tss-centered-5kb-10species.mc9nr.genes_vs_motifs.rankings.feather", which has a lot of duplicate motif names in it. The solution is to use the old file named "mm9-tss-centered-10kb-10species.mc9nr.feather" from https://resources.aertslab.org/cistarget/databases/old/mus_musculus/mm9/refseq_r45/mc9nr/gene_based/
@joel-tuberosa, did you get anywhere with this other than changing to an old annotation file? I am having the exact same error.
I just changed to the old file and then everything went well
I think I know what the problem is: the new and old versions of the databases store the column with the motif names in different positions. In the old databases it is the first column (column name 'features'), while in the new ones it is the last (column name 'motifs').
Unfortunately, the code in 03_addSignificantGenes.R assumes that the first column contains the motif names (my comments):
.getSignificantGenes <- function(geneSet,
                                 rankings,
                                 signifRankingNames=NULL,
                                 method="iCisTarget",
                                 maxRank=5000,
                                 plotCurve=FALSE,
                                 genesFormat=c("geneList", "incidMatrix"),
                                 nCores=1,
                                 digits=3,
                                 nMean=50)
{...
  # the motifRankings S4 object becomes a dataframe
  rankings <- getRanking(rankings)
  # the 'indices' are obtained from the FIRST column!!!
  indexCol <- colnames(rankings)[1]
  ...
  # this will give you now a series of ranking values, as character... and not necessarily unique
  motifNames <- as.character(unlist(rankings[,indexCol]))
  # now you get repeated row.names as you have a list of numbers instead of unique motif names:
  gSetRanks <- data.frame(row.names=motifNames, rankings[,geneSet])
  # and this is where the error originates
  ...
}
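To make the failure mode concrete, here is a minimal base-R sketch on a made-up table laid out like the "new" databases (gene columns first, motif names last). The data and column names are invented for illustration; only the three commented steps mirror the excerpt above, and this is not the package's actual code path.

```r
# Toy ranking table: gene columns first, motif names in the LAST column.
# All names and values here are made up for illustration.
rankings <- data.frame(
  geneA  = c(3L, 3L, 1L),
  geneB  = c(2L, 1L, 3L),
  geneC  = c(1L, 2L, 2L),
  motifs = c("motif1", "motif2", "motif3"),
  stringsAsFactors = FALSE
)
geneSet <- c("geneB", "geneC")

# Same steps as in the excerpt: the index is taken from the FIRST column.
indexCol   <- colnames(rankings)[1]                       # a gene column, not "motifs"
motifNames <- as.character(unlist(rankings[, indexCol]))  # rank values as strings

# Rank values repeat, so data.frame() stops on duplicated row names.
byPosition <- tryCatch(
  data.frame(row.names = motifNames, rankings[, geneSet]),
  error = function(e) e
)

# Selecting the index column by name yields unique row names and succeeds.
byName <- data.frame(row.names = rankings[, "motifs"], rankings[, geneSet])
```

Here `byPosition` ends up holding the error condition, while `byName` is a regular data frame with one row per motif.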
I think this was intended to be handled before, within importRankings, where it does:
indexCol <- intersect(allColumns, c('motifs', 'tracks', 'features'))# [1]
if(verbose) message("Using the column '", indexCol, "' as feature index for the ranking database.")
So in principle the lookup is independent of position, but indexCol is not passed on to cisTarget, I think. It is also clear from the comment that the motif name information is expected to be in the first column of the dataframe.
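A name-based lookup of this kind can be sketched in base R as follows. The helper `findIndexCol` and the toy tables are hypothetical and not part of RcisTarget; the candidate names are the three that appear in the `intersect()` call quoted above.

```r
# Hypothetical helper (not part of RcisTarget): find the feature-index
# column by name, and fail loudly unless exactly one candidate matches.
findIndexCol <- function(tbl, candidates = c("motifs", "tracks", "features")) {
  indexCol <- intersect(colnames(tbl), candidates)
  if (length(indexCol) != 1) {
    stop("Expected exactly one index column, found ", length(indexCol))
  }
  indexCol
}

# Toy tables mimicking the two layouts described in this thread.
newLayout <- data.frame(geneA = 1:2, geneB = 2:1, motifs = c("motif1", "motif2"))
oldLayout <- data.frame(features = c("motif1", "motif2"), geneA = 1:2, geneB = 2:1)

newIndex <- findIndexCol(newLayout)  # index column found at the end
oldIndex <- findIndexCol(oldLayout)  # index column found at the start
```

Checking that exactly one column matches is a design choice of this sketch: it turns a silent mismatch into an immediate error.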
However, I do not get the expected message when I run importRankings. I have been using the Drosophila motifRankings, both "new" and "old".
When I import them I get, with the old, the expected message:
> motifRankings_old <- importRankings("resources/motifdbs/old/dm6-5kb-upstream-full-tx-11species.mc8nr.feather")
Using the column 'features' as feature index for the ranking database.
But with the new, I get:
> motifRankings_new <- importRankings(".../.../dm6-5kb-upstream-full-tx-11species.mc8nr.genes_vs_motifs.rankings.feather")
Using the column '128up' as feature index for the ranking database.
'128up' is the name of the first Drosophila gene by alphanumeric ordering... but this cannot be the result of intersect(allColumns, c('motifs', 'tracks', 'features'))... I must be missing something ¯\_(ツ)_/¯
Anyway, the solution is to place the last column of the new database at the beginning before running cisTarget:
motifRankings_new@rankings <- dplyr::relocate(motifRankings_new@rankings, motifs)
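For anyone who would rather not depend on dplyr, the same reordering can be sketched in base R. The helper `moveColumnFirst` is hypothetical, and the example runs on a toy data frame rather than a real rankings object, so treat it as an illustration under those assumptions.

```r
# Hypothetical helper: move one named column to the front (base R only).
moveColumnFirst <- function(tbl, column) {
  stopifnot(column %in% colnames(tbl))
  tbl[, c(column, setdiff(colnames(tbl), column)), drop = FALSE]
}

# Toy table with the motif names in the last column, as in the new databases.
toy <- data.frame(geneA = 1:2, geneB = 2:1, motifs = c("motif1", "motif2"))
reordered <- moveColumnFirst(toy, "motifs")
colnames(reordered)
```

Assuming the rankings slot holds a data frame or tibble, the helper would be applied to motifRankings_new@rankings in the same way as the relocate call.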
Hope this helps.
Anyway, the solution is to place the last column of the new database at the beginning before running cisTarget:
motifRankings_new@rankings <- dplyr::relocate(motifRankings_new@rankings, motifs)
This does it! Thanks for the advice!
Hello,
I would like to perform an enrichment analysis with the following data:
target_genes - a vector of gene names corresponding to the tested set
motif_rankings - the loaded database mm9-tss-centered-5kb-10species.mc9nr.genes_vs_motifs.rankings.feather downloaded from here
motifAnnotations_mgi - annotation data loaded from the package with data(motifAnnotations_mgi)
I am running the following commands:
And I got this error message from the last command:
Do you have an idea how to fix this?
Thank you in advance.
Joël