DavisLaboratory / singscore

An R/Bioconductor package that implements a single-sample molecular phenotyping approach
https://davislaboratory.github.io/singscore/

Genes Not Unique Error #26

Closed DarioS closed 3 years ago

DarioS commented 3 years ago

I have a pathway in which all gene symbols are unique, but generateNull fails with an error claiming they are not.

> pathwaysList[1]
$`ACE Inhibitor Pathway`
 [1] "ACE"     "ACE2"    "AGT"     "AGTR1"   "AGTR2"   "ATP6AP2" "BDKRB1"  "BDKRB2"  "CMA1"    "CTSG"    "CYP11B2"
[12] "KNG1"    "MAS1"    "NOS3"    "NR3C2"   "REN"     "TGFB1"
> generateNull(pathwaysList[[1]], genesRanked, B = 10000)
Error in validObject(.Object) : 
  invalid class “GeneSet” object: gene symbols must be unique

How to avoid the error?

bhuvad commented 3 years ago

Hi @DarioS,

This is a general limitation imposed by the GSEABase package (https://bioconductor.org/packages/release/bioc/html/GSEABase.html). We convert any character list and store gene sets in the GeneSet data structure from that package, as it ensures gene sets follow a required format (for example, they have a name and contain unique genes). The package and data structure were designed by the Bioconductor core team and cover the specifications required for almost any gene set. The way to avoid this error is to ensure that your gene list is composed of unique genes. You can use the unique() function in R to do so. For example:

pathwaysList[[1]] = unique(pathwaysList[[1]])

#or for the entire group of pathways
pathwaysList = lapply(pathwaysList, unique)
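
To check whether a gene set actually contains duplicates before de-duplicating it, something like the following sketch may help. It uses only base R; the vector `testing` is the gene list posted in this issue, and `dupIndex` is just an illustrative variable name.

```r
# Gene symbols from the pathway reported in this issue
testing <- c("ACE", "ACE2", "AGT", "AGTR1", "AGTR2", "ATP6AP2", "BDKRB1",
             "BDKRB2", "CMA1", "CTSG", "CYP11B2", "KNG1", "MAS1", "NOS3",
             "NR3C2", "REN", "TGFB1")

# anyDuplicated() returns 0 when every element is unique,
# otherwise the index of the first duplicated element
dupIndex <- anyDuplicated(testing)

if (dupIndex == 0) {
  message("No duplicated gene symbols found")
} else {
  message("Duplicated gene symbols: ",
          paste(unique(testing[duplicated(testing)]), collapse = ", "))
}
```

If this reports no duplicates and the error still appears, the cause is likely somewhere other than the gene list itself.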

Hope that helps.

Cheers, Dharmesh

DarioS commented 3 years ago

Something else is wrong. Note that my gene list did not have any duplicates, if you read carefully. I get the same error even after applying unique().

> generateNull(unique(pathwaysList[[1]]), genesRanked, B = 10000)
Error in validObject(.Object) : 
  invalid class “GeneSet” object: gene symbols must be unique

Perhaps the duplicates are somehow created by generateNull because they are not in the user's input.

Please try yourself with:

testing <- c("ACE", "ACE2", "AGT", "AGTR1", "AGTR2", "ATP6AP2", "BDKRB1",  "BDKRB2", "CMA1", "CTSG", "CYP11B2",
             "KNG1", "MAS1", "NOS3", "NR3C2", "REN", "TGFB1")

bhuvad commented 3 years ago

Hi @DarioS,

Thanks for digging into this error and figuring out the issue! I will just log it here for future users searching for a similar answer. The issue was that the second argument needs to be named explicitly (i.e. rankData = ...), as otherwise the function assumes it is receiving a downSet value (downSet is by default the second parameter of the function).

generateNull(pathwaysList[[1]], rankData = genesRanked, B = 10000)
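
The positional-matching behaviour can be illustrated without singscore at all. The sketch below uses a hypothetical stand-in function, `toyNull`, that only mimics the parameter order described above (upSet, then downSet, then rankData); it is not the package's implementation, and `genes` and `ranks` are made-up illustrative inputs.

```r
# Hypothetical stand-in: same leading parameter order as described above,
# but it only reports which argument each value was matched to
toyNull <- function(upSet, downSet = NULL, rankData, B = 1000) {
  list(upSet = upSet,
       downSet = downSet,
       rankDataMissing = missing(rankData))
}

# Illustrative inputs (not real data)
genes <- c("ACE", "ACE2", "AGT")
ranks <- matrix(1:3, dimnames = list(genes, "sample1"))

# Second argument passed by position: it is matched to downSet
positional <- toyNull(genes, ranks, B = 10)

# Second argument passed by name: it is matched to rankData
named <- toyNull(genes, rankData = ranks, B = 10)

is.matrix(positional$downSet)  # the rank data ended up in downSet
positional$rankDataMissing     # and rankData was never supplied
is.null(named$downSet)         # downSet keeps its NULL default
```

In the positional call the rank data is treated as the down-regulated gene set, so any validation is presumably applied to those values rather than to the gene symbols the user supplied.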

Thanks again for the in-depth analysis of the code and function!

Cheers, Dharmesh