LauraPS1 / TFEA.ChIP_downloads

TFEA.ChIP is an R package in development. Its purpose is to analyze transcription factor enrichment in a set of differentially expressed genes.

GSEA_run error, differing number of rows from chip index #7

Closed: lizzyjoan closed this issue 2 years ago

lizzyjoan commented 2 years ago

Hello, we've been running into an error while using your package and hoped you could help. When running the GSEA_run() function, we keep getting this error:

Error in data.frame(Accession = chip_index$Accession, TF = chip_index$TF,  : arguments imply differing number of rows: 755, 585

Even with the toy data from the vignette, if we use get_chip_index(encodeFilter = TRUE) or any set of TFs other than the three listed in the vignette, we still get this error (only the row counts in the message change).

A labmate traced the issue to line 1317 of the GSEA_run function, where a data frame is built from variables calculated just above it. One of those variables does not have the same length as the others, which we believe is causing this error.
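For context, the message itself comes from base R's data.frame(), which refuses to combine columns of different lengths unless the shorter length divides evenly into the longer one. The sketch below reproduces it with two made-up vectors; accession and tf are hypothetical stand-ins, not the actual variables inside GSEA_run().

```r
# Hypothetical stand-ins for two columns derived from chip_index; the real
# variables inside GSEA_run() may differ.
accession <- paste0("acc", 1:5)        # 5 entries
tf        <- c("ATF1", "CTCF", "ATF1") # 3 entries

# data.frame() cannot recycle 3 values into 5 rows, so it stops with the
# same kind of message reported above.
msg <- tryCatch(
  {
    data.frame(Accession = accession, TF = tf)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
# [1] "arguments imply differing number of rows: 5, 3"

# Caveat: when one length is an exact multiple of the other (here 6 and 3),
# data.frame() recycles the shorter column silently instead of failing,
# so a length mismatch can go unnoticed.
ok <- data.frame(Accession = paste0("acc", 1:6), TF = tf)
print(nrow(ok))
# [1] 6
```

With 755 and 585 rows, as in the report, neither length is a multiple of the other, so the mismatch surfaces as an error rather than as silent recycling.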

Please let me know if you have any suggestions or need any other information. Thank you for your time!

installed.packages()[names(sessionInfo()$otherPkgs), "Version"]
           ggplot2
           "3.3.6"
EnsDb.Hsapiens.v79
          "2.99.0"
         ensembldb
          "2.21.1"
  AnnotationFilter
          "1.21.0"
   GenomicFeatures
          "1.49.5"
     GenomicRanges
          "1.49.0"
      GenomeInfoDb
          "1.33.3"
      org.Mm.eg.db
          "3.15.0"
     AnnotationDbi
          "1.59.1"
           IRanges
          "2.31.0"
         S4Vectors
          "0.35.1"
           Biobase
          "2.57.1"
      BiocGenerics
          "0.43.0"
            tibble
           "3.1.7"
             dplyr
           "1.0.9"
           biomaRt
          "2.53.2"
       BiocManager
         "1.30.18"
         TFEA.ChIP
          "1.17.0"
> R.Version()
$platform
[1] "x86_64-w64-mingw32"
$arch
[1] "x86_64"
$os
[1] "mingw32"
$crt
[1] "ucrt"
$system
[1] "x86_64, mingw32"
$status
[1] ""
$major
[1] "4"
$minor
[1] "2.0"
$year
[1] "2022"
$month
[1] "04"
$day
[1] "22"
$`svn rev`
[1] "82229"
$language
[1] "R"
$version.string
[1] "R version 4.2.0 (2022-04-22 ucrt)"
$nickname
[1] "Vigorous Calisthenics"

LauraPS1 commented 2 years ago

Hi, I've been trying to reproduce the error with different combinations of TFs and databases, but so far I haven't been able to, so more context would help.

Are you using the default database or one of the larger ones available in the downloads repository? Is GSEA_run() being run inside a function?

This error is usually triggered when the database chip_index was built with differs from the database available in GSEA_run()'s environment. All TFEA.ChIP functions that use the ChIP-gene databases expect them to be defined in the global environment and, if they are not, fall back on the package's internal database (which is considerably smaller because of Bioconductor's size limits). These environment mismatches often happen when TFEA.ChIP is run inside another function, which is why we added set_user_data(): it assigns a ChIP-gene database and its associated metadata to the global environment.

To illustrate the difference, let's define two functions:

library(TFEA.ChIP)

example_Function <- function( tf ){
    # load large DB: ChIPDB and MetaData end up in this function's
    # environment only, so get_chip_index() falls back on the internal DB
    load("ReMap2022+EnsTSS+GH.Rdata")
    nrow( get_chip_index( TFfilter = tf, encodeFilter = TRUE ) )
}

example_Function_2 <- function( tf ){
    # load large DB
    load("ReMap2022+EnsTSS+GH.Rdata")

    set_user_data( MetaData, ChIPDB )    # assign objects to the global envir.

    nrow( get_chip_index( TFfilter = tf, encodeFilter = TRUE ) )
}

When we run the two:

> example_Function( tf=c( "ATF1", "CTCF" ) )
[1] 32
> example_Function_2( tf=c( "ATF1", "CTCF" ) )
[1] 282
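The gap between 32 and 282 is consistent with how base R's load() behaves: its envir argument defaults to parent.frame(), so when load() is called inside a function the objects land in that function's environment, where a lookup in the global environment won't find them. The self-contained sketch below shows this with a throwaway object; ChIPDB_demo, load_locally() and load_globally() are hypothetical names, not part of TFEA.ChIP. Loading with envir = globalenv() appears only to illustrate the scoping; set_user_data() is the package's own way to register a database.

```r
# Save a small throwaway object to a temporary .Rdata file.
ChIPDB_demo <- data.frame(gene = c("g1", "g2"))
path <- tempfile(fileext = ".Rdata")
save(ChIPDB_demo, file = path)
rm(ChIPDB_demo)  # start with the object absent from the global environment

# load() defaults to envir = parent.frame(), i.e. this function's own
# environment, so the object never reaches the global environment.
load_locally <- function(file) {
  load(file)
  exists("ChIPDB_demo", envir = globalenv(), inherits = FALSE)
}

# Explicitly loading into the global environment makes it visible there.
load_globally <- function(file) {
  load(file, envir = globalenv())
  exists("ChIPDB_demo", envir = globalenv(), inherits = FALSE)
}

print(load_locally(path))   # [1] FALSE
print(load_globally(path))  # [1] TRUE
```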

Please get back to me if this isn't the issue you're having. Cheers!

lizzyjoan commented 2 years ago

Thank you for all your help and suggestions! It looks like it's working. I appreciate it!