girke-lab / signatureSearch

R/Bioconductor package including the Gene Expression Signature Search (GESS), Function Enrichment Analysis (FEA) methods and supporting drug-target network construction for visualization
17 stars 4 forks source link

LINCS Database Update #9

Closed JenFisher7 closed 2 years ago

JenFisher7 commented 2 years ago

First, thank you for creating a great package!

I recently noticed that you are adding the updated LINCS database as an option for your tool, and I wanted to try out the new database. However, when I try to access the lincs2 from ExperimentHub, I am getting an error about the snapshot date.

It might relate to this issue https://github.com/Bioconductor/AnnotationHub/issues/11 .

Here is the code that I was trying to run with the error: `> library(ExperimentHub); library(rhdf5)

eh <- ExperimentHub() snapshotDate(): 2021-10-19 lincs2 <- eh[["EH7297"]] Error: EH7297 added after current Hub snapshot date. added: 2022-01-04 snapshote date: 2021-10-19`

Here is my sessioninfo: `> sessionInfo() R version 4.1.2 (2021-11-01) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04.3 LTS

Matrix products: default BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] rhdf5_2.38.0 ExperimentHub_2.2.1 AnnotationHub_3.0.2 BiocFileCache_2.2.0
[5] org.Hs.eg.db_3.14.0 AnnotationDbi_1.56.2 dplyr_1.0.7 dbplyr_2.1.1
[9] PLIER_0.99.0 qvalue_2.24.0 rsvd_1.0.5 knitr_1.37
[13] glmnet_4.1-3 Matrix_1.3-4 pheatmap_1.0.12 gplots_3.1.1
[17] RColorBrewer_1.1-2 DESeq2_1.34.0 gprofiler2_0.2.1 ggplot2_3.3.5
[21] signatureSearchData_1.8.4 signatureSearch_1.9.6 Rcpp_1.0.8 recount3_1.4.0
[25] SummarizedExperiment_1.24.0 Biobase_2.54.0 GenomicRanges_1.46.1 GenomeInfoDb_1.30.0
[29] IRanges_2.28.0 S4Vectors_0.32.3 BiocGenerics_0.40.0 MatrixGenerics_1.6.0
[33] matrixStats_0.61.0

loaded via a namespace (and not attached): [1] utf8_1.2.2 R.utils_2.11.0 tidyselect_1.1.1 RSQLite_2.2.9
[5] htmlwidgets_1.5.4 grid_4.1.2 BiocParallel_1.28.2 scatterpie_0.1.7
[9] munsell_0.5.0 codetools_0.2-18 preprocessCore_1.56.0 withr_2.4.3
[13] colorspace_2.0-2 GOSemSim_2.18.1 filelock_1.0.2 rstudioapi_0.13
[17] DOSE_3.18.3 GenomeInfoDbData_1.2.7 polyclip_1.10-0 bit64_4.0.5
[21] farver_2.1.0 downloader_0.4 vctrs_0.3.8 treeio_1.16.2
[25] generics_0.1.1 xfun_0.29 R6_2.5.1 graphlayouts_0.8.0
[29] locfit_1.5-9.4 bitops_1.0-7 rhdf5filters_1.6.0 cachem_1.0.6
[33] fgsea_1.18.0 gridGraphics_0.5-1 DelayedArray_0.20.0 assertthat_0.2.1
[37] promises_1.2.0.1 BiocIO_1.4.0 scales_1.1.1 ggraph_2.0.5
[41] enrichplot_1.12.3 gtable_0.3.0 affy_1.72.0 tidygraph_1.2.0
[45] rlang_1.0.0 genefilter_1.76.0 splines_4.1.2 rtracklayer_1.54.0
[49] lazyeval_0.2.2 BiocManager_1.30.16 yaml_2.2.2 reshape2_1.4.4
[53] httpuv_1.6.5 clusterProfiler_4.0.5 tools_4.1.2 ggplotify_0.1.0
[57] affyio_1.64.0 ellipsis_0.3.2 sessioninfo_1.2.1 plyr_1.8.6
[61] visNetwork_2.1.0 zlibbioc_1.40.0 purrr_0.3.4 RCurl_1.98-1.5
[65] viridis_0.6.2 cowplot_1.1.1 ggrepel_0.9.1 magrittr_2.0.2
[69] data.table_1.14.2 DO.db_2.9 reactome.db_1.76.0 evaluate_0.14
[73] hms_1.1.1 patchwork_1.1.1 mime_0.12 xtable_1.8-4
[77] XML_3.99-0.8 shape_1.4.6 gridExtra_2.3 compiler_4.1.2
[81] tibble_3.1.6 KernSmooth_2.23-20 crayon_1.4.2 shadowtext_0.1.1
[85] R.oo_1.24.0 htmltools_0.5.2 ggfun_0.0.5 later_1.3.0
[89] tzdb_0.2.0 tidyr_1.1.4 geneplotter_1.72.0 aplot_0.1.2
[93] DBI_1.1.2 tweenr_1.0.2 MASS_7.3-54 rappdirs_0.3.3
[97] readr_2.1.1 cli_3.1.1 R.methodsS3_1.8.1 parallel_4.1.2
[101] igraph_1.2.11 pkgconfig_2.0.3 GenomicAlignments_1.30.0 plotly_4.10.0
[105] foreach_1.5.1 ggtree_3.0.4 annotate_1.72.0 XVector_0.34.0
[109] yulab.utils_0.0.4 stringr_1.4.0 digest_0.6.29 graph_1.72.0
[113] Biostrings_2.62.0 rmarkdown_2.11 fastmatch_1.1-3 tidytree_0.3.7
[117] GSEABase_1.54.0 restfulr_0.0.13 curl_4.3.2 gtools_3.9.2
[121] shiny_1.7.1 Rsamtools_2.10.0 rjson_0.2.20 lifecycle_1.0.1
[125] nlme_3.1-153 jsonlite_1.7.3 Rhdf5lib_1.16.0 viridisLite_0.4.0
[129] limma_3.50.0 fansi_1.0.2 pillar_1.6.5 lattice_0.20-45
[133] KEGGREST_1.34.0 fastmap_1.1.0 httr_1.4.2 survival_3.2-13
[137] GO.db_3.14.0 interactiveDisplayBase_1.30.0 glue_1.6.1 iterators_1.0.13
[141] png_0.1-7 BiocVersion_3.14.0 bit_4.0.4 ggforce_0.3.3
[145] stringi_1.7.6 HDF5Array_1.20.0 blob_1.2.2 caTools_1.18.2
[149] memoise_2.0.1 ape_5.6-1 `

yduan004 commented 2 years ago

Hi @JenFisher7 , thanks for using the signatureSearch package! The reason is that currently, the LINCS2 database is only available in the devel version of ExperimentHub, it will become the release version after the upcoming Bioconductor release. So now, you need to install R devel and use devel version of Bioconductor and packages to access the LINCS2 database.

JenFisher7 commented 2 years ago

@yduan004 Thank you for responding so quickly. I used the development version of Bioconductor, and I was able to run the code above. I do have another error when I try the lincs2 example from the vignette.

here is the code

library(SummarizedExperiment); library(HDF5Array)
sample_db <- SummarizedExperiment(HDF5Array(db_path, name="assay"))
rownames(sample_db) <- HDF5Array(db_path, name="rownames")
colnames(sample_db) <- HDF5Array(db_path, name="colnames")
# get "vorinostat__SKB__trt_cp" signature drawn from toy database
query_mat <- as.matrix(assay(sample_db[,"vorinostat__SKB__trt_cp"]))
query <- as.numeric(query_mat); names(query) <- rownames(query_mat)
upset <- head(names(query[order(-query)]), 150)
head(upset)
downset <- tail(names(query[order(-query)]), 150)
head(downset)

library(signatureSearch)
data("lincs_pert_info2")
qsig_lincs2 <- qSig(query=list(upset=upset, downset=downset), 
                     gess_method="LINCS", refdb="lincs2")
lincs2 <- gess_lincs(qsig_lincs2, tau=TRUE, sortby="NCS", workers=1,
                      cmp_annot_tb=lincs_pert_info2, by="pert_id", 
                      cmp_name_col="pert_iname")

here is the error

Error in `stop_subscript()`:
! Can't subset columns that don't exist.
x Column `t_gn_sym` doesn't exist.
Run `rlang::last_error()` to see where the error occurred.
There were 21 warnings (use warnings() to see them)
> rlang::last_trace()
<error/vctrs_error_subscript_oob>
Error in `stop_subscript()`:
! Can't subset columns that don't exist.
x Column `t_gn_sym` doesn't exist.
Backtrace:
     ▆
  1. ├─signatureSearch::gess_lincs(...)
  2. │ └─signatureSearch::addGESSannot(...)
  3. │   └─res %<>% relocate(t_gn_sym:PCIDss, .after = all_of(lastcol))
  4. ├─dplyr::relocate(., t_gn_sym:PCIDss, .after = all_of(lastcol))
  5. └─dplyr:::relocate.data.frame(., t_gn_sym:PCIDss, .after = all_of(lastcol))
  6.   └─tidyselect::eval_select(expr(c(...)), .data)
  7.     └─tidyselect:::eval_select_impl(...)
  8.       ├─tidyselect:::with_subscript_errors(...)
  9.       │ ├─base::tryCatch(...)
 10.       │ │ └─base tryCatchList(expr, classes, parentenv, handlers)
 11.       │ │   └─base tryCatchOne(expr, names, parentenv, handlers[[1L]])
 12.       │ │     └─base doTryCatch(return(expr), name, parentenv, handler)
 13.       │ └─tidyselect:::instrument_base_errors(expr)
 14.       │   └─base::withCallingHandlers(...)
 15.       └─tidyselect:::vars_select_eval(...)
 16.         └─tidyselect:::walk_data_tree(expr, data_mask, context_mask)
 17.           └─tidyselect:::eval_c(expr, data_mask, context_mask)
 18.             └─tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
 19.               └─tidyselect:::walk_data_tree(new, data_mask, context_mask)
 20.                 └─tidyselect:::eval_colon(expr, data_mask, context_mask)
 21.                   └─tidyselect:::walk_data_tree(...)
 22.                     └─tidyselect:::as_indices_sel_impl(...)
 23.                       └─tidyselect:::as_indices_impl(x, vars, strict = strict)
 24.                         └─tidyselect:::chr_as_locations(x, vars)
 25.                           └─vctrs::vec_as_location(x, n = length(vars), names = vars)
 26.                             └─vctrs `<fn>`()
 27.                               └─vctrs:::stop_subscript_oob(...)
 28.                                 └─vctrs:::stop_subscript(...)
 29.                                   └─rlang::abort(...)

sessionInfo

> sessionInfo()
R Under development (unstable) (2022-01-29 r81593)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] signatureSearchData_1.9.4   signatureSearch_1.9.6       Rcpp_1.0.8                  SummarizedExperiment_1.25.3
 [5] Biobase_2.55.0              GenomicRanges_1.47.6        GenomeInfoDb_1.31.4         HDF5Array_1.23.2           
 [9] rhdf5_2.39.5                DelayedArray_0.21.2         IRanges_2.29.1              S4Vectors_0.33.10          
[13] MatrixGenerics_1.7.0        matrixStats_0.61.0          Matrix_1.4-0                ExperimentHub_2.3.5        
[17] AnnotationHub_3.3.8         BiocFileCache_2.3.4         dbplyr_2.1.1                BiocGenerics_0.41.2        

loaded via a namespace (and not attached):
  [1] shadowtext_0.1.1              fastmatch_1.1-3               plyr_1.8.6                    igraph_1.2.11                
  [5] lazyeval_0.2.2                GSEABase_1.57.0               splines_4.2.0                 BiocParallel_1.29.12         
  [9] ggplot2_3.3.5                 digest_0.6.29                 yulab.utils_0.0.4             htmltools_0.5.2              
 [13] GOSemSim_2.21.1               viridis_0.6.2                 GO.db_3.14.0                  fansi_1.0.2                  
 [17] magrittr_2.0.2                memoise_2.0.1                 limma_3.51.3                  tzdb_0.2.0                   
 [21] readr_2.1.2                   Biostrings_2.63.1             annotate_1.73.0               graphlayouts_0.8.0           
 [25] R.utils_2.11.0                enrichplot_1.15.2             colorspace_2.0-2              blob_1.2.2                   
 [29] rappdirs_0.3.3                ggrepel_0.9.1                 dplyr_1.0.7                   crayon_1.4.2                 
 [33] RCurl_1.98-1.5                jsonlite_1.7.3                graph_1.73.0                  scatterpie_0.1.7             
 [37] ape_5.6-1                     glue_1.6.1                    polyclip_1.10-0               gtable_0.3.0                 
 [41] zlibbioc_1.41.0               XVector_0.35.0                Rhdf5lib_1.17.2               scales_1.1.1                 
 [45] DOSE_3.21.2                   DBI_1.1.2                     viridisLite_0.4.0             xtable_1.8-4                 
 [49] gridGraphics_0.5-1            tidytree_0.3.7                bit_4.0.4                     reactome.db_1.77.0           
 [53] preprocessCore_1.57.0         htmlwidgets_1.5.4             httr_1.4.2                    fgsea_1.21.0                 
 [57] RColorBrewer_1.1-2            ellipsis_0.3.2                R.methodsS3_1.8.1             pkgconfig_2.0.3              
 [61] XML_3.99-0.8                  farver_2.1.0                  utf8_1.2.2                    ggplotify_0.1.0              
 [65] tidyselect_1.1.1              rlang_1.0.0                   reshape2_1.4.4                later_1.3.0                  
 [69] AnnotationDbi_1.57.1          munsell_0.5.0                 BiocVersion_3.15.0            tools_4.2.0                  
 [73] visNetwork_2.1.0              cachem_1.0.6                  downloader_0.4                cli_3.1.1                    
 [77] generics_0.1.1                RSQLite_2.2.9                 stringr_1.4.0                 fastmap_1.1.0                
 [81] yaml_2.2.2                    ggtree_3.3.1                  bit64_4.0.5                   tidygraph_1.2.0              
 [85] purrr_0.3.4                   KEGGREST_1.35.0               ggraph_2.0.5                  nlme_3.1-155                 
 [89] mime_0.12                     R.oo_1.24.0                   aplot_0.1.2                   DO.db_2.9                    
 [93] compiler_4.2.0                rstudioapi_0.13               filelock_1.0.2                curl_4.3.2                   
 [97] png_0.1-7                     interactiveDisplayBase_1.33.0 affyio_1.65.0                 treeio_1.19.1                
[101] tibble_3.1.6                  tweenr_1.0.2                  stringi_1.7.6                 lattice_0.20-45              
[105] vctrs_0.3.8                   pillar_1.6.5                  lifecycle_1.0.1               rhdf5filters_1.7.0           
[109] BiocManager_1.30.16           data.table_1.14.2             bitops_1.0-7                  httpuv_1.6.5                 
[113] patchwork_1.1.1               qvalue_2.27.0                 affy_1.73.0                   R6_2.5.1                     
[117] promises_1.2.0.1              gridExtra_2.3                 MASS_7.3-55                   assertthat_0.2.1             
[121] withr_2.4.3                   GenomeInfoDbData_1.2.7        hms_1.1.1                     parallel_4.2.0               
[125] clusterProfiler_4.3.2         grid_4.2.0                    ggfun_0.0.5                   tidyr_1.1.4                  
[129] ggforce_0.3.3                 shiny_1.7.1 
yduan004 commented 2 years ago

Sorry, it is an overlook in the package function. Thanks for pointing it out! The cmp_annot_tb should not contain column names of t_gn_sym or PCIDss, these two columns will be generated internally by the gess_* functions and thus conserved by the package, if user want to maintain these two columns in their provided annotation table, give them different names. I will debug this and update the package. As a temporary solution, please try running

lincs3 <- gess_lincs(qSig=qsig_lincs2, tau=TRUE, sortby="NCS", workers=2,
                   cmp_annot_tb=dplyr::select(lincs_pert_info2, -"t_gn_sym"), by="pert_id",                        
                   cmp_name_col="pert_iname")

or rename the t_gn_sym column in cmp_annot_tb, but you do not need to do this if the loaded lincs_pert_info2 table is used as compound annotation table since the t_gn_sym column will be exactly the same with the t_gn_sym column that will be generated by the gess_lincs function.

JenFisher7 commented 2 years ago

Thank you! That solution worked.

I also quickly checked the output for some cell lines from the new LINCS database that I am interested in (SHSY5Y, LN229, SKNSH, U251MG, YH13, & GI1). However, I don't see them in the results

code:

lincs3 <- gess_lincs(qSig=qsig_lincs2, tau=TRUE, sortby="NCS", workers=2,
                     cmp_annot_tb=dplyr::select(lincs_pert_info2, -"t_gn_sym"), by="pert_id",                        
                     cmp_name_col="pert_iname")
lincs3_res<- result(lincs3)
table(lincs3_res$cell)
    A375     A549      AGS      ASC     BJAB     BT20     CD34    H1299     HA1E     HAP1     HBL1   HCC515   HCT116 
   11521    11874      303     2486      136      148      209      307     6794       90      136     5761      327 
  HEK293  HEK293T     HELA    HEPG2     HL60     HME1    HPTEC   HS578T     HT29     HUH7    HUVEC   JURKAT     K562 
    1323        2     1685     5240       61      116       56      198    10415      732      657     1182      136 
   KMS34    LNCAP   MCF10A     MCF7 MDAMB231     MINO    NALM6 NCIH2073  NCIH508  NCIH596      NEU    NKDBA    NOMO1 
     136       59     1772    11591     1184      100      136      329      260      310     1998      227      256 
     NPC  OCILY10  OCILY19   OCILY3    P1A82      PC3      PHH   SHSY5Y      SKB    SKBR3      SKL   SKMEL5    SW480 
    3628       92      136      136       83    11855     1662      169     2123      202      169      136      280 
    THP1     TMD8     U2OS     U937     VCAP     YAPC 
    1376      136    16347      341    15640     1696 

I see the cell lines in the cell_info2 object

# Load object
data(cell_info2)
head(cell_info2)
yduan004 commented 2 years ago

It might be the case since the LINCS2 database curated have the filtering process. The CLUE LINCS group defines exemplars as the GESs with the highest Transcriptional Activity Score (TAS) among a set of signatures obtained with the same perturbagen but different dosage and treatment time combinations. By choosing only the best performing GESs (i.e. with highest TAS) unnecessary redundancies in the LINCS2 database can be substantially reduced. The LINCS2 database we generated are the the exact same exemplar signatures used by LINCS’ online query tool. This database contains 136,460 GESs obtained with a total of 30,456 drug perturbations each tested in up to 58 cell lines. Maybe the cell types you are interested in are filtered out since they are marked as non-exemplar? Maybe be can try looking at other cell types that are in the same tissue as your interested cell types?