Closed JenFisher7 closed 2 years ago
Hi @JenFisher7 , thanks for using the signatureSearch package! The reason is that currently, the LINCS2 database is only available in the devel version of ExperimentHub, it will become the release version after the upcoming Bioconductor release. So now, you need to install R devel and use devel version of Bioconductor and packages to access the LINCS2 database.
@yduan004 Thank you for responding so quickly. I used the development version of Bioconductor, and I was able to run the code above. I do have another error when I try the lincs2 example from the vignette.
here is the code
library(SummarizedExperiment); library(HDF5Array)
sample_db <- SummarizedExperiment(HDF5Array(db_path, name="assay"))
rownames(sample_db) <- HDF5Array(db_path, name="rownames")
colnames(sample_db) <- HDF5Array(db_path, name="colnames")
# get "vorinostat__SKB__trt_cp" signature drawn from toy database
query_mat <- as.matrix(assay(sample_db[,"vorinostat__SKB__trt_cp"]))
query <- as.numeric(query_mat); names(query) <- rownames(query_mat)
upset <- head(names(query[order(-query)]), 150)
head(upset)
downset <- tail(names(query[order(-query)]), 150)
head(downset)
library(signatureSearch)
data("lincs_pert_info2")
qsig_lincs2 <- qSig(query=list(upset=upset, downset=downset),
gess_method="LINCS", refdb="lincs2")
lincs2 <- gess_lincs(qsig_lincs2, tau=TRUE, sortby="NCS", workers=1,
cmp_annot_tb=lincs_pert_info2, by="pert_id",
cmp_name_col="pert_iname")
here is the error
Error in `stop_subscript()`:
! Can't subset columns that don't exist.
x Column `t_gn_sym` doesn't exist.
Run `rlang::last_error()` to see where the error occurred.
There were 21 warnings (use warnings() to see them)
> rlang::last_trace()
<error/vctrs_error_subscript_oob>
Error in `stop_subscript()`:
! Can't subset columns that don't exist.
x Column `t_gn_sym` doesn't exist.
Backtrace:
▆
1. ├─signatureSearch::gess_lincs(...)
2. │ └─signatureSearch::addGESSannot(...)
3. │ └─res %<>% relocate(t_gn_sym:PCIDss, .after = all_of(lastcol))
4. ├─dplyr::relocate(., t_gn_sym:PCIDss, .after = all_of(lastcol))
5. └─dplyr:::relocate.data.frame(., t_gn_sym:PCIDss, .after = all_of(lastcol))
6. └─tidyselect::eval_select(expr(c(...)), .data)
7. └─tidyselect:::eval_select_impl(...)
8. ├─tidyselect:::with_subscript_errors(...)
9. │ ├─base::tryCatch(...)
10. │ │ └─base tryCatchList(expr, classes, parentenv, handlers)
11. │ │ └─base tryCatchOne(expr, names, parentenv, handlers[[1L]])
12. │ │ └─base doTryCatch(return(expr), name, parentenv, handler)
13. │ └─tidyselect:::instrument_base_errors(expr)
14. │ └─base::withCallingHandlers(...)
15. └─tidyselect:::vars_select_eval(...)
16. └─tidyselect:::walk_data_tree(expr, data_mask, context_mask)
17. └─tidyselect:::eval_c(expr, data_mask, context_mask)
18. └─tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
19. └─tidyselect:::walk_data_tree(new, data_mask, context_mask)
20. └─tidyselect:::eval_colon(expr, data_mask, context_mask)
21. └─tidyselect:::walk_data_tree(...)
22. └─tidyselect:::as_indices_sel_impl(...)
23. └─tidyselect:::as_indices_impl(x, vars, strict = strict)
24. └─tidyselect:::chr_as_locations(x, vars)
25. └─vctrs::vec_as_location(x, n = length(vars), names = vars)
26. └─vctrs `<fn>`()
27. └─vctrs:::stop_subscript_oob(...)
28. └─vctrs:::stop_subscript(...)
29. └─rlang::abort(...)
sessionInfo
> sessionInfo()
R Under development (unstable) (2022-01-29 r81593)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS
Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] signatureSearchData_1.9.4 signatureSearch_1.9.6 Rcpp_1.0.8 SummarizedExperiment_1.25.3
[5] Biobase_2.55.0 GenomicRanges_1.47.6 GenomeInfoDb_1.31.4 HDF5Array_1.23.2
[9] rhdf5_2.39.5 DelayedArray_0.21.2 IRanges_2.29.1 S4Vectors_0.33.10
[13] MatrixGenerics_1.7.0 matrixStats_0.61.0 Matrix_1.4-0 ExperimentHub_2.3.5
[17] AnnotationHub_3.3.8 BiocFileCache_2.3.4 dbplyr_2.1.1 BiocGenerics_0.41.2
loaded via a namespace (and not attached):
[1] shadowtext_0.1.1 fastmatch_1.1-3 plyr_1.8.6 igraph_1.2.11
[5] lazyeval_0.2.2 GSEABase_1.57.0 splines_4.2.0 BiocParallel_1.29.12
[9] ggplot2_3.3.5 digest_0.6.29 yulab.utils_0.0.4 htmltools_0.5.2
[13] GOSemSim_2.21.1 viridis_0.6.2 GO.db_3.14.0 fansi_1.0.2
[17] magrittr_2.0.2 memoise_2.0.1 limma_3.51.3 tzdb_0.2.0
[21] readr_2.1.2 Biostrings_2.63.1 annotate_1.73.0 graphlayouts_0.8.0
[25] R.utils_2.11.0 enrichplot_1.15.2 colorspace_2.0-2 blob_1.2.2
[29] rappdirs_0.3.3 ggrepel_0.9.1 dplyr_1.0.7 crayon_1.4.2
[33] RCurl_1.98-1.5 jsonlite_1.7.3 graph_1.73.0 scatterpie_0.1.7
[37] ape_5.6-1 glue_1.6.1 polyclip_1.10-0 gtable_0.3.0
[41] zlibbioc_1.41.0 XVector_0.35.0 Rhdf5lib_1.17.2 scales_1.1.1
[45] DOSE_3.21.2 DBI_1.1.2 viridisLite_0.4.0 xtable_1.8-4
[49] gridGraphics_0.5-1 tidytree_0.3.7 bit_4.0.4 reactome.db_1.77.0
[53] preprocessCore_1.57.0 htmlwidgets_1.5.4 httr_1.4.2 fgsea_1.21.0
[57] RColorBrewer_1.1-2 ellipsis_0.3.2 R.methodsS3_1.8.1 pkgconfig_2.0.3
[61] XML_3.99-0.8 farver_2.1.0 utf8_1.2.2 ggplotify_0.1.0
[65] tidyselect_1.1.1 rlang_1.0.0 reshape2_1.4.4 later_1.3.0
[69] AnnotationDbi_1.57.1 munsell_0.5.0 BiocVersion_3.15.0 tools_4.2.0
[73] visNetwork_2.1.0 cachem_1.0.6 downloader_0.4 cli_3.1.1
[77] generics_0.1.1 RSQLite_2.2.9 stringr_1.4.0 fastmap_1.1.0
[81] yaml_2.2.2 ggtree_3.3.1 bit64_4.0.5 tidygraph_1.2.0
[85] purrr_0.3.4 KEGGREST_1.35.0 ggraph_2.0.5 nlme_3.1-155
[89] mime_0.12 R.oo_1.24.0 aplot_0.1.2 DO.db_2.9
[93] compiler_4.2.0 rstudioapi_0.13 filelock_1.0.2 curl_4.3.2
[97] png_0.1-7 interactiveDisplayBase_1.33.0 affyio_1.65.0 treeio_1.19.1
[101] tibble_3.1.6 tweenr_1.0.2 stringi_1.7.6 lattice_0.20-45
[105] vctrs_0.3.8 pillar_1.6.5 lifecycle_1.0.1 rhdf5filters_1.7.0
[109] BiocManager_1.30.16 data.table_1.14.2 bitops_1.0-7 httpuv_1.6.5
[113] patchwork_1.1.1 qvalue_2.27.0 affy_1.73.0 R6_2.5.1
[117] promises_1.2.0.1 gridExtra_2.3 MASS_7.3-55 assertthat_0.2.1
[121] withr_2.4.3 GenomeInfoDbData_1.2.7 hms_1.1.1 parallel_4.2.0
[125] clusterProfiler_4.3.2 grid_4.2.0 ggfun_0.0.5 tidyr_1.1.4
[129] ggforce_0.3.3 shiny_1.7.1
Sorry, it is an overlook in the package function. Thanks for pointing it out! The cmp_annot_tb
should not contain column names of t_gn_sym
or PCIDss
, these two columns will be generated internally by the gess_*
functions and thus conserved by the package, if user want to maintain these two columns in their provided annotation table, give them different names. I will debug this and update the package. As a temporary solution, please try running
lincs3 <- gess_lincs(qSig=qsig_lincs2, tau=TRUE, sortby="NCS", workers=2,
cmp_annot_tb=dplyr::select(lincs_pert_info2, -"t_gn_sym"), by="pert_id",
cmp_name_col="pert_iname")
or rename the t_gn_sym
column in cmp_annot_tb
, but you do not need to do this if the loaded lincs_pert_info2
table is used as compound annotation table since the t_gn_sym
column will be exactly the same with the t_gn_sym
column that will be generated by the gess_lincs
function.
Thank you! That solution worked.
I also quickly checked the output for some cell lines from the new LINCS database that I am interested in (SHSY5Y, LN229, SKNSH, U251MG, YH13, & GI1). However, I don't see them in the results
code:
lincs3 <- gess_lincs(qSig=qsig_lincs2, tau=TRUE, sortby="NCS", workers=2,
cmp_annot_tb=dplyr::select(lincs_pert_info2, -"t_gn_sym"), by="pert_id",
cmp_name_col="pert_iname")
lincs3_res<- result(lincs3)
table(lincs3_res$cell)
A375 A549 AGS ASC BJAB BT20 CD34 H1299 HA1E HAP1 HBL1 HCC515 HCT116
11521 11874 303 2486 136 148 209 307 6794 90 136 5761 327
HEK293 HEK293T HELA HEPG2 HL60 HME1 HPTEC HS578T HT29 HUH7 HUVEC JURKAT K562
1323 2 1685 5240 61 116 56 198 10415 732 657 1182 136
KMS34 LNCAP MCF10A MCF7 MDAMB231 MINO NALM6 NCIH2073 NCIH508 NCIH596 NEU NKDBA NOMO1
136 59 1772 11591 1184 100 136 329 260 310 1998 227 256
NPC OCILY10 OCILY19 OCILY3 P1A82 PC3 PHH SHSY5Y SKB SKBR3 SKL SKMEL5 SW480
3628 92 136 136 83 11855 1662 169 2123 202 169 136 280
THP1 TMD8 U2OS U937 VCAP YAPC
1376 136 16347 341 15640 1696
I see the cell lines in the cell_info2 object
# Load object
data(cell_info2)
head(cell_info2)
It might be the case since the LINCS2 database curated have the filtering process. The CLUE LINCS group defines exemplars as the GESs with the highest Transcriptional Activity Score (TAS) among a set of signatures obtained with the same perturbagen but different dosage and treatment time combinations. By choosing only the best performing GESs (i.e. with highest TAS) unnecessary redundancies in the LINCS2 database can be substantially reduced. The LINCS2 database we generated are the the exact same exemplar signatures used by LINCS’ online query tool. This database contains 136,460 GESs obtained with a total of 30,456 drug perturbations each tested in up to 58 cell lines. Maybe the cell types you are interested in are filtered out since they are marked as non-exemplar? Maybe be can try looking at other cell types that are in the same tissue as your interested cell types?
First, thank you for creating a great package!
I recently noticed that you are adding the updated LINCS database as an option for your tool, and I wanted to try out the new database. However, when I try to access the lincs2 from ExperimentHub, I am getting an error about the snapshot date.
It might relate to this issue https://github.com/Bioconductor/AnnotationHub/issues/11 .
Here is the code that I was trying to run with the error: `> library(ExperimentHub); library(rhdf5)
Here is my sessioninfo: `> sessionInfo() R version 4.1.2 (2021-11-01) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04.3 LTS
Matrix products: default BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base
other attached packages: [1] rhdf5_2.38.0 ExperimentHub_2.2.1 AnnotationHub_3.0.2 BiocFileCache_2.2.0
[5] org.Hs.eg.db_3.14.0 AnnotationDbi_1.56.2 dplyr_1.0.7 dbplyr_2.1.1
[9] PLIER_0.99.0 qvalue_2.24.0 rsvd_1.0.5 knitr_1.37
[13] glmnet_4.1-3 Matrix_1.3-4 pheatmap_1.0.12 gplots_3.1.1
[17] RColorBrewer_1.1-2 DESeq2_1.34.0 gprofiler2_0.2.1 ggplot2_3.3.5
[21] signatureSearchData_1.8.4 signatureSearch_1.9.6 Rcpp_1.0.8 recount3_1.4.0
[25] SummarizedExperiment_1.24.0 Biobase_2.54.0 GenomicRanges_1.46.1 GenomeInfoDb_1.30.0
[29] IRanges_2.28.0 S4Vectors_0.32.3 BiocGenerics_0.40.0 MatrixGenerics_1.6.0
[33] matrixStats_0.61.0
loaded via a namespace (and not attached): [1] utf8_1.2.2 R.utils_2.11.0 tidyselect_1.1.1 RSQLite_2.2.9
[5] htmlwidgets_1.5.4 grid_4.1.2 BiocParallel_1.28.2 scatterpie_0.1.7
[9] munsell_0.5.0 codetools_0.2-18 preprocessCore_1.56.0 withr_2.4.3
[13] colorspace_2.0-2 GOSemSim_2.18.1 filelock_1.0.2 rstudioapi_0.13
[17] DOSE_3.18.3 GenomeInfoDbData_1.2.7 polyclip_1.10-0 bit64_4.0.5
[21] farver_2.1.0 downloader_0.4 vctrs_0.3.8 treeio_1.16.2
[25] generics_0.1.1 xfun_0.29 R6_2.5.1 graphlayouts_0.8.0
[29] locfit_1.5-9.4 bitops_1.0-7 rhdf5filters_1.6.0 cachem_1.0.6
[33] fgsea_1.18.0 gridGraphics_0.5-1 DelayedArray_0.20.0 assertthat_0.2.1
[37] promises_1.2.0.1 BiocIO_1.4.0 scales_1.1.1 ggraph_2.0.5
[41] enrichplot_1.12.3 gtable_0.3.0 affy_1.72.0 tidygraph_1.2.0
[45] rlang_1.0.0 genefilter_1.76.0 splines_4.1.2 rtracklayer_1.54.0
[49] lazyeval_0.2.2 BiocManager_1.30.16 yaml_2.2.2 reshape2_1.4.4
[53] httpuv_1.6.5 clusterProfiler_4.0.5 tools_4.1.2 ggplotify_0.1.0
[57] affyio_1.64.0 ellipsis_0.3.2 sessioninfo_1.2.1 plyr_1.8.6
[61] visNetwork_2.1.0 zlibbioc_1.40.0 purrr_0.3.4 RCurl_1.98-1.5
[65] viridis_0.6.2 cowplot_1.1.1 ggrepel_0.9.1 magrittr_2.0.2
[69] data.table_1.14.2 DO.db_2.9 reactome.db_1.76.0 evaluate_0.14
[73] hms_1.1.1 patchwork_1.1.1 mime_0.12 xtable_1.8-4
[77] XML_3.99-0.8 shape_1.4.6 gridExtra_2.3 compiler_4.1.2
[81] tibble_3.1.6 KernSmooth_2.23-20 crayon_1.4.2 shadowtext_0.1.1
[85] R.oo_1.24.0 htmltools_0.5.2 ggfun_0.0.5 later_1.3.0
[89] tzdb_0.2.0 tidyr_1.1.4 geneplotter_1.72.0 aplot_0.1.2
[93] DBI_1.1.2 tweenr_1.0.2 MASS_7.3-54 rappdirs_0.3.3
[97] readr_2.1.1 cli_3.1.1 R.methodsS3_1.8.1 parallel_4.1.2
[101] igraph_1.2.11 pkgconfig_2.0.3 GenomicAlignments_1.30.0 plotly_4.10.0
[105] foreach_1.5.1 ggtree_3.0.4 annotate_1.72.0 XVector_0.34.0
[109] yulab.utils_0.0.4 stringr_1.4.0 digest_0.6.29 graph_1.72.0
[113] Biostrings_2.62.0 rmarkdown_2.11 fastmatch_1.1-3 tidytree_0.3.7
[117] GSEABase_1.54.0 restfulr_0.0.13 curl_4.3.2 gtools_3.9.2
[121] shiny_1.7.1 Rsamtools_2.10.0 rjson_0.2.20 lifecycle_1.0.1
[125] nlme_3.1-153 jsonlite_1.7.3 Rhdf5lib_1.16.0 viridisLite_0.4.0
[129] limma_3.50.0 fansi_1.0.2 pillar_1.6.5 lattice_0.20-45
[133] KEGGREST_1.34.0 fastmap_1.1.0 httr_1.4.2 survival_3.2-13
[137] GO.db_3.14.0 interactiveDisplayBase_1.30.0 glue_1.6.1 iterators_1.0.13
[141] png_0.1-7 BiocVersion_3.14.0 bit_4.0.4 ggforce_0.3.3
[145] stringi_1.7.6 HDF5Array_1.20.0 blob_1.2.2 caTools_1.18.2
[149] memoise_2.0.1 ape_5.6-1 `