girke-lab / signatureSearch

R/Bioconductor package including the Gene Expression Signature Search (GESS), Function Enrichment Analysis (FEA) methods and supporting drug-target network construction for visualization

Error when using CRISPR (or non-compound) data #12

Closed adomingues closed 1 year ago

adomingues commented 1 year ago

Hi all,

I was trying to use signatureSearch to query my gene expression signatures against the CRISPR LINCS data (beta release, https://clue.io/releases/data-dashboard). Since, as far as I am aware, the LINCS CRISPR signatures are not yet packaged in ExperimentHub, I downloaded the data from clue.io and, following the signatureSearchData instructions, created the HDF5 file to query against.

library("data.table")
library("HDF5Array")
library("cmapR")
library("signatureSearch")

siginfo_beta <- fread("siginfo_beta.txt")

## filter crispr only
DBpath_trt_xpr <- here::here("data/cmap/build/lincs2_trt_xpr.h5")

trt_xpr_filter <- siginfo_beta[pert_type %in% c("trt_xpr")] 
trt_xpr_filter[,list(n=.N), by = "pert_id"]
trt_xpr_filter[cell_iname == "U251MG" & pert_id == "BRDN0001586899"]

new_cid <- paste(trt_xpr_filter$pert_id, trt_xpr_filter$cell_iname, trt_xpr_filter$pert_type, sep="__")

gctx2h5(
  here::here("level5_beta_trt_xpr_n142901x12328.gctx"),
  cid=trt_xpr_filter$sig_id,
  new_cid=new_cid,
  h5file=DBpath_trt_xpr,
  by_ncol=10000,
  overwrite=TRUE
)
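
Since several signatures can share the same pert_id, cell line and perturbation type, it may be worth checking whether the constructed column IDs are unique and whether each one splits into three fields. The following is a minimal sketch in base R. The `sig_meta` table and its values are hypothetical stand-ins for `trt_xpr_filter`, and whether duplicate IDs cause problems downstream is not established in this thread.

```r
# Toy stand-in for trt_xpr_filter; all values here are hypothetical.
sig_meta <- data.frame(
  pert_id    = c("BRDN0001", "BRDN0001", "BRDN0002"),
  cell_iname = c("U251MG", "U251MG", "A375"),
  pert_type  = "trt_xpr",
  stringsAsFactors = FALSE
)

# Build IDs the same way as in the post: pert__cell__type.
new_cid <- paste(sig_meta$pert_id, sig_meta$cell_iname,
                 sig_meta$pert_type, sep = "__")

# Collect IDs that occur more than once.
dup_ids <- unique(new_cid[duplicated(new_cid)])

# Confirm every ID splits into exactly three "__"-separated fields.
n_fields <- lengths(strsplit(new_cid, "__", fixed = TRUE))
all_three <- all(n_fields == 3)

print(dup_ids)
print(all_three)
```

If `dup_ids` is non-empty, one option would be to keep a single signature per ID before building the database; which signature to keep is a design choice this thread does not address.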

So far so good - I think. The issue is when I try to query my gene sets against this database:

lup <- filtered_sets[["up-regulated"]][[1]]
ldown <- filtered_sets[["down-regulated"]][[1]]
qsig_lincs2 <- qSig(
  query = list(
    upset=lup,
    downset=ldown),
  gess_method = "LINCS", refdb = DBpath_trt_xpr
)

lincs2 <- gess_lincs(
  qsig_lincs2,
  tau = FALSE,
  sortby = "NCS",
  workers = 5
)

Which generates an error:

Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `drug_name`.
Run `rlang::last_error()` to see where the error occurred.

rlang::last_error()
<error/rlang_error>
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `drug_name`.
---
Backtrace:
 1. signatureSearch::gess_lincs(...)
 5. dplyr:::left_join.data.frame(., target, by = join_cols)
Run `rlang::last_trace()` to see the full context.

So I guess the issue is that the CRISPR data, level5_beta_trt_xpr_n142901x12328.gctx, is missing a hardcoded column.

The question is how to get around this. I also tried other non-compound signatures and always ran into some sort of issue. Using the compound signature data works fine.

Thanks!

brendangongol commented 1 year ago

To Whom It may Concern:

Thank you for bringing this to my attention. The problem you are encountering is that, with the default settings, gess_lincs() attempts to annotate the results with information about the LINCS drugs identified in the screen. This works when querying against compounds but not against the other LINCS databases, since the drug names are not present in them. To get around this issue, include the "addAnnotations=FALSE" argument in the function call and also set "tau=FALSE". The following code should work for you:

devtools::install_github("girke-lab/signatureSearch")
qsig_lincs <- qSig(query = list(upset=upset, downset=downset), gess_method="LINCS", refdb= lincs_pathxpr)
lincs <- gess_lincs(qSig = qsig_lincs, sortby="NCS", tau=FALSE, workers=1, addAnnotations = FALSE)

Best Regards, Brendan
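
Whether `addAnnotations` is accepted depends on the installed version of the package (a later comment in this thread reports `unused argument (addAnnotations = FALSE)`). A quick way to check before calling is to inspect the function's formal arguments. The sketch below uses only base R. `has_arg` and `toy_gess` are hypothetical names introduced for illustration, and `toy_gess` is a stand-in, not the real `gess_lincs`.

```r
# Return TRUE if function `fun` has a formal argument named `arg`.
has_arg <- function(fun, arg) {
  arg %in% names(formals(fun))
}

# Stand-in function for demonstration only (not the real gess_lincs).
toy_gess <- function(qSig, tau = FALSE, sortby = "NCS", addAnnotations = TRUE) NULL

has_new_name <- has_arg(toy_gess, "addAnnotations")
has_old_name <- has_arg(toy_gess, "annotation")
print(c(addAnnotations = has_new_name, annotation = has_old_name))

# With signatureSearch installed, the same check would read:
# has_arg(signatureSearch::gess_lincs, "addAnnotations")
```

If the check returns FALSE for the installed package, re-installing or updating signatureSearch is the likely next step.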

adomingues commented 1 year ago

Thanks @brendangongol! I got it fixed as soon as you replied, but only now remembered to reply.

brendangongol commented 1 year ago

To Whom it May Concern:

I just did a fresh install of the signatureSearch package from both GitHub and Bioconductor, and both installation sources work for me. Can you try re-installing the package and then let me know if it works for you? The following code works for me:

devtools::install_github("girke-lab/signatureSearch")
BiocManager::install("signatureSearch")
library(signatureSearch)
library(ExperimentHub); library(rhdf5)
eh <- ExperimentHub()
cmap <- eh[["EH3223"]]; cmap_expr <- eh[["EH3224"]]
lincs <- eh[["EH3226"]]; lincs_expr <- eh[["EH3227"]]
lincs2 <- eh[["EH7297"]]
h5ls(lincs2)

db_path <- system.file("extdata", "sample_db.h5", package = "signatureSearch")

## Load sample_db as SummarizedExperiment object
library(SummarizedExperiment); library(HDF5Array)
sample_db <- SummarizedExperiment(HDF5Array(db_path, name="assay"))
rownames(sample_db) <- HDF5Array(db_path, name="rownames")
colnames(sample_db) <- HDF5Array(db_path, name="colnames")

## get "vorinostat__SKB__trt_cp" signature drawn from toy database
query_mat <- as.matrix(assay(sample_db[,"vorinostat__SKB__trt_cp"]))
query <- as.numeric(query_mat); names(query) <- rownames(query_mat)
upset <- head(names(query[order(-query)]), 150)
downset <- tail(names(query[order(-query)]), 150)

qsig_lincs <- qSig(query=list(upset=upset, downset=downset), gess_method="LINCS", refdb=db_path)
lincs <- gess_lincs(qsig_lincs, sortby="NCS", tau=FALSE, workers=1, addAnnotations = TRUE)
result(lincs)
lincs <- gess_lincs(qsig_lincs, sortby="NCS", tau=FALSE, workers=1, addAnnotations = FALSE)
result(lincs)

If this does not work, what version of R are you using? You may need to update to the most recent version in order for the package to function correctly.

Regards, Brendan
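
When reporting back on a version question like the one above, it can help to include both the R version and the installed package version. A minimal sketch, assuming only base R and the `utils` package; `report_versions` is a hypothetical helper name, and it returns `NA` for the package if it is not installed.

```r
# Report the R version and, if installed, the version of a package.
report_versions <- function(pkg = "signatureSearch") {
  pkg_version <- if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_  # package is not installed
  }
  c(R = R.version.string, package = pkg_version)
}

versions <- report_versions()
print(versions)
```

The full output of `sessionInfo()` gives the same information along with all loaded packages, which is often what maintainers ask for.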

On Mon, May 8, 2023 at 12:41 PM Janet Joy wrote:

Hi @brendangongol, I had the same issue, and adding addAnnotations=FALSE worked in the past, but now it throws an error when I use it: unused argument (addAnnotations = FALSE). How do you think I should proceed from here?

janjoy commented 1 year ago

Hi @brendangongol, thanks so much for your answer. I was able to run it successfully with the package update.