girke-lab / drugTargetInteractions


runDrugTarget_Annot_Bioassay throws an error #1

Closed komalsrathi closed 3 years ago

komalsrathi commented 3 years ago

Hi,

I am trying to get the following code working. It is taken from the manual: https://www.bioconductor.org/packages/devel/bioc/vignettes/drugTargetInteractions/inst/doc/drugTargetInteractions.html#7_Workflow_to_Run_Everything

The code works on your example genes, i.e. CA7 and CFTR, but does not work on several other genes, including the example below:

library(drugTargetInteractions)
genes <- c("HOXC11", "HOXC12")
chembldb <- system.file("extdata", "chembl_sample.db", package="drugTargetInteractions")
resultsPath <- system.file("extdata", "results", package="drugTargetInteractions")
config <- genConfig(chemblDbPath=chembldb, resultsPath=resultsPath)
downloadUniChem(config=config)
cmpIdMapping(config=config)

# convert to ensembl gene id - gene symbol vector
idMap <- getSymEnsUp(EnsDb = "EnsDb.Hsapiens.v86", ids = genes, idtype = "GENE_NAME")
ens_gene_id <- idMap$ens_gene_id
queryBy <- list(molType="gene", idType = "ensembl_gene_id", ids = names(ens_gene_id))
res_list <- getParalogs(queryBy)

# runDrugTarget_Annot_Bioassay
up_col_id = "ID_up_sp"  
drug_target_list <- runDrugTarget_Annot_Bioassay(res_list = res_list, 
                                                 up_col_id = "ID_up_sp", 
                                                 ens_gene_id = ens_gene_id,
                                                 config = config)

Error:

Error in vapply(names(ensids), function(x) ens_gene_id[ensids[[x]]], character(1)) : 
  values must be length 1,
 but FUN(X[[1]]) result is length 2
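
For readers unfamiliar with this message: base R's vapply() requires every call of FUN to return a value matching the template (here character(1)), and it stops as soon as one result has a different length. The sketch below reproduces the same message with made-up IDs. The vectors `ens_gene_id` and `ensids` are hypothetical stand-ins, not the package's internal data, so this only illustrates the mechanism behind the message and not its root cause inside the package.

```r
# Hypothetical named vector: names are Ensembl-style IDs, values are gene symbols.
ens_gene_id <- c(ENSG_A = "GENE1", ENSG_B = "GENE2", ENSG_C = "GENE3")

# Hypothetical lookup list: the first entry maps to two IDs instead of one.
ensids <- list(q1 = c("ENSG_A", "ENSG_B"), q2 = "ENSG_C")

# vapply() insists that every result matches character(1), so q1 triggers an error.
# tryCatch() captures the message instead of aborting the script.
res <- tryCatch(
  vapply(names(ensids), function(x) ens_gene_id[ensids[[x]]], character(1)),
  error = function(e) conditionMessage(e)
)
print(res)

# lapply() has no length constraint, so it can show which entry is too long.
lens <- lengths(lapply(ensids, function(x) ens_gene_id[x]))
print(lens)
```

Checking the lengths with lapply() first is a quick way to find out which query produced more than one match before the strict vapply() call fails.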

Session Info:

R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] EnsDb.Hsapiens.v86_2.99.0    ensembldb_2.16.4            
 [3] AnnotationFilter_1.16.0      GenomicFeatures_1.44.1      
 [5] AnnotationDbi_1.54.1         Biobase_2.52.0              
 [7] GenomicRanges_1.44.0         GenomeInfoDb_1.28.1         
 [9] IRanges_2.26.0               S4Vectors_0.30.0            
[11] BiocGenerics_0.38.0          drugTargetInteractions_1.0.0

loaded via a namespace (and not attached):
 [1] MatrixGenerics_1.4.2        httr_1.4.2                  UniProt.ws_2.32.0          
 [4] bit64_4.0.5                 assertthat_0.2.1            BiocFileCache_2.0.0        
 [7] blob_1.2.2                  GenomeInfoDbData_1.2.6      Rsamtools_2.8.0            
[10] yaml_2.2.1                  progress_1.2.2              pillar_1.6.2               
[13] RSQLite_2.2.8               lattice_0.20-44             glue_1.4.2                 
[16] digest_0.6.27               XVector_0.32.0              Matrix_1.3-4               
[19] XML_3.99-0.7                pkgconfig_2.0.3             biomaRt_2.48.3             
[22] zlibbioc_1.38.0             purrr_0.3.4                 BiocParallel_1.26.1        
[25] tibble_3.1.3                KEGGREST_1.32.0             generics_0.1.0             
[28] ellipsis_0.3.2              cachem_1.0.6                withr_2.4.2                
[31] SummarizedExperiment_1.22.0 lazyeval_0.2.2              magrittr_2.0.1             
[34] crayon_1.4.1                memoise_2.0.0               fansi_0.5.0                
[37] xml2_1.3.2                  tools_4.1.1                 prettyunits_1.1.1          
[40] hms_1.1.0                   BiocIO_1.2.0                lifecycle_1.0.0            
[43] matrixStats_0.60.0          stringr_1.4.0               DelayedArray_0.18.0        
[46] Biostrings_2.60.2           compiler_4.1.1              rlang_0.4.11               
[49] grid_4.1.1                  RCurl_1.98-1.4              rstudioapi_0.13            
[52] rjson_0.2.20                rappdirs_0.3.3              bitops_1.0-7               
[55] restfulr_0.0.13             DBI_1.1.1                   curl_4.3.2                 
[58] R6_2.5.1                    GenomicAlignments_1.28.0    dplyr_1.0.7                
[61] rtracklayer_1.52.1          fastmap_1.1.0               bit_4.0.4                  
[64] utf8_1.2.2                  filelock_1.0.2              ProtGenerics_1.24.0        
[67] stringi_1.7.3               Rcpp_1.0.7                  vctrs_0.3.8                
[70] png_0.1-7                   dbplyr_2.1.1                tidyselect_1.1.1

Also posted on: https://support.bioconductor.org/p/9139193/

tgirke commented 3 years ago

Based on your code sample, it seems you are using the toy database rather than a downloaded instance of the ChEMBL database (and the UniChem ID mappings). To work with your own data, it is important to set up the proper working environment by following the instructions in section 2.1 of the vignette here. In general, when an R vignette uses system.file(), the path points to a resource in the directory where your R packages are installed. In most cases you will want to follow the instructions on how to point to your own paths so that you can work with your own data. In this case the main item is the ChEMBL database, which you can download from here. The database file is rather large (>20 GB), so make sure you store it in a location with enough free space.
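
A minimal sketch of the difference between system.file() and a path you build yourself. It uses the stats package only because it ships with every R installation. The directory `my_data_dir` and the file name `chembl.db` are hypothetical placeholders, not names required by the package.

```r
# system.file() resolves paths inside an installed package's directory.
pkg_file <- system.file("DESCRIPTION", package = "stats")
print(pkg_file)

# A file that is not part of the package yields an empty string, not an error.
missing_file <- system.file("extdata", "no_such_file.db", package = "stats")
print(missing_file)

# For your own data, build the path yourself with file.path().
# my_data_dir and chembl.db are hypothetical; substitute your download location.
my_data_dir <- file.path(tempdir(), "chembl")
chembldb <- file.path(my_data_dir, "chembl.db")

# Checking existence up front gives a clearer failure than a later database error.
print(file.exists(chembldb))
```

Because system.file() silently returns "" for a missing resource, it is worth checking the result with nzchar() or file.exists() before passing it on.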

komalsrathi commented 3 years ago

Oh, right! I'll give that a try and close both if it works. Thanks!!

komalsrathi commented 3 years ago

Downloaded the full db and still getting the error:

library(drugTargetInteractions)
genes <- c("HOXC11", "HOXC12")

# obtained from ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/chembl_29_sqlite.tar.gz
chembldb <- file.path(ref_dir, "chembl", "chembl_29_sqlite", "chembl_29.db")
resultsPath <- system.file("extdata", "results", package = "drugTargetInteractions")
config <- genConfig(chemblDbPath = chembldb, resultsPath = resultsPath)
downloadUniChem(config = config)
cmpIdMapping(config = config)

idMap <- getSymEnsUp(EnsDb = "EnsDb.Hsapiens.v86", ids = genes, idtype = "GENE_NAME")
ens_gene_id <- idMap$ens_gene_id
queryBy <- list(molType="gene", idType = "ensembl_gene_id", ids = names(ens_gene_id))
res_list <- getParalogs(queryBy)

# runDrugTarget_Annot_Bioassay
drug_target_list <- runDrugTarget_Annot_Bioassay(res_list = res_list, 
                                                 up_col_id = "ID_up_sp", 
                                                 ens_gene_id = ens_gene_id,
                                                 config = config)

Error in vapply(names(ensids), function(x) ens_gene_id[ensids[[x]]], character(1)) : 
  values must be length 1,
 but FUN(X[[1]]) result is length 2

My db looks right:

du -sh $refdir/chembl/chembl_29_sqlite/chembl_29.db
 20G    $refdir/chembl/chembl_29_sqlite/chembl_29.db

tgirke commented 3 years ago

It appears that some table structures changed in the latest ChEMBL 29 release from July. I have committed a fix for this to the package's GitHub repository. It can take ~2-3 days for those changes to go live on Bioconductor. To get them on your system immediately, you can install the updated package directly from GitHub as shown below. The additional sample code should then work.

Load package and environment settings

devtools::install_github("girke-lab/drugTargetInteractions@RELEASE_3_13") # installs release (for devel drop @...)
library(drugTargetInteractions)
packageVersion("drugTargetInteractions") # should return 1.0.2 or 1.1.2
chembldb <- "./downloads/chembl_29_sqlite/chembl_29.db" # Assumes chembl db is in downloads dir
config <- genConfig(chemblDbPath=chembldb) # default creates ./results directory where unichem mappings will be stored
downloadUniChem(config=config) # Downloads lookup data
cmpIdMapping(config=config) # Generates lookup table and stores in path defined under config
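
The version check above can also be done programmatically. This is a sketch only: `has_fix` is a hypothetical helper, and treating 1.1.x as the devel line is an assumption based on the two version numbers quoted in the comment above.

```r
# Versions quoted above as containing the fix (release and devel lines).
fixed_release <- package_version("1.0.2")
fixed_devel <- package_version("1.1.2")

# Hypothetical helper: TRUE when the given version is at least the fixed
# version on its own line (1.0.x = release, 1.1.x and later = devel, assumed).
has_fix <- function(installed) {
  installed <- package_version(installed)
  if (installed >= package_version("1.1.0")) {
    installed >= fixed_devel
  } else {
    installed >= fixed_release
  }
}

print(has_fix("1.0.0")) # version from the session info in the original report
print(has_fix("1.0.2"))
print(has_fix("1.1.2"))
```

In a real session the argument would be packageVersion("drugTargetInteractions"), which returns an object that package_version() accepts unchanged.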

Standard drug-target annotations

gene_name <- c("HOXC11", "HOXC12", "PTGS1")
idMap <- getSymEnsUp(EnsDb="EnsDb.Hsapiens.v86", ids=gene_name, idtype="GENE_NAME")
queryBy <- list(molType="protein", idType="UniProt_ID", ids=names(idMap$up_gene_id))
qresult <- drugTargetAnnot(queryBy, config=config)
qresult[1:6,1:14] # Inspect result

Run drug-target annotation workflow (includes paralogs from Biomart)

gene_name <- c("HOXC11", "HOXC12", "PTGS1")
idMap <- getSymEnsUp(EnsDb = "EnsDb.Hsapiens.v86", ids = gene_name, idtype = "GENE_NAME")
ens_gene_id <- idMap$ens_gene_id
queryBy <- list(molType="gene", idType = "ensembl_gene_id", ids = names(idMap$ens_gene_id))
res_list <- getParalogs(queryBy)
drug_target_list <- runDrugTarget_Annot_Bioassay(res_list=res_list, up_col_id="ID_up_sp", ens_gene_id=ens_gene_id, config=config)
sapply(drug_target_list, dim) # Prints list summary
drug_target_list$Annotation[1:11,-18] # Inspect annotation results
drug_target_list$Bioassay[1:11,] # Inspect bioassay results

komalsrathi commented 3 years ago

Thank you for your quick response. I will test this tomorrow and close the issue if it works.

komalsrathi commented 3 years ago

Works as expected, thanks!