bhklab / PharmacoGx

R package to analyze large-scale pharmacogenomic datasets.
http://pharmacodb.pmgenomics.ca
GNU General Public License v3.0
65 stars 26 forks source link

creating a pharmacoset #72

Closed lovenicole closed 3 years ago

lovenicole commented 3 years ago

hi, I'm trying to create a pharmacoset with the latest updated LINCS data, since the current version of connectivity map pharmacoset is 2016. but I keep getting an error at the last step; using the PharmacoSet function.

I followed the instructions described on the pdf file attached below. CreatingPharmacoSet.pdf

and I made all of the dataframes and expressionset. (here's my global environment) image

but I got an error message without any details. CMAP_2020 <- PharmacoSet("CMAP", molecularProfiles = phase_I_eset, cell = cell, drug = drug, curationDrug = curation_drug, curationCell = curation_cell, curationTissue = curation_tissue, datasetType = "perturbation", verify = T)

Error in PharmacoSet("CMAP", molecularProfiles = phase_I_eset, cell = cell, :

Can anyone help me?

ChristopherEeles commented 3 years ago

Hi @lovenicole,

Could you post your sessionInfo() here as well so I know what your R environment looks like. I will look into your issue as soon as I receive it.

Best, Chris

lovenicole commented 3 years ago

I really appreciate your reply. here's my session information.

R version 4.0.3 (2020-10-10)

Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] PharmacoGx_2.0.9 CoreGx_1.0.2    

loaded via a namespace (and not attached):
 [1] lsa_0.73.2                  bitops_1.0-6                matrixStats_0.57.0          RColorBrewer_1.1-2          GenomeInfoDb_1.24.2        
 [6] SnowballC_0.7.0             tools_4.0.3                 R6_2.5.0                    DT_0.16                     KernSmooth_2.23-18         
[11] sm_2.2-5.6                  BiocGenerics_0.34.0         colorspace_1.4-1            tidyselect_1.1.0            gridExtra_2.3              
[16] compiler_4.0.3              Biobase_2.48.0              shinyjs_2.0.0               DelayedArray_0.14.1         slam_0.1-47                
[21] caTools_1.18.0              scales_1.1.1                relations_0.6-9             stringr_1.4.0               digest_0.6.27              
[26] XVector_0.28.0              pkgconfig_2.0.3             htmltools_0.5.0             plotrix_3.7-8               fastmap_1.0.1              
[31] limma_3.44.3                maps_3.3.0                  htmlwidgets_1.5.2           rlang_0.4.8                 rstudioapi_0.11            
[36] shiny_1.5.0                 visNetwork_2.0.9            generics_0.0.2              jsonlite_1.7.1              BiocParallel_1.22.0        
[41] gtools_3.8.2                dplyr_1.0.2                 RCurl_1.98-1.2              magrittr_1.5                GenomeInfoDbData_1.2.3     
[46] Matrix_1.2-18               Rcpp_1.0.5                  celestial_1.4.6             munsell_0.5.0               S4Vectors_0.26.1           
[51] lifecycle_0.2.0             piano_2.4.0                 stringi_1.5.3               MASS_7.3-53                 SummarizedExperiment_1.18.2
[56] zlibbioc_1.34.0             gplots_3.1.0                plyr_1.8.6                  grid_4.0.3                  parallel_4.0.3             
[61] promises_1.1.1              shinydashboard_0.7.1        crayon_1.3.4                lattice_0.20-41             mapproj_1.2.7              
[66] pillar_1.4.6                fgsea_1.14.0                tcltk_4.0.3                 igraph_1.2.6                GenomicRanges_1.40.0       
[71] reshape2_1.4.4              marray_1.66.0               stats4_4.0.3                fastmatch_1.1-0             NISTunits_1.0.1            
[76] glue_1.4.2                  downloader_0.4              data.table_1.13.2           BiocManager_1.30.10         vctrs_0.3.4                
[81] httpuv_1.5.4                testthat_2.3.2              gtable_0.3.0                RANN_2.6.1                  purrr_0.3.4                
[86] ggplot2_3.3.2               mime_0.9                    xtable_1.8-4                pracma_2.2.9                later_1.1.0.1              
[91] tibble_3.0.4                IRanges_2.22.2              sets_1.0-18                 cluster_2.1.0               ellipsis_0.3.1             
[96] magicaxis_2.0.10

Actually, I solved this problem. Maybe it was because i needed to add more details to the molecularProfiles slot like this, `molecularProfiles = list(rna = phase_I_eset_cp_2)'.

so i'm doing the next step. but i'm faced with another problem when using drugPerturbationSig function. When I run this function like this drug.perturbation <- drugPerturbationSig(CMAP_2020_cp, mDataType = "rna", verbose = F), then the error come out, saying

Error in names(ubatch) <- paste("batch", ubatch, sep = "") : 'names' attribute [1] must be the same length as the vector [0]

Two things I'm suspicious about are the drug slot when creating pharmacoset and the batchid setting. First, the pdf file I attached before says that the drugid should be left as NA for controls and the rownames of the drug slot should be the same as the drugid. I'm confused about it because there are so many controls in my data so I can't set the controls' rownames as NA in the drug slot. Second, the batchid thing. My data is already the batch-effect-corrected data and doesn't have any batchid. So I just fill in the batchid column with NA, following the instruction of the pdf file I attached before. Is this right? or do I rather have to fill this with "perturbation" and "control" (<- it's not working, either :( )

Could you help me with this problem? Do I have to open another issue about it?

ChristopherEeles commented 3 years ago

Yes I can help you, no worries.

I think instead of NA you should set all batch ids to 1. This should resolve the error, if not please comment here.

Also, the vignette you shared seems to be out of date. Please see the most recent on Bioconductor at: https://www.bioconductor.org/packages/release/bioc/vignettes/PharmacoGx/inst/doc/CreatingPharmacoSet.pdf.

lovenicole commented 3 years ago

It doesn't work, although I set all batch ids to 1. Do you have any other solutions?

Anyway, thank you for the new vignette. I read it and it is really helpful.

lovenicole commented 3 years ago

when I set the ubatch object as Named num 1 (name == "batch1) and eliminate the code names(ubatch) <- paste("batch", ubatch, sep = "") in the rankGeneDrugPerturbation, the function which is included in the drugPerturbationSig function, then the error below does not appear again.

Error in names(ubatch) <- paste("batch", ubatch, sep = "") : 'names' attribute [1] must be the same length as the vector [0]

However, another error below keeps popping up.

Error in mcfork() : unable to fork, possible reason: Cannot allocate memory

Is there any function to reduce the size of the data and run the funtion drugPerturbationSig with it? or is there a method to separate the Pharmacoset according to the cell lines, ran the drugPerturbationSig separately and integrate them into one perturbation signature (before calculating the connectivityscore)? or can I just run the function connectivityscore with z-score, not the drug.perturbation object? because there is a z-score file (level 5 data. a signature) available in GSE92742.

ChristopherEeles commented 3 years ago

Hi @lovenicole,

From your sessionInfo() it looks like you are using an old version of PharmacoGx. As a start, please install the most recent release with BiocManager::install('PharmacoGx', version='3.12'). When this is done try rerunning the drugPerturbationSig function and see if the error persists.

If you want to subset the PSet before running drugPerturbationSig, you can do so with the subsetTo function, see ?subsetTo. Alternatively, you can specify the cells and/or drugs argument to drugPerturbationSig, which will do the subset before running. To merge the results, you will need to use the abind function. The memory used by the drugPerturbationSig function will also be proportional the the number of threads specified to nthread parameter.

I would recommend subsetting on drug instead of cell line if your memory issues persist, as the resulting subset will be much smaller. You could do something like:

results <- list()
for (drug in drugNames(CMAP)) {
    results[[drug]] <- drugPerturbationSig(CMAP, drugs=drug, nthread=2)
}

Best, Chris

lovenicole commented 3 years ago

Even though I updated the package, the same error keeps popping up.

Error in names(ubatch) <- paste("batch", ubatch, sep = "") : 'names' attribute [1] must be the same length as the vector [0]

I didn't finish the work yet, but thank you in advance! :) I really appreciate your kindness, although my English is not good.

ChristopherEeles commented 3 years ago

Hi @lovenicole,

Happy to help. We are aware that the drugPerturbationSig function is slow and have some updates planned for it in the coming months.

In the mean time, I am glad it is now working for you.

Please don't hesitate to open another issue if you have questions or concerns.

Best, Chris