gagneurlab / OUTRIDER

OUTRIDER: OUTlier in RNA-seq fInDER is an R-based framework to find aberrantly expressed genes in RNA-seq data
MIT License

OUTRIDER does not run serially #41

Closed: xuwenjian85 closed this issue 2 years ago

xuwenjian85 commented 2 years ago

I have installed the OUTRIDER package (http://bioconductor.org/packages/release/bioc/html/OUTRIDER.html) on a CentOS 7.6 Linux system. When I use the core command OUTRIDER(ods) to process my gene expression count matrix, I noticed a strange issue: the function runs in parallel even though I explicitly set BPPARAM = SerialParam(). In fact, it always uses all CPU cores. I attached my script and data. Do you have any suggestions for making OUTRIDER run serially?

```r
############### Rscript
library('OUTRIDER', quietly=TRUE)
library('dplyr', quietly=TRUE)

################# load data
ctsFile <- '/media/eys/xwj/RNAseq/public_normal/df_cts_HC1157_corrupt_fc2_ngene100_nrep3.txt'
ctsTable <- read.table(ctsFile, check.names = FALSE)
# keep only the last 600 columns (samples)
ctsTable <- ctsTable[, (ncol(ctsTable)-600+1):ncol(ctsTable)]

ods <- OutriderDataSet(countData=ctsTable)
ods <- filterExpression(ods, minCounts=TRUE, filterGenes=TRUE)
ods <- estimateSizeFactors(ods)

############### input q
args = commandArgs(trailingOnly=TRUE)
q = as.integer(args[1])
q = 20  # note: this hard-coded value overrides the command-line argument
print(q)

start <- Sys.time()
ods <- OUTRIDER(ods, q=q, BPPARAM = SerialParam(), iterations=8)

end <- Sys.time()
print(end-start)
```

```
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /public/home/test1/soft/anaconda3/envs/R4.1_OUTRIDER/lib/libopenblasp-r0.3.18.so

locale:
 [1] LC_CTYPE=zh_CN.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=zh_CN.UTF-8        LC_COLLATE=zh_CN.UTF-8
 [5] LC_MONETARY=zh_CN.UTF-8    LC_MESSAGES=zh_CN.UTF-8
 [7] LC_PAPER=zh_CN.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] dplyr_1.0.7                 OUTRIDER_1.12.0
 [3] data.table_1.14.2           SummarizedExperiment_1.24.0
 [5] MatrixGenerics_1.6.0        matrixStats_0.61.0
 [7] GenomicFeatures_1.46.1      AnnotationDbi_1.56.1
 [9] Biobase_2.54.0              GenomicRanges_1.46.0
[11] GenomeInfoDb_1.30.0         IRanges_2.28.0
[13] S4Vectors_0.32.0            BiocGenerics_0.40.0
[15] BiocParallel_1.28.0

loaded via a namespace (and not attached):
  [1] bitops_1.0-7           bit64_4.0.5            webshot_0.5.2
  [4] filelock_1.0.2         RColorBrewer_1.1-2     progress_1.2.2
  [7] PRROC_1.3.1            httr_1.4.2             tools_4.1.1
 [10] backports_1.3.0        utf8_1.2.2             R6_2.5.1
 [13] lazyeval_0.2.2         DBI_1.1.1              colorspace_2.0-2
 [16] tidyselect_1.1.1       gridExtra_2.3          prettyunits_1.1.1
 [19] DESeq2_1.34.0          bit_4.0.4              curl_4.3.2
 [22] compiler_4.1.1         TSP_1.1-11             xml2_1.3.2
 [25] plotly_4.10.0          DelayedArray_0.20.0    rtracklayer_1.54.0
 [28] scales_1.1.1           checkmate_2.0.0        genefilter_1.76.0
 [31] rappdirs_0.3.3         stringr_1.4.0          digest_0.6.28
 [34] Rsamtools_2.10.0       XVector_0.34.0         htmltools_0.5.2
 [37] pkgconfig_2.0.3        dbplyr_2.1.1           fastmap_1.1.0
 [40] htmlwidgets_1.5.4      rlang_0.4.12           RSQLite_2.2.8
 [43] BBmisc_1.11            BiocIO_1.4.0           generics_0.1.1
 [46] jsonlite_1.7.2         dendextend_1.15.2      RCurl_1.98-1.5
 [49] magrittr_2.0.1         GenomeInfoDbData_1.2.7 Matrix_1.3-4
 [52] Rcpp_1.0.7             munsell_0.5.0          fansi_0.4.2
 [55] viridis_0.6.2          lifecycle_1.0.1        stringi_1.7.5
 [58] yaml_2.2.1             zlibbioc_1.40.0        plyr_1.8.6
 [61] BiocFileCache_2.2.0    grid_4.1.1             blob_1.2.2
 [64] parallel_4.1.1         crayon_1.4.2           lattice_0.20-45
 [67] Biostrings_2.62.0      splines_4.1.1          annotate_1.72.0
 [70] hms_1.1.1              KEGGREST_1.34.0        locfit_1.5-9.4
 [73] pillar_1.6.4           rjson_0.2.20           reshape2_1.4.4
 [76] codetools_0.2-18       geneplotter_1.72.0     biomaRt_2.50.0
 [79] XML_3.99-0.8           glue_1.4.2             pcaMethods_1.86.0
 [82] foreach_1.5.1          png_0.1-7              vctrs_0.3.8
 [85] tidyr_1.1.4            gtable_0.3.0           purrr_0.3.4
 [88] heatmaply_1.3.0        assertthat_0.2.1       cachem_1.0.6
 [91] ggplot2_3.3.5          xtable_1.8-4           restfulr_0.0.13
 [94] survival_3.2-13        viridisLite_0.4.0      pheatmap_1.0.12
 [97] seriation_1.3.1        tibble_3.1.5           iterators_1.0.13
[100] registry_0.5-1         GenomicAlignments_1.30.0 memoise_2.0.0
[103] ellipsis_0.3.2
```

c-mertes commented 2 years ago

Dear @xuwenjian85, thanks for posting the issue here. Even when no parallelization is requested, data.table automatically parallelizes its table operations internally.

You can see and set how many cores are used by data.table with:

getDTthreads()
setDTthreads(threads)
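A minimal sketch of applying both settings discussed in this thread before the model fit, assuming data.table and BiocParallel are installed (both appear in the sessionInfo above). SerialParam() only controls the BiocParallel back end, while setDTthreads() limits data.table's internal OpenMP threads; neither is guaranteed to cover threads started by other native code, such as the multithreaded OpenBLAS library listed under BLAS/LAPACK (a possible, unconfirmed source of the remaining parallelism). The OUTRIDER() call is commented out because `ods` comes from the reporter's script.

```r
library(data.table)
library(BiocParallel)

# Limit data.table's internal (OpenMP) parallelism to a single thread.
setDTthreads(1)
print(getDTthreads())  # thread count data.table will now use

# Use BiocParallel's serial back end, as in the original script.
bp <- SerialParam()
print(bpnworkers(bp))  # a SerialParam has exactly one worker

# With `ods` prepared as in the script above (not defined here):
# ods <- OUTRIDER(ods, q = 20, BPPARAM = bp, iterations = 8)
```

Note that these settings only affect code that consults them; OS-level CPU pinning, as described in the reply below, constrains every thread of the process regardless of which library started it.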

Let me know if this solved your problem.

xuwenjian85 commented 2 years ago

I added the setDTthreads line, but it still runs in parallel. Anyway, I found a way around this issue by using the shell command taskset (https://linuxhint.com/use-taskset-command/), which restricts the script to the listed CPU cores:

taskset -c 1,2 Rscript myscript.R
