Bioconductor / BiocParallel

Bioconductor facilities for parallel evaluation
https://bioconductor.org/packages/BiocParallel

BiocParallel socketConnection error #259

Closed: MeatMeta closed this issue 4 months ago

MeatMeta commented 4 months ago

Hi all,

I am trying to analyse LC-MS data and get the error message below when running the chromatogram() and refineChromPeaks() functions.

Error: BiocParallel errors
  1 remote errors, element index: 2
  0 unevaluated and other errors
  first remote error:
Error in socketConnection(port = port, server = TRUE, blocking = TRUE, : cannot open the connection.

The traceback() output is as follows:

12: stop(.error_bplist(res))
11: .bpinit(manager = manager, X = X, FUN = FUN, ARGS = ARGS, BPPARAM = BPPARAM, 
        BPOPTIONS = BPOPTIONS, BPREDO = BPREDO)
10: bplapply(X = ddd, .wrapMapplyNotShared, .FUN = FUN, .MoreArgs = MoreArgs, 
        BPREDO = BPREDO, BPPARAM = BPPARAM, BPOPTIONS = BPOPTIONS)
9: bplapply(X = ddd, .wrapMapplyNotShared, .FUN = FUN, .MoreArgs = MoreArgs, 
       BPREDO = BPREDO, BPPARAM = BPPARAM, BPOPTIONS = BPOPTIONS)
8: bpmapply(subs_by_file, match(fileNames(subs), fns), FUN = function(cur_sample, 
       cur_file, rtm, mzm, aggFun) {
       sps <- spectra(cur_sample)
       rts <- rtime(cur_sample)
       cur_res <- vector("list", nrow(rtm))
       for (i in 1:nrow(rtm)) {
           in_rt <- rts >= rtm[i, 1] & rts <= rtm[i, 2]
           if (!any(in_rt)) {
               cur_res[[i]] <- MSnbase::Chromatogram(filterMz = mzm[i, 
                   ], fromFile = as.integer(cur_file), aggregationFun = aggFun)
               next
           }
           cur_sps <- lapply(sps[in_rt], function(spct, filter_mz, 
               aggFun) {
               spct <- filterMz(spct, filter_mz)
               if (!spct@peaksCount) 
                   return(c(NA_real_, NA_real_, missingValue, NA_real_))
               c(range(spct@mz, na.rm = TRUE, finite = TRUE), do.call(aggFun, 
                   list(spct@intensity, na.rm = TRUE)), spct@msLevel)
           }, filter_mz = mzm[i, ], aggFun = aggFun)
    ...
7: bpmapply(subs_by_file, match(fileNames(subs), fns), FUN = function(cur_sample, 
       cur_file, rtm, mzm, aggFun) {
       sps <- spectra(cur_sample)
       rts <- rtime(cur_sample)
       cur_res <- vector("list", nrow(rtm))
       for (i in 1:nrow(rtm)) {
           in_rt <- rts >= rtm[i, 1] & rts <= rtm[i, 2]
           if (!any(in_rt)) {
               cur_res[[i]] <- MSnbase::Chromatogram(filterMz = mzm[i, 
                   ], fromFile = as.integer(cur_file), aggregationFun = aggFun)
               next
           }
           cur_sps <- lapply(sps[in_rt], function(spct, filter_mz, 
               aggFun) {
               spct <- filterMz(spct, filter_mz)
               if (!spct@peaksCount) 
                   return(c(NA_real_, NA_real_, missingValue, NA_real_))
               c(range(spct@mz, na.rm = TRUE, finite = TRUE), do.call(aggFun, 
                   list(spct@intensity, na.rm = TRUE)), spct@msLevel)
           }, filter_mz = mzm[i, ], aggFun = aggFun)
    ...
6: withCallingHandlers(expr, warning = function(w) if (inherits(w, 
       classes)) tryInvokeRestart("muffleWarning"))
5: suppressWarnings(res <- bpmapply(subs_by_file, match(fileNames(subs), 
       fns), FUN = function(cur_sample, cur_file, rtm, mzm, aggFun) {
       sps <- spectra(cur_sample)
       rts <- rtime(cur_sample)
       cur_res <- vector("list", nrow(rtm))
       for (i in 1:nrow(rtm)) {
           in_rt <- rts >= rtm[i, 1] & rts <= rtm[i, 2]
           if (!any(in_rt)) {
               cur_res[[i]] <- MSnbase::Chromatogram(filterMz = mzm[i, 
                   ], fromFile = as.integer(cur_file), aggregationFun = aggFun)
               next
           }
           cur_sps <- lapply(sps[in_rt], function(spct, filter_mz, 
               aggFun) {
               spct <- filterMz(spct, filter_mz)
               if (!spct@peaksCount) 
                   return(c(NA_real_, NA_real_, missingValue, NA_real_))
               c(range(spct@mz, na.rm = TRUE, finite = TRUE), do.call(aggFun, 
                   list(spct@intensity, na.rm = TRUE)), spct@msLevel)
           }, filter_mz = mzm[i, ], aggFun = aggFun)
    ...
4: .extractMultipleChromatograms(object, rt = rt, mz = mz, aggregationFun = aggregationFun, 
       missingValue = missing, msLevel = msLevel, BPPARAM = BPPARAM)
3: .local(object, ...)
2: chromatogram(raw_data, aggregationFun = "max")
1: chromatogram(raw_data, aggregationFun = "max")

Any help would be greatly appreciated.

Kind regards,

mtmorgan commented 4 months ago

Please provide the output of sessionInfo(), as well as the package where chromatogram is defined. Ideally it would help to have a fully reproducible example, perhaps from a publicly accessible data set.

I am guessing that you are on a Windows machine, and that there are restrictions on the 'ports' that a user has access to. Another possibility is that too much data is being sent from the workers.

One approach might be to use SerialParam() as the default. This might be provided as an argument to chromatogram() or set globally with BiocParallel::register(BiocParallel::SerialParam()). Which of these works, if either, depends on how chromatogram() has been implemented.
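One way to test the port-restriction hypothesis independently of xcms is to have BiocParallel start a small socket cluster and run a trivial job. The following is a minimal sketch, assuming only that BiocParallel is installed; the worker count of 2 and the toy function `double_it` are arbitrary, hypothetical choices. If socket workers cannot be started on the machine, the SnowParam() call would be expected to fail with a connection error, while the SerialParam() run, which opens no sockets, should still succeed.

```r
library(BiocParallel)

# Hypothetical toy job: double each element of a short vector.
x <- 1:4
double_it <- function(i) i * 2

# Serial evaluation: runs in the current R process, no sockets are opened.
serial_res <- bplapply(x, double_it, BPPARAM = SerialParam())

# Socket-based evaluation with 2 workers (SnowParam is the socket back end
# BiocParallel uses on Windows). tryCatch() returns the error message as a
# string instead of stopping, so the two outcomes can be compared.
snow_res <- tryCatch(
    bplapply(x, double_it, BPPARAM = SnowParam(workers = 2)),
    error = function(e) conditionMessage(e)
)

# TRUE when the socket workers started and returned the same results;
# otherwise snow_res holds the error message for inspection.
identical(serial_res, snow_res)
```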

MeatMeta commented 4 months ago

The sessionInfo() is as follows:

R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_Ireland.utf8  LC_CTYPE=English_Ireland.utf8   
[3] LC_MONETARY=English_Ireland.utf8 LC_NUMERIC=C                    
[5] LC_TIME=English_Ireland.utf8    

time zone: Europe/Dublin
tzcode source: internal

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] magrittr_2.0.3              ggrepel_0.9.5              
 [3] FactoMineR_2.10             factoextra_1.0.7           
 [5] heatmaply_1.5.0             viridis_0.6.5              
 [7] viridisLite_0.4.2           plotly_4.10.4              
 [9] imputeLCMD_2.1              impute_1.76.0              
[11] pcaMethods_1.94.0           norm_1.0-11.1              
[13] tmvtnorm_1.6                gmm_1.8                    
[15] sandwich_3.1-0              Matrix_1.6-1.1             
[17] mvtnorm_1.2-4               MsFeatures_1.10.0          
[19] pander_0.6.5                RColorBrewer_1.1-3         
[21] SummarizedExperiment_1.32.0 GenomicRanges_1.54.1       
[23] GenomeInfoDb_1.38.8         IRanges_2.36.0             
[25] MatrixGenerics_1.14.0       matrixStats_1.3.0          
[27] ggplot2_3.5.0               tidyr_1.3.1                
[29] dplyr_1.1.4                 xcms_4.0.2                 
[31] MSnbase_2.28.1              ProtGenerics_1.34.0        
[33] S4Vectors_0.40.2            mzR_2.36.0                 
[35] Rcpp_1.0.12                 Biobase_2.62.0             
[37] BiocGenerics_0.48.1         BiocParallel_1.36.0        

loaded via a namespace (and not attached):
 [1] rstudioapi_0.16.0           jsonlite_1.8.8             
 [3] MultiAssayExperiment_1.28.0 estimability_1.5           
 [5] MALDIquant_1.22.2           fs_1.6.3                   
 [7] zlibbioc_1.48.2             vctrs_0.6.5                
 [9] multtest_2.58.0             RCurl_1.98-1.14            
[11] webshot_0.5.5               htmltools_0.5.8.1          
[13] S4Arrays_1.2.1              progress_1.2.3             
[15] SparseArray_1.2.4           mzID_1.40.0                
[17] htmlwidgets_1.6.4           plyr_1.8.9                 
[19] emmeans_1.10.1              zoo_1.8-12                 
[21] igraph_2.0.3                lifecycle_1.0.4            
[23] iterators_1.0.14            pkgconfig_2.0.3            
[25] R6_2.5.1                    fastmap_1.1.1              
[27] GenomeInfoDbData_1.2.11     clue_0.3-65                
[29] digest_0.6.35               colorspace_2.1-0           
[31] seriation_1.5.5             Spectra_1.12.0             
[33] fansi_1.0.6                 httr_1.4.7                 
[35] abind_1.4-5                 compiler_4.3.2             
[37] withr_3.0.0                 doParallel_1.0.17          
[39] dendextend_1.17.1           MASS_7.3-60                
[41] MsExperiment_1.4.0          DelayedArray_0.28.0        
[43] scatterplot3d_0.3-44        flashClust_1.01-2          
[45] tools_4.3.2                 glue_1.7.0                 
[47] QFeatures_1.12.0            grid_4.3.2                 
[49] cluster_2.1.4               snow_0.4-4                 
[51] generics_0.1.3              gtable_0.3.4               
[53] ca_0.71.1                   preprocessCore_1.64.0      
[55] data.table_1.15.4           hms_1.1.3                  
[57] MetaboCoreUtils_1.10.0      utf8_1.2.4                 
[59] XVector_0.42.0              RANN_2.6.1                 
[61] foreach_1.5.2               pillar_1.9.0               
[63] limma_3.58.1                robustbase_0.99-2          
[65] splines_4.3.2               lattice_0.21-9             
[67] survival_3.5-7              tidyselect_1.2.1           
[69] registry_0.5-1              gridExtra_2.3              
[71] statmod_1.5.0               DT_0.33                    
[73] DEoptimR_1.1-3              lazyeval_0.2.2             
[75] codetools_0.2-19            MsCoreUtils_1.14.1         
[77] tibble_3.2.1                BiocManager_1.30.22        
[79] multcompView_0.1-10         cli_3.6.2                  
[81] affyio_1.72.0               xtable_1.8-4               
[83] munsell_0.5.1               MassSpecWavelet_1.68.0     
[85] XML_3.99-0.16.1             leaps_3.1                  
[87] assertthat_0.2.1            prettyunits_1.2.0          
[89] AnnotationFilter_1.26.0     bitops_1.0-7               
[91] scales_1.3.0                affy_1.80.0                
[93] ncdf4_1.22                  purrr_1.0.2                
[95] crayon_1.5.2                rlang_1.1.3                
[97] vsn_3.70.0                  TSP_1.2-4    

The line of code is: bpis <- chromatogram(raw_data, aggregationFun = "max"). A worked example with a dataset can be found at https://rpubs.com/mohi/XCMS; I am following it as a guide for my analysis.

mtmorgan commented 4 months ago

So, based on that reference, is this a minimal reproducible example that fails for you in a new R session? I'm asking because it helps me focus my effort, rather than misunderstanding your problem and doing unnecessary work.

library(xcms)
library(faahKO)
cdfs <- dir(system.file("cdf", package = "faahKO"), full.names = TRUE,
            recursive = TRUE)
pd <- data.frame(sample_name = sub(basename(cdfs), pattern = ".CDF",
                                   replacement = "", fixed = TRUE),
                 sample_group = c(rep("KO", 6), rep("WT", 6)),
                 stringsAsFactors = FALSE)
raw_data <- readMSData(files = cdfs, pdata = new("NAnnotatedDataFrame", pd),
                       mode = "onDisk")
bpis <- chromatogram(raw_data, aggregationFun = "max")

mtmorgan commented 4 months ago

Assuming that this is the workflow, I looked at

> chromatogram
standardGeneric for "chromatogram" defined from package "ProtGenerics"

function (object, ...)
standardGeneric("chromatogram")
<bytecode: 0x1077e9f20>
<environment: 0x1077e35a8>
Methods may be defined for arguments: object
Use  showMethods(chromatogram)  for currently available ones.

and then

> showMethods("chromatogram")
Function: chromatogram (package ProtGenerics)
object="MsExperiment"
object="MSnExp"
object="mzRnetCDF"
object="mzRpwiz"
object="OnDiskMSnExp"
    (inherited from: object="MSnExp")
object="XcmsExperiment"
object="XCMSnExp"

The object raw_data is

> class(raw_data)
[1] "OnDiskMSnExp"
attr(,"package")
[1] "MSnbase"

so I looked up the help page for the chromatogram method defined for the class MSnExp (which OnDiskMSnExp inherits)

help("chromatogram,MSnExp-method")

where I see that there is an argument BPPARAM = bpparam(). From section 3.1.2 of the BiocParallel vignette I see that I can do either

register(SerialParam())
bpis <- chromatogram(raw_data, aggregationFun = "max")

to always use 'serial' (not parallel) execution, or

bpis <- chromatogram(raw_data, aggregationFun = "max", BPPARAM = SerialParam())

to use serial evaluation only for this function call.
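After register(SerialParam()), the serial default stays in effect for the rest of the R session. If parallel evaluation is wanted again later, one option is to save the current default with bpparam() and re-register it afterwards. The sketch below wraps that pattern in a small helper; `with_serial` is a hypothetical name, not a BiocParallel function, and it only affects code that falls back to the registered default (for example via a BPPARAM = bpparam() argument).

```r
library(BiocParallel)

# Hypothetical helper: evaluate 'expr' with SerialParam() registered as the
# default back end, then restore whatever default was registered before.
with_serial <- function(expr) {
    old_param <- bpparam()                    # remember the current default
    register(SerialParam())                   # make serial the default
    on.exit(register(old_param), add = TRUE)  # restore, even if expr errors
    expr                                      # lazy argument: evaluated here, after registration
}
```

With this helper, the call from this thread would become bpis <- with_serial(chromatogram(raw_data, aggregationFun = "max")). For a single call, passing BPPARAM = SerialParam() directly is simpler; the helper is mainly useful when a function does not expose a BPPARAM argument.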

I believe this addresses your short-term problem. Consider posting on the support site https://support.bioconductor.org and tagging the post with the packages you're using (xcms, MSnbase, etc.), so that you attract the attention of domain experts with experience using this sort of data.

MeatMeta commented 4 months ago

Hi @mtmorgan,

Thank you for your suggestions above.

I have added the BPPARAM argument to the code that was causing problems (as below), and it has resolved the issues I was having with BiocParallel.

bpis <- chromatogram(raw_data, aggregationFun = "max", BPPARAM = SerialParam())