Bioconductor / BiocParallel

Bioconductor facilities for parallel evaluation
https://bioconductor.org/packages/BiocParallel

Troubles with bplapply within function (using SnowParam on Windows) #241

Closed DavoSam closed 1 year ago

DavoSam commented 1 year ago

Hello,

I am new to BiocParallel and am trying to write a function that performs subsampling of a SingleCellExperiment object. The function below generates a list of randomly sampled indices based on the number of columns of the input object:

dummyFUN2 = function(ob,n,p,n_cores=2) {
  BP = BiocParallel::bpparam()
  BP$workers = n_cores
  bplapply(1:n, 
           function(y, ob, p, r) { len = ncol(ob); sample(x = 1:len, size = p*len, replace = r) }, 
           ob = ob, p = p, r = FALSE, BPPARAM = BP)

}

I tried to make sure that my variables are explicitly passed to the function in bplapply() because I am working on Windows. I was having difficulties with the SnowParam() default not correctly evaluating function arguments whose values are derived from the global environment, so I followed the recommendations at #125.

To test out the code, I first ran:

prop = 0.2
num = 2
test_list = list(1,2,3,4,5,6,7,8,9,10)
test_df = data.frame(test_list) #2

dummyFUN2(ob = test_df, n = num, p = prop)

Which worked as expected and produced this output:

[[1]]
[1] 7 3

[[2]]
[1] 3 5

I then tried to use the function on a SingleCellExperiment object sce which has 20,601 columns in its counts assay:

prop = 0.2
num = 2

dummyFUN2(ob = sce, n = num, p = prop)

But I kept getting the following message:

Error Output

```
Loading required package: SingleCellExperiment
Loading required package: SummarizedExperiment
Loading required package: MatrixGenerics
Loading required package: matrixStats

Attaching package: ‘MatrixGenerics’

The following objects are masked from ‘package:matrixStats’:

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse, colCounts,
    colCummaxs, colCummins, colCumprods, colCumsums, colDiffs, colIQRDiffs,
    colIQRs, colLogSumExps, colMadDiffs, colMads, colMaxs, colMeans2,
    colMedians, colMins, colOrderStats, colProds, colQuantiles, colRanges,
    colRanks, colSdDiffs, colSds, colSums2, colTabulates, colVarDiffs,
    colVars, colWeightedMads, colWeightedMeans, colWeightedMedians,
    colWeightedSds, colWeightedVars, rowAlls, rowAnyNAs, rowAnys,
    rowAvgsPerColSet, rowCollapse, rowCounts, rowCummaxs, rowCummins,
    rowCumprods, rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
    rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
    rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks, rowSdDiffs,
    rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars, rowWeightedMads,
    rowWeightedMeans, rowWeightedMedians, rowWeightedSds, rowWeightedVars

Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, order,
    paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce,
    rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit,
    which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:base’:

    expand.grid, I, unname

Loading required package: IRanges

Attaching package: ‘IRanges’

The following object is masked from ‘package:grDevices’:

    windows

Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Attaching package: ‘Biobase’

The following object is masked from ‘package:MatrixGenerics’:

    rowMedians

The following objects are masked from ‘package:matrixStats’:

    anyMissing, rowMedians

Error: BiocParallel errors
  2 remote errors, element index: 1, 2
  0 unevaluated and other errors
  first remote error: argument of length 0
In addition: Warning messages:
1: In serialize(data, node$con) :
  'package:stats' may not be available when loading
2: In serialize(data, node$con) :
Error: BiocParallel errors
  2 remote errors, element index: 1, 2
  0 unevaluated and other errors
  first remote error: argument of length 0
```

I suspected that the error is due to ncol not handling the sce object properly, since the same code worked for a simple data frame. This was confirmed when I put a print(len) statement in the dummy function and got NULL only when using the sce object. Normally, ncol works just fine for sce objects. Only when I specified BiocGenerics::ncol(ob) in the bplapply function did the code work for the sce object (while still working for test_df):

dummyFUN2 = function(ob,n,p,n_cores=2) {
  BP = BiocParallel::bpparam()
  BP$workers = n_cores
  bplapply(1:n, 
           function(y, ob, p, r) { len = BiocGenerics::ncol(ob); sample(1:len, p*len, replace = r) },  #works
           ob = ob, p = p, r = FALSE, BPPARAM = BP)

}

Example Successful Output

``` [[1]] [1] 10927 19472 19002 10003 6230 10531 15556 5327 4794 13466 8347 7295 8111 16727 6809 5382 11821 11019 6219 16712 5517 18557 16583 4691 [25] 11340 15356 11227 5621 16624 4521 8073 12204 1975 8606 8163 2316 20053 12814 9014 11450 18756 4929 5153 19642 1062 3947 858 14623 [49] 11497 15104 11729 522 7635 4262 15330 7728 4494 1966 10051 5702 10753 1256 4630 6760 6664 2282 18123 18200 13556 19751 13419 14253 [73] 6484 18381 11961 11861 9373 8835 8593 5510 13906 10236 19344 4640 8766 19567 15287 9999 18624 6784 3350 796 16201 12366 13962 16308 [97] 15021 16367 20166 14849 6256 14826 3248 11684 5916 15456 9051 7830 13805 14501 5745 18654 7615 15320 13240 8540 15595 7380 15367 671 [121] 20546 9302 545 12940 15845 13448 591 1910 12997 11254 17622 19294 13486 10497 8918 4689 7774 2935 20439 14884 1483 10223 4143 12125 [145] 3065 9755 18768 15225 16756 18903 9661 19772 2917 18523 9412 5434 5364 14047 5534 10308 14691 16650 20188 7404 9821 17481 17349 13125 [169] 5483 2477 14037 11223 11121 19563 9533 8802 7339 11293 16217 1787 11238 3084 15375 3746 5211 7014 5034 17892 11782 3972 4515 64 [193] 2680 18016 16635 18899 10050 11204 736 5779 10982 15976 12040 2437 4345 1192 9685 9703 16490 1577 3235 13257 15748 9485 14450 9408 [217] 16737 6878 2319 1217 15348 5686 7166 63 6341 14880 3819 13371 11199 11659 1229 8693 10835 350 14046 19955 18634 4240 18178 5481 [241] 1663 3554 5381 8339 6362 13435 8761 7434 15930 11600 9033 3177 17454 7072 8474 2374 16711 10322 19277 2380 18152 11007 2822 20347 [265] 17051 18599 17208 6947 3953 16728 4 16747 5799 773 5017 15641 7878 17375 18020 6581 4164 7835 698 352 20443 2020 3227 7584 [289] 1395 8399 14524 14053 6487 19907 18528 19284 5232 2634 19261 20114 1105 11804 15893 6744 7205 8618 16519 8565 11379 15602 13641 6854 [313] 3913 3380 1597 8631 10076 7510 9201 11912 12705 14031 6186 16124 19929 6941 16793 7127 2065 13495 10803 16030 6830 2191 4715 19719 [337] 18907 16236 10300 7349 2113 2750 3061 5710 1078 10886 9896 6306 3968 3832 
13097 16909 11881 5416 13621 17919 8442 14506 20405 17048 [361] 15606 16495 15959 5133 9393 8641 4450 17246 19173 10730 12850 2791 10690 7194 1126 16210 10975 589 1286 18693 19919 15242 283 3417 [385] 13111 9727 9022 17659 353 16818 3974 18700 11852 12547 11762 17032 5297 13101 10112 19865 6719 1315 3956 18657 9898 8530 4426 5415 [409] 16229 2080 15824 2068 10539 7466 7506 13509 13084 17363 2630 12356 1831 14127 7152 9694 15511 6054 318 13259 9893 11794 15412 6063 [433] 16606 10779 5582 14011 7088 1330 3482 18347 8826 18872 13698 3208 16710 15505 2305 19323 11809 15636 2914 11159 15133 12225 2321 19333 [457] 10126 9942 5741 315 8972 9936 19167 4406 516 18227 18105 15902 13587 9972 11708 14584 10854 5811 20478 3183 5994 5231 13327 7646 [481] 17388 17740 14813 962 7889 18221 8173 7399 18112 3708 10815 4303 3246 14247 15160 19471 11443 17464 17040 18115 18198 17562 5930 7606 [505] 3699 563 19072 6253 6420 11346 4863 17423 18225 1153 1658 18336 5402 18520 17678 7129 5714 16591 9957 20461 19892 9563 12631 11047 [529] 10987 2781 13501 8395 2312 12193 10011 8531 13935 800 5376 12517 17715 16479 144 850 18920 3667 16412 17515 10847 11752 5400 8775 [553] 15875 6625 2620 12402 12674 798 2143 12538 4459 5767 3056 4377 272 19797 11686 19066 19827 6720 15388 15023 16794 274 16631 156 [577] 1425 8392 18177 11590 6641 15049 18789 14285 11087 405 11663 19456 7824 9410 9820 14530 12528 14888 17600 8483 7911 3153 14600 5812 [601] 6574 16329 16157 10064 15516 9515 15756 16757 6635 1128 8175 3824 332 19729 8097 5070 7839 2148 15403 17103 17976 4973 18537 11351 [625] 3411 8521 12000 14236 6350 20101 4034 3217 4100 1316 7445 4572 9066 17567 2403 20409 2430 6043 726 1520 20557 9890 1947 12526 [649] 3509 14893 15652 14961 7987 18290 13192 19245 10013 16962 11739 4325 11407 17632 5504 1771 16982 1188 13550 4473 5101 10403 17889 9500 [673] 12884 4491 4961 6032 19220 3830 14030 9284 14479 7881 9831 1502 438 8703 18179 8038 17998 14535 9827 6800 5467 12165 15772 16260 [697] 7949 15965 18508 
7174 14111 17920 2821 15655 5616 14558 1991 5023 20002 10979 15050 15866 2162 17091 7777 4398 772 6000 18633 10554 [721] 14258 15283 20360 17405 16042 11066 319 8568 8189 14368 20068 4495 4683 11348 18592 17163 7304 14150 4278 4644 18252 15847 20278 8215 [745] 10735 10104 14096 9207 13216 11341 6059 18037 7817 7283 4176 20363 14054 19894 12374 6478 11972 14685 18434 12937 20257 19032 4634 19559 [769] 6332 17522 18741 14233 11836 7203 13356 6905 15991 13329 3021 18720 6127 19923 6285 11845 19872 3583 3020 16040 19036 4909 11012 1707 [793] 5363 19092 4011 6602 17757 8131 12423 4018 12383 8890 12123 12242 7078 4288 506 1759 12700 6638 15676 11127 619 14569 6838 11263 [817] 4092 10600 13886 13137 18100 8424 2837 6678 4341 2070 5235 18792 11591 11756 8937 5674 19604 941 4415 15678 5715 829 13795 13158 [841] 11185 5254 14523 10193 20576 9176 17380 13326 12906 5678 1162 16826 1357 7953 16084 12511 19763 11184 2800 8394 14741 16063 3979 8833 [865] 16343 17760 3351 11575 7872 10650 16740 5394 9068 13504 1047 7493 19775 18008 20045 682 12217 10415 19165 8403 11515 3526 16275 1761 [889] 18517 12717 14001 9493 15447 15561 4826 10946 19618 628 17001 19373 5228 14275 7890 3633 16300 18369 19696 17944 6178 20141 5250 10880 [913] 7655 13696 7734 18088 9402 18939 9866 11870 18732 12490 16645 18210 16325 6610 11188 14588 18729 7015 9544 715 377 1630 16079 8906 [937] 11472 17953 17588 71 210 5163 4541 12334 263 16681 13472 8843 3545 19143 5177 6505 1100 396 13184 18841 16273 16336 15816 15389 [961] 9935 20104 17583 11244 17265 5956 745 9258 14959 17268 3259 19319 14186 8828 1175 7424 9129 19757 15982 15852 19620 12572 338 1730 [985] 15095 17365 4653 17309 20601 15070 803 8440 635 12736 15525 5303 678 16396 10177 20407 [ reached getOption("max.print") -- omitted 3120 entries ] [[2]] [1] 8805 13721 10997 6098 7661 5134 11305 3 11896 7551 9015 8688 5864 2656 16967 1199 16436 11637 12042 9561 16090 3146 10307 15959 [25] 20227 12861 13620 12542 19839 18964 16658 6859 463 1926 3021 13364 
3696 4463 17651 14739 9138 19374 9610 12518 1389 3599 14554 9742 [49] 20119 8317 18581 5490 14917 18209 5766 12528 3122 3775 13929 16088 57 459 6921 18471 11063 5224 7379 16025 1490 10453 14064 20163 [73] 3428 15776 16988 16787 10814 2953 13043 17778 4022 19404 18302 16030 7366 4843 12778 8683 16709 16651 10940 4850 7874 1961 10777 13141 [97] 1505 2816 11097 16307 7954 425 3699 2969 6654 18321 5079 662 8056 11347 7168 14083 17479 8734 2599 10991 11594 2474 9248 17886 [121] 16268 19263 19829 17968 18767 17349 11824 20417 1878 10277 3548 4593 17736 17046 9010 406 6758 14380 14025 8437 2379 9710 14653 1974 [145] 16531 4176 13322 3881 6298 16421 9521 9411 4288 12010 1120 4927 5555 13289 1386 13514 7026 13761 12504 269 14648 2317 8750 318 [169] 14142 4153 8163 20093 1068 12374 10560 18850 1142 2015 6192 9592 13964 1200 9858 18553 12168 2890 19693 15881 1794 18451 6705 1793 [193] 19810 16665 9759 8514 16381 626 10571 12212 14150 19211 80 6214 8666 3806 18389 19891 2755 19292 4627 5237 13532 16385 4899 5755 [217] 9408 11892 2962 500 2531 5605 13418 16229 3758 3292 12256 13248 10673 19932 10965 10268 625 11665 15142 159 6403 795 17838 8492 [241] 13417 6383 7489 18514 12318 19512 18146 12181 18470 19321 3164 8568 16341 2400 108 12122 12785 14682 2278 12572 2911 16159 4967 14907 [265] 9369 16583 19649 10210 6141 5995 15059 3635 1710 2055 12702 15636 1590 351 4777 10574 5066 7890 12924 17038 17780 5182 11986 16136 [289] 7498 4598 6507 15538 12909 13165 6394 9030 13371 5746 18444 8781 2294 13950 6057 4387 11508 14326 18606 18010 5508 13777 2575 10929 [313] 1331 3112 12247 2997 15694 6746 15394 18885 10806 3151 2486 8911 3374 9994 12817 7443 5086 1835 10588 11320 12358 10720 17814 7455 [337] 11296 16827 10117 8520 3458 19844 1577 19034 11532 2627 19067 10737 9337 19715 7761 2783 11161 17304 3685 13234 7953 5636 10109 7667 [361] 13489 9153 5208 5087 350 18342 14269 12141 15239 7966 3288 16278 7730 18992 3225 11114 17765 14706 19014 3620 3231 11596 18537 12303 [385] 12803 13903 
4502 12298 15705 504 1734 1225 892 11088 20147 893 19998 11175 10427 5240 2336 2949 18733 9358 14291 16934 18064 4693 [409] 12608 19612 15361 10097 13872 7508 8593 18601 822 12170 1820 12746 19503 9222 7080 15494 10149 14925 787 4004 15335 2076 10517 3616 [433] 19925 12758 19957 11497 257 12912 18554 9755 16295 14797 20029 16368 8662 5982 4373 8444 9231 16353 15637 18641 8079 6496 6598 19197 [457] 2051 9812 4806 12242 2859 16461 1821 1359 15356 14443 8344 19315 14480 18803 7202 20288 7338 7581 1635 3800 3020 11592 7885 8895 [481] 16409 3592 18611 17303 7602 5022 14578 13813 2590 16173 9554 4102 19389 5443 19633 3145 2179 200 6469 17138 8410 15772 745 7405 [505] 18118 514 5372 19222 9183 9865 10870 5751 15031 8247 8161 6640 6734 1625 9589 10869 12087 8471 16680 6386 8206 10540 2935 14318 [529] 16002 10774 5670 20141 10302 19703 12590 20291 18041 14009 16735 7764 12537 9872 3817 14256 11330 2712 11308 16400 3506 9388 10831 20277 [553] 14784 6874 11900 13858 3571 3982 17180 2616 11994 9718 2000 12255 3115 15201 3841 7377 7914 9754 13888 4372 6551 18439 6152 14197 [577] 17860 6877 17879 5597 15459 11068 16450 17578 9215 7477 11455 5446 13986 15868 13959 19870 13961 2762 1510 11028 18068 13869 9203 20488 [601] 1951 8992 12521 14967 6854 951 1473 15346 18910 9891 4820 1614 17837 2982 9724 5472 14018 20428 6099 2769 16146 17095 19198 5775 [625] 8107 18436 17113 3942 1226 15701 9623 479 13839 6007 18639 5335 18125 16127 19590 11353 176 14619 5546 2300 2979 9064 8615 15971 [649] 4787 18287 1780 17966 7011 7456 12712 1958 3205 11370 20585 19190 10482 15071 4894 10759 16811 5156 5886 17510 9669 19181 14750 4532 [673] 20517 17835 2061 16751 18876 19753 11132 12954 49 5783 4972 16646 12147 4764 10383 4216 1344 18249 4164 7231 14613 3979 7905 9340 [697] 18671 3053 4586 6520 2803 4035 5734 20303 13772 685 8404 7206 6222 13003 5801 18326 17913 1441 9966 15536 19061 18025 13582 9867 [721] 2027 3371 6723 19326 1905 16202 12396 8429 342 8627 3904 10692 6455 2047 9497 17514 17993 
17990 8803 8640 5417 4053 333 16116 [745] 5693 8546 13746 13800 1650 18095 15563 15819 14146 17660 76 3823 12973 9578 2328 14799 708 3788 563 17947 995 9708 13468 19262 [769] 7223 9175 14001 3778 11243 19741 7458 20047 5051 7554 362 2012 17820 11663 2404 4238 6416 19148 7778 5331 15970 4788 4357 10919 [793] 9664 13 11019 5765 11687 12700 9448 542 14100 1906 13496 210 20594 18911 13022 8357 20475 1347 2676 20587 18210 8453 11890 13519 [817] 2730 16096 7800 15321 15742 10996 7093 71 13354 12523 15349 12398 16394 17126 17157 8822 5261 14099 3072 14346 7993 2569 11946 162 [841] 8684 2477 15266 16725 7965 13080 8570 14533 18170 10429 12153 10993 6812 12928 3125 10437 6670 19929 11789 5302 12043 14780 2588 11434 [865] 10466 17096 20545 9384 15581 2478 9339 3951 8941 4932 2268 18985 12808 6706 6747 12327 1336 13046 4553 8089 14527 13540 2087 18134 [889] 13943 19019 4049 7615 14026 19690 16234 14665 12054 2591 19488 19269 1862 2191 16027 3196 18046 9515 19564 20075 20178 1910 15555 17487 [913] 16694 13895 16348 13169 17935 10387 380 9943 18028 10388 11672 13752 16520 19597 1362 18113 3649 7692 1003 1687 2765 455 6808 6720 [937] 13833 14781 347 19227 20069 1660 16706 12913 11331 2915 16076 9237 20459 19308 14389 14526 15385 2653 20165 193 6622 15530 9244 3792 [961] 8295 8158 16631 17646 6990 18857 17839 9743 6283 18694 17784 6992 1450 14912 10044 8542 9296 10395 5652 19745 2064 8961 13343 6620 [985] 12173 6790 6276 14245 790 7708 18703 12171 890 4833 15733 3078 1449 18165 4441 6093 [ reached getOption("max.print") -- omitted 3120 entries ] Warning messages: 1: In serialize(data, node$con) : 'package:stats' may not be available when loading 2: In serialize(data, node$con) : 'package:stats' may not be available when loading ```
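
For readers wondering where "argument of length 0" comes from: it is what R reports when len is NULL and the worker evaluates 1:len. A minimal base R sketch, in which a plain list is only a stand-in (my assumption) for the object as the worker saw it:

```r
# A plain list has no dim attribute, so base::ncol() returns NULL for it.
# The list is a stand-in for the S4 object on the worker, not the real thing.
len <- base::ncol(list(1, 2, 3))
is.null(len)

# sample(x = 1:len, ...) has to evaluate 1:len first, and 1:NULL is an error.
msg <- tryCatch(1:len, error = function(e) conditionMessage(e))
msg
```

In an English locale, msg should be the same text as the "first remote error" reported by BiocParallel.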

I am trying to figure out what the issue is with using standard ncol. base::ncol(sce) works when running it in the terminal, so I'm not sure why it won't work within bplapply. My guess is that it has something to do with how the environment is made for workers when using SnowParam? Just looking for advice on best practice when writing bplapply code in functions.

Thank you for your time and for providing this great package. I hope we can work together to resolve these issues!

Session Information

``` R version 4.1.3 (2022-03-10) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19044) Matrix products: default locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] stats4 grid stats graphics grDevices utils datasets methods base other attached packages: [1] BiocParallel_1.28.3 tradeSeq_1.8.0 slingshot_2.2.1 TrajectoryUtils_1.2.0 SingleCellExperiment_1.16.0 [6] SummarizedExperiment_1.24.0 Biobase_2.54.0 GenomicRanges_1.46.1 GenomeInfoDb_1.30.1 IRanges_2.28.0 [11] S4Vectors_0.32.3 BiocGenerics_0.40.0 MatrixGenerics_1.6.0 matrixStats_0.61.0 princurve_2.1.6 [16] harmony_0.1.0 Rcpp_1.0.8.3 SoupX_1.5.2 glmGamPoi_1.6.0 sctransform_0.3.3 [21] fs_1.5.2 stringr_1.4.0 openxlsx_4.2.5 biomaRt_2.50.3 Matrix_1.4-0 [26] gridExtra_2.3 magrittr_2.0.2 rlang_1.0.2 cowplot_1.1.1 patchwork_1.1.1 [31] metap_1.8 ComplexHeatmap_2.10.0 ggrepel_0.9.1 ggpubr_0.4.0 pheatmap_1.0.12 [36] RColorBrewer_1.1-2 SeuratObject_4.0.4 Seurat_4.1.0 varhandle_2.0.5 viridis_0.6.2 [41] viridisLite_0.4.0 scales_1.1.1 ggplot2_3.3.5 dplyr_1.0.8 plyr_1.8.7 [46] reshape2_1.4.4 loaded via a namespace (and not attached): [1] rappdirs_0.3.3 scattermore_0.8 tidyr_1.2.0 bit64_4.0.5 irlba_2.3.5 multcomp_1.4-18 [7] DelayedArray_0.20.0 data.table_1.14.2 rpart_4.1.16 KEGGREST_1.34.0 RCurl_1.98-1.6 doParallel_1.0.17 [13] generics_0.1.2 snow_0.4-4 callr_3.7.0 TH.data_1.1-0 usethis_2.1.5 RSQLite_2.2.11 [19] RANN_2.6.1 future_1.24.0 bit_4.0.4 mutoss_0.1-12 spatstat.data_3.0-0 xml2_1.3.3 [25] httpuv_1.6.5 assertthat_0.2.1 hms_1.1.1 promises_1.2.0.1 fansi_1.0.3 progress_1.2.2 [31] dbplyr_2.1.1 igraph_1.2.11 DBI_1.1.2 tmvnsim_1.0-2 htmlwidgets_1.5.4 spatstat.geom_3.0-6 [37] purrr_0.3.4 ellipsis_0.3.2 backports_1.4.1 deldir_1.0-6 vctrs_0.3.8 remotes_2.4.2 [43] ROCR_1.0-11 abind_1.4-5 cachem_1.0.6 withr_2.5.0 prettyunits_1.1.1 goftest_1.2-3 [49] 
mnormt_2.0.2 cluster_2.1.2 segmented_1.6-2 lazyeval_0.2.2 crayon_1.5.1 edgeR_3.36.0 [55] pkgconfig_2.0.3 qqconf_1.2.2 nlme_3.1-155 pkgload_1.2.4 devtools_2.4.3 globals_0.14.0 [61] lifecycle_1.0.1 miniUI_0.1.1.1 sandwich_3.0-1 filelock_1.0.2 BiocFileCache_2.2.1 mathjaxr_1.6-0 [67] SeuratData_0.2.1 rprojroot_2.0.2 polyclip_1.10-0 lmtest_0.9-40 carData_3.0-5 zoo_1.8-9 [73] ggridges_0.5.3 GlobalOptions_0.1.2 processx_3.5.3 png_0.1-7 rjson_0.2.21 bitops_1.0-7 [79] KernSmooth_2.23-20 Biostrings_2.62.0 blob_1.2.2 shape_1.4.6 parallelly_1.30.0 spatstat.random_3.1-3 [85] rstatix_0.7.0 ggsignif_0.6.3 memoise_2.0.1 ica_1.0-2 zlibbioc_1.40.0 compiler_4.1.3 [91] plotrix_3.8-2 clue_0.3-60 fitdistrplus_1.1-8 cli_3.2.0 XVector_0.34.0 listenv_0.8.0 [97] pbapply_1.5-0 ps_1.6.0 MASS_7.3-55 mgcv_1.8-39 tidyselect_1.1.2 stringi_1.7.6 [103] locfit_1.5-9.5 tools_4.1.3 future.apply_1.8.1 parallel_4.1.3 circlize_0.4.15 rstudioapi_0.13 [109] PseudotimeDE_1.0.0 foreach_1.5.2 Rtsne_0.15 digest_0.6.29 shiny_1.7.1 qgam_1.3.4 [115] car_3.0-12 broom_0.7.12 later_1.3.0 RcppAnnoy_0.0.19 httr_1.4.2 AnnotationDbi_1.56.2 [121] kernlab_0.9-32 Rdpack_2.3 colorspace_2.0-3 brio_1.1.3 XML_3.99-0.9 tensor_1.5 [127] reticulate_1.24 splines_4.1.3 uwot_0.1.11 sn_2.0.2 spatstat.utils_3.0-1 multtest_2.50.0 [133] plotly_4.10.0 sessioninfo_1.2.2 xtable_1.8-4 jsonlite_1.8.0 testthat_3.1.2 R6_2.5.1 [139] TFisher_0.2.0 pillar_1.7.0 htmltools_0.5.2 mime_0.12 glue_1.6.2 fastmap_1.1.0 [145] codetools_0.2-18 pkgbuild_1.3.1 mvtnorm_1.1-3 utf8_1.2.2 lattice_0.20-45 spatstat.sparse_3.0-0 [151] tibble_3.1.6 mixtools_2.0.0 numDeriv_2016.8-1.1 curl_4.3.2 leiden_0.3.9 zip_2.2.0 [157] limma_3.50.1 survival_3.2-13 desc_1.4.1 munsell_0.5.0 GetoptLong_1.0.5 GenomeInfoDbData_1.2.7 [163] iterators_1.0.14 gtable_0.3.0 rbibutils_2.2.7 spatstat.core_2.4-0 ```

Jiefei-Wang commented 1 year ago

Hi @DavoSam ,

FYI: The current version of BiocParallel does support exporting objects automatically. It is not a panacea and has certain limitations, but it should work in your example.

I think the issue here is that the worker does not attach the package SingleCellExperiment to the search path. The words "attach" and "load" have different meanings in R. In your example, the worker loads SingleCellExperiment without attaching it to its search path. The functions in the package are then reachable through the :: operator, but an unqualified call such as ncol() on the worker resolves to base::ncol, not to the BiocGenerics generic.
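
The load/attach distinction can be seen without any Bioconductor package. The sketch below uses the tools package purely as a stand-in (my choice, not something from this issue), because it ships with R but is not attached in a default session:

```r
# 'tools' is used only as a stand-in for SingleCellExperiment.
loadNamespace("tools")

is_loaded   <- "tools" %in% loadedNamespaces()   # namespace is loaded
is_attached <- "package:tools" %in% search()     # on the search path?

# Loaded but not attached: the qualified name always resolves ...
ext <- tools::file_ext("counts.csv")
# ... while the bare name is found only if the package is on the search path.
bare_before <- exists("file_ext")

library(tools)                                   # attach the package
bare_after <- exists("file_ext")
```

In a fresh session I would expect bare_before to be FALSE and bare_after to be TRUE, which is the same situation a bare ncol() is in on a worker that has not attached BiocGenerics.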

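One solution is to explicitly attach the package on the worker:

bplapply(
    1:n,
    function(y) {
        # attach the package in the worker's R session
        library(SingleCellExperiment)
        len = ncol(ob)
        # r is not defined in this scope, so pass FALSE directly
        sample(1:len, p*len, replace = FALSE)
    },
    BPPARAM = BP
)

It should then find the function BiocGenerics::ncol with no issue.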

Best, Jiefei
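
On the best-practice question, one further option (a sketch under my own assumptions, not something proposed in this thread) is to call ncol() in the calling session and send only the resulting number to the workers. The function name subsample_idx and the floor() rounding are hypothetical choices for illustration:

```r
library(BiocParallel)

# Hypothetical variant of dummyFUN2: ncol() runs on the manager, so the
# workers only receive two numbers and need no extra packages attached.
subsample_idx <- function(ob, n, p, n_cores = 2) {
    len <- ncol(ob)              # evaluated in the calling session
    size <- floor(p * len)       # explicit rounding of the sample size
    bplapply(
        seq_len(n),
        function(i, len, size) sample.int(len, size = size, replace = FALSE),
        len = len, size = size,
        BPPARAM = SnowParam(workers = n_cores)
    )
}

# A 10-column data frame, as in the original test.
res <- subsample_idx(data.frame(matrix(0, nrow = 1, ncol = 10)), n = 2, p = 0.2)
lengths(res)
```

This also avoids serializing the whole object to every worker, which may matter for a 20,601-column SingleCellExperiment.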

mtmorgan commented 1 year ago

Just to confirm that with a reproducible example, after doing

library(SingleCellExperiment)
example(SingleCellExperiment)

the following fails (because SingleCellExperiment is loaded in the R session of the worker but not attached to the search path):

 > bplapply(list(1:2, 1:3), function(i, sce) sce[,i], sce, BPPARAM = SnowParam(2))
...
Error: BiocParallel errors
  2 remote errors, element index: 1, 2
  0 unevaluated and other errors
  first remote error:
Error in sce[, i]: object of type 'S4' is not subsettable

but the following succeeds:

res <- bplapply(list(1:2, 1:3), function(i, sce) {
    suppressPackageStartupMessages({ library(SingleCellExperiment) })
    sce[,i]
}, sce, BPPARAM = SnowParam(2))
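
A quick way to check what is attached on a worker is to ask it for its search path. This is a minimal sketch; stats4 is only a stand-in (my assumption) for a package attached on the manager:

```r
library(BiocParallel)
library(stats4)   # attached on the manager only; stand-in for SingleCellExperiment

# Each worker reports its own search path.
worker_search <- bplapply(1:2, function(i) search(), BPPARAM = SnowParam(2))

on_manager <- "package:stats4" %in% search()
on_worker  <- "package:stats4" %in% worker_search[[1]]
```

With default settings I would expect on_manager to be TRUE and on_worker to be FALSE, since SnowParam workers are separate R processes that do not inherit the manager's library() calls.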

DavoSam commented 1 year ago

@mtmorgan @Jiefei-Wang

Thank you for the prompt reply. I tested the reproducible example on my local Windows machine and it worked once I attached the package using library(). I hadn't recognized the difference between load and attach, so thanks for the explanation!

Out of curiosity, I ran the reproducible example and some additional tests on my school's Hoffman2 cluster and noticed some differences in behavior compared to the tests on my local machine. Of course, I recognize the setup is markedly different (the cluster OS is Linux, using JupyterLab, R 4.2.2, and more up-to-date packages). All test output is from the remote 'new test' setup except for the last one for Test 4.

New test setup

``` R version 4.2.2 (2022-10-31) Platform: x86_64-conda-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core) Matrix products: default BLAS/LAPACK: /u/home/d/davidsam/miniconda3/envs/r_seurat/lib/libopenblasp-r0.3.21.so locale: [1] C attached base packages: [1] stats4 stats graphics grDevices utils datasets methods [8] base other attached packages: [1] SingleCellExperiment_1.20.0 SummarizedExperiment_1.28.0 [3] Biobase_2.58.0 GenomicRanges_1.50.0 [5] GenomeInfoDb_1.34.1 IRanges_2.32.0 [7] S4Vectors_0.36.0 BiocGenerics_0.44.0 [9] MatrixGenerics_1.10.0 matrixStats_0.63.0 [11] BiocParallel_1.32.5 loaded via a namespace (and not attached): [1] pillar_1.8.1 compiler_4.2.2 XVector_0.38.0 [4] base64enc_0.1-3 bitops_1.0-7 tools_4.2.2 [7] zlibbioc_1.44.0 digest_0.6.31 uuid_1.1-0 [10] lattice_0.20-45 jsonlite_1.8.4 evaluate_0.20 [13] lifecycle_1.0.3 rlang_1.0.6 Matrix_1.5-3 [16] DelayedArray_0.24.0 IRdisplay_1.1 cli_3.6.0 [19] IRkernel_1.3.2 parallel_4.2.2 fastmap_1.1.0 [22] GenomeInfoDbData_1.2.9 repr_1.1.6 vctrs_0.5.2 [25] grid_4.2.2 glue_1.6.2 snow_0.4-4 [28] fansi_1.0.4 pbdZMQ_0.3-9 codetools_0.2-18 [31] htmltools_0.5.4 utf8_1.2.2 RCurl_1.98-1.9 [34] crayon_1.5.2 ```

Local test setup

``` R version 4.1.3 (2022-03-10) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19044) Matrix products: default locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] BiocParallel_1.28.3 SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0 Biobase_2.54.0 GenomicRanges_1.46.1 [6] GenomeInfoDb_1.30.1 IRanges_2.28.0 S4Vectors_0.32.3 BiocGenerics_0.40.0 MatrixGenerics_1.6.0 [11] matrixStats_0.61.0 loaded via a namespace (and not attached): [1] SeuratObject_4.0.4 Rcpp_1.0.8.3 lattice_0.20-45 tidyr_1.2.0 snow_0.4-4 prettyunits_1.1.1 [7] ps_1.6.0 assertthat_0.2.1 rprojroot_2.0.2 utf8_1.2.2 R6_2.5.1 SeuratData_0.2.1 [13] pillar_1.7.0 zlibbioc_1.40.0 rlang_1.0.2 callr_3.7.0 Matrix_1.4-0 desc_1.4.1 [19] devtools_2.4.3 RCurl_1.98-1.6 DelayedArray_0.20.0 compiler_4.1.3 pkgconfig_2.0.3 pkgbuild_1.3.1 [25] tidyselect_1.1.2 tibble_3.1.6 GenomeInfoDbData_1.2.7 fansi_1.0.3 crayon_1.5.1 dplyr_1.0.8 [31] withr_2.5.0 bitops_1.0-7 brio_1.1.3 rappdirs_0.3.3 grid_4.1.3 gtable_0.3.0 [37] lifecycle_1.0.1 DBI_1.1.2 magrittr_2.0.2 cli_3.2.0 cachem_1.0.6 XVector_0.34.0 [43] fs_1.5.2 remotes_2.4.2 testthat_3.1.2 ellipsis_0.3.2 generics_0.1.2 vctrs_0.3.8 [49] tools_4.1.3 glue_1.6.2 purrr_0.3.4 processx_3.5.3 pkgload_1.2.4 parallel_4.1.3 [55] fastmap_1.1.0 sessioninfo_1.2.2 memoise_2.0.1 usethis_2.1.5 ```

Test 1:

#reproducible example
#result: FAIL (expected), same as on local setup
library(BiocParallel)
library(SingleCellExperiment)
example(SingleCellExperiment)
bplapply(list(1:2, 1:3), function(i, sce) sce[,i], sce, BPPARAM = SnowParam(2))

Test 1 Output

``` Loading required package: SummarizedExperiment Loading required package: MatrixGenerics Loading required package: matrixStats Attaching package: 'MatrixGenerics' The following objects are masked from 'package:matrixStats': colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse, colCounts, colCummaxs, colCummins, colCumprods, colCumsums, colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs, colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats, colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds, colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads, colWeightedMeans, colWeightedMedians, colWeightedSds, colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet, rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods, rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps, rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins, rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks, rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars, rowWeightedMads, rowWeightedMeans, rowWeightedMedians, rowWeightedSds, rowWeightedVars Loading required package: GenomicRanges Loading required package: stats4 Loading required package: BiocGenerics Attaching package: 'BiocGenerics' The following objects are masked from 'package:stats': IQR, mad, sd, var, xtabs The following objects are masked from 'package:base': Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted, lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which.max, which.min Loading required package: S4Vectors Attaching package: 'S4Vectors' The following objects are masked from 'package:base': I, expand.grid, unname Loading required package: IRanges Loading required package: GenomeInfoDb Loading 
required package: Biobase Welcome to Bioconductor Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'. Attaching package: 'Biobase' The following object is masked from 'package:MatrixGenerics': rowMedians The following objects are masked from 'package:matrixStats': anyMissing, rowMedians SnglCE> ncells <- 100 SnglCE> u <- matrix(rpois(20000, 5), ncol=ncells) SnglCE> v <- log2(u + 1) SnglCE> pca <- matrix(runif(ncells*5), ncells) SnglCE> tsne <- matrix(rnorm(ncells*2), ncells) SnglCE> sce <- SingleCellExperiment(assays=list(counts=u, logcounts=v), SnglCE+ reducedDims=SimpleList(PCA=pca, tSNE=tsne)) SnglCE> sce class: SingleCellExperiment dim: 200 100 metadata(0): assays(2): counts logcounts rownames: NULL rowData names(0): colnames: NULL colData names(0): reducedDimNames(2): PCA tSNE mainExpName: NULL altExpNames(0): SnglCE> ## coercion from SummarizedExperiment SnglCE> se <- SummarizedExperiment(assays=list(counts=u, logcounts=v)) SnglCE> as(se, "SingleCellExperiment") class: SingleCellExperiment dim: 200 100 metadata(0): assays(2): counts logcounts rownames: NULL rowData names(0): colnames: NULL colData names(0): reducedDimNames(0): mainExpName: NULL altExpNames(0): Loading required package: SingleCellExperiment Loading required package: SummarizedExperiment Loading required package: MatrixGenerics Loading required package: matrixStats Attaching package: 'MatrixGenerics' The following objects are masked from 'package:matrixStats': colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse, colCounts, colCummaxs, colCummins, colCumprods, colCumsums, colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs, colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats, colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds, colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads, colWeightedMeans, colWeightedMedians, colWeightedSds, 
colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet, rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods, rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps, rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins, rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks, rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars, rowWeightedMads, rowWeightedMeans, rowWeightedMedians, rowWeightedSds, rowWeightedVars Loading required package: GenomicRanges Loading required package: stats4 Loading required package: BiocGenerics Attaching package: 'BiocGenerics' The following objects are masked from 'package:stats': IQR, mad, sd, var, xtabs The following objects are masked from 'package:base': Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted, lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which.max, which.min Loading required package: S4Vectors Attaching package: 'S4Vectors' The following objects are masked from 'package:base': I, expand.grid, unname Loading required package: IRanges Loading required package: GenomeInfoDb Loading required package: Biobase Welcome to Bioconductor Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'. 
Attaching package: 'Biobase' The following object is masked from 'package:MatrixGenerics': rowMedians The following objects are masked from 'package:matrixStats': anyMissing, rowMedians Loading required package: SingleCellExperiment Loading required package: SummarizedExperiment Loading required package: MatrixGenerics Loading required package: matrixStats Attaching package: 'MatrixGenerics' The following objects are masked from 'package:matrixStats': colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse, colCounts, colCummaxs, colCummins, colCumprods, colCumsums, colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs, colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats, colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds, colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads, colWeightedMeans, colWeightedMedians, colWeightedSds, colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet, rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods, rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps, rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins, rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks, rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars, rowWeightedMads, rowWeightedMeans, rowWeightedMedians, rowWeightedSds, rowWeightedVars Loading required package: GenomicRanges Loading required package: stats4 Loading required package: BiocGenerics Attaching package: 'BiocGenerics' The following objects are masked from 'package:stats': IQR, mad, sd, var, xtabs The following objects are masked from 'package:base': Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted, lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which.max, which.min 
Loading required package: S4Vectors
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base': I, expand.grid, unname
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'.
Attaching package: 'Biobase'
The following object is masked from 'package:MatrixGenerics': rowMedians
The following objects are masked from 'package:matrixStats': anyMissing, rowMedians
Error: BiocParallel errors
  2 remote errors, element index: 1, 2
  0 unevaluated and other errors
  first remote error:
Error in sce[, i]: object of type 'S4' is not subsettable
Traceback:
1. bplapply(list(1:2, 1:3), function(i, sce) sce[, i], sce, BPPARAM = SnowParam(2))
2. bplapply(list(1:2, 1:3), function(i, sce) sce[, i], sce, BPPARAM = SnowParam(2))
3. .bpinit(manager = manager, X = X, FUN = FUN, ARGS = ARGS, BPPARAM = BPPARAM,
 .     BPOPTIONS = BPOPTIONS, BPREDO = BPREDO)
```

Test 2:

#reproducible example #2
#result: SUCCESS (expected), same as on local setup
library(BiocParallel)
library(SingleCellExperiment)
example(SingleCellExperiment)
res <- bplapply(list(1:2, 1:3), function(i, sce) {
    suppressPackageStartupMessages({ library(SingleCellExperiment) })
    sce[,i]
}, sce, BPPARAM = SnowParam(2))
res
Test 2 Output

``` Loading required package: SummarizedExperiment Loading required package: MatrixGenerics Loading required package: matrixStats Attaching package: 'MatrixGenerics' The following objects are masked from 'package:matrixStats': colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse, colCounts, colCummaxs, colCummins, colCumprods, colCumsums, colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs, colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats, colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds, colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads, colWeightedMeans, colWeightedMedians, colWeightedSds, colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet, rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods, rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps, rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins, rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks, rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars, rowWeightedMads, rowWeightedMeans, rowWeightedMedians, rowWeightedSds, rowWeightedVars Loading required package: GenomicRanges Loading required package: stats4 Loading required package: BiocGenerics Attaching package: 'BiocGenerics' The following objects are masked from 'package:stats': IQR, mad, sd, var, xtabs The following objects are masked from 'package:base': Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted, lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which.max, which.min Loading required package: S4Vectors Attaching package: 'S4Vectors' The following objects are masked from 'package:base': I, expand.grid, unname Loading required package: IRanges Loading required package: GenomeInfoDb Loading 
required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'.
Attaching package: 'Biobase'
The following object is masked from 'package:MatrixGenerics': rowMedians
The following objects are masked from 'package:matrixStats': anyMissing, rowMedians
SnglCE> ncells <- 100
SnglCE> u <- matrix(rpois(20000, 5), ncol=ncells)
SnglCE> v <- log2(u + 1)
SnglCE> pca <- matrix(runif(ncells*5), ncells)
SnglCE> tsne <- matrix(rnorm(ncells*2), ncells)
SnglCE> sce <- SingleCellExperiment(assays=list(counts=u, logcounts=v),
SnglCE+ reducedDims=SimpleList(PCA=pca, tSNE=tsne))
SnglCE> sce
class: SingleCellExperiment
dim: 200 100
metadata(0):
assays(2): counts logcounts
rownames: NULL
rowData names(0):
colnames: NULL
colData names(0):
reducedDimNames(2): PCA tSNE
mainExpName: NULL
altExpNames(0):
SnglCE> ## coercion from SummarizedExperiment
SnglCE> se <- SummarizedExperiment(assays=list(counts=u, logcounts=v))
SnglCE> as(se, "SingleCellExperiment")
class: SingleCellExperiment
dim: 200 100
metadata(0):
assays(2): counts logcounts
rownames: NULL
rowData names(0):
colnames: NULL
colData names(0):
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
[[1]]
class: SingleCellExperiment
dim: 200 2
metadata(0):
assays(2): counts logcounts
rownames: NULL
rowData names(0):
colnames: NULL
colData names(0):
reducedDimNames(2): PCA tSNE
mainExpName: NULL
altExpNames(0):
[[2]]
class: SingleCellExperiment
dim: 200 3
metadata(0):
assays(2): counts logcounts
rownames: NULL
rowData names(0):
colnames: NULL
colData names(0):
reducedDimNames(2): PCA tSNE
mainExpName: NULL
altExpNames(0):
```

Test 3:

#test to see if explicit passing of function arguments is necessary with BiocParallel 1.32.5 (vs 1.28.3 on local)
#result: FAIL (unexpected) - same as on local setup
library(BiocParallel)
library(SingleCellExperiment)
example(SingleCellExperiment)
prop = 0.2
num = 2
dummyFUN2 = function(ob,n,p,n_cores=2) {
  BP = BiocParallel::SnowParam(workers = n_cores)
  bplapply(1:n, 
           function(y) { library(SingleCellExperiment); len = ncol(ob); sample(1:len, p*len, replace = FALSE) }
           , BPPARAM = BP)

}
dummyFUN2(sce,num,prop)
Test 3 Output

``` Loading required package: SummarizedExperiment Loading required package: MatrixGenerics Loading required package: matrixStats Attaching package: 'MatrixGenerics' The following objects are masked from 'package:matrixStats': colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse, colCounts, colCummaxs, colCummins, colCumprods, colCumsums, colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs, colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats, colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds, colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads, colWeightedMeans, colWeightedMedians, colWeightedSds, colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet, rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods, rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps, rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins, rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks, rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars, rowWeightedMads, rowWeightedMeans, rowWeightedMedians, rowWeightedSds, rowWeightedVars Loading required package: GenomicRanges Loading required package: stats4 Loading required package: BiocGenerics Attaching package: 'BiocGenerics' The following objects are masked from 'package:stats': IQR, mad, sd, var, xtabs The following objects are masked from 'package:base': Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted, lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which.max, which.min Loading required package: S4Vectors Attaching package: 'S4Vectors' The following objects are masked from 'package:base': I, expand.grid, unname Loading required package: IRanges Loading required package: GenomeInfoDb Loading 
required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'.
Attaching package: 'Biobase'
The following object is masked from 'package:MatrixGenerics': rowMedians
The following objects are masked from 'package:matrixStats': anyMissing, rowMedians
SnglCE> ncells <- 100
SnglCE> u <- matrix(rpois(20000, 5), ncol=ncells)
SnglCE> v <- log2(u + 1)
SnglCE> pca <- matrix(runif(ncells*5), ncells)
SnglCE> tsne <- matrix(rnorm(ncells*2), ncells)
SnglCE> sce <- SingleCellExperiment(assays=list(counts=u, logcounts=v),
SnglCE+ reducedDims=SimpleList(PCA=pca, tSNE=tsne))
SnglCE> sce
class: SingleCellExperiment
dim: 200 100
metadata(0):
assays(2): counts logcounts
rownames: NULL
rowData names(0):
colnames: NULL
colData names(0):
reducedDimNames(2): PCA tSNE
mainExpName: NULL
altExpNames(0):
SnglCE> ## coercion from SummarizedExperiment
SnglCE> se <- SummarizedExperiment(assays=list(counts=u, logcounts=v))
SnglCE> as(se, "SingleCellExperiment")
class: SingleCellExperiment
dim: 200 100
metadata(0):
assays(2): counts logcounts
rownames: NULL
rowData names(0):
colnames: NULL
colData names(0):
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
Error: BiocParallel errors
  2 remote errors, element index: 1, 2
  0 unevaluated and other errors
  first remote error:
Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'ncol': object 'sce' not found
Traceback:
1. dummyFUN2(sce, num, prop)
2. bplapply(1:n, function(y) {
 .     library(SingleCellExperiment)
 .     len = ncol(ob)
 .     sample(1:len, p * len, replace = FALSE)
 . }, BPPARAM = BP)   # at line 10-12 of file
3. bplapply(1:n, function(y) {
 .     library(SingleCellExperiment)
 .     len = ncol(ob)
 .     sample(1:len, p * len, replace = FALSE)
 . }, BPPARAM = BP)
4. .bpinit(manager = manager, X = X, FUN = FUN, ARGS = ARGS, BPPARAM = BPPARAM,
 .     BPOPTIONS = BPOPTIONS, BPREDO = BPREDO)
```

Test 4:

#dummyFUN2 - test to see if (superficially) identical code works on cluster vs local (i.e. without calling library(SingleCellExperiment))
#result - SUCCESS (unexpected) - different compared to local setup
library(BiocParallel)
library(SingleCellExperiment)
example(SingleCellExperiment)
prop = 0.2
num = 2
dummyFUN2 = function(ob,n,p,n_cores=2) {
  BP = BiocParallel::SnowParam(workers = n_cores)
  bplapply(1:n, 
           function(y, ob, p, r) { len = ncol(ob); sample(1:len, p*len, replace = r) }, 
           ob = ob, p = p, r = FALSE, BPPARAM = BP)

}
res = dummyFUN2(sce,num,prop)
print(res)
Test 4 Output

``` Loading required package: SummarizedExperiment Loading required package: MatrixGenerics Loading required package: matrixStats Attaching package: 'MatrixGenerics' The following objects are masked from 'package:matrixStats': colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse, colCounts, colCummaxs, colCummins, colCumprods, colCumsums, colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs, colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats, colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds, colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads, colWeightedMeans, colWeightedMedians, colWeightedSds, colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet, rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods, rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps, rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins, rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks, rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars, rowWeightedMads, rowWeightedMeans, rowWeightedMedians, rowWeightedSds, rowWeightedVars Loading required package: GenomicRanges Loading required package: stats4 Loading required package: BiocGenerics Attaching package: 'BiocGenerics' The following objects are masked from 'package:stats': IQR, mad, sd, var, xtabs The following objects are masked from 'package:base': Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted, lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which.max, which.min Loading required package: S4Vectors Attaching package: 'S4Vectors' The following objects are masked from 'package:base': I, expand.grid, unname Loading required package: IRanges Loading required package: GenomeInfoDb Loading 
required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'.
Attaching package: 'Biobase'
The following object is masked from 'package:MatrixGenerics': rowMedians
The following objects are masked from 'package:matrixStats': anyMissing, rowMedians
SnglCE> ncells <- 100
SnglCE> u <- matrix(rpois(20000, 5), ncol=ncells)
SnglCE> v <- log2(u + 1)
SnglCE> pca <- matrix(runif(ncells*5), ncells)
SnglCE> tsne <- matrix(rnorm(ncells*2), ncells)
SnglCE> sce <- SingleCellExperiment(assays=list(counts=u, logcounts=v),
SnglCE+ reducedDims=SimpleList(PCA=pca, tSNE=tsne))
SnglCE> sce
class: SingleCellExperiment
dim: 200 100
metadata(0):
assays(2): counts logcounts
rownames: NULL
rowData names(0):
colnames: NULL
colData names(0):
reducedDimNames(2): PCA tSNE
mainExpName: NULL
altExpNames(0):
SnglCE> ## coercion from SummarizedExperiment
SnglCE> se <- SummarizedExperiment(assays=list(counts=u, logcounts=v))
SnglCE> as(se, "SingleCellExperiment")
class: SingleCellExperiment
dim: 200 100
metadata(0):
assays(2): counts logcounts
rownames: NULL
rowData names(0):
colnames: NULL
colData names(0):
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
Loading required package: SingleCellExperiment
Loading required package: SingleCellExperiment
[[1]]
[1] 43 69 85 51 33 82 20 42 57 73 32 63 37 47 92 48 8 23 55 39
[[2]]
[1] 33 14 35 19 9 1 17 44 18 37 64 66 48 67 80 47 94 25 99 88
```

Test 4 Output (local setup)

``` Loading required package: SingleCellExperiment Loading required package: SummarizedExperiment Loading required package: MatrixGenerics Loading required package: matrixStats Attaching package: ‘MatrixGenerics’ The following objects are masked from ‘package:matrixStats’: colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse, colCounts, colCummaxs, colCummins, colCumprods, colCumsums, colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs, colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats, colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds, colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads, colWeightedMeans, colWeightedMedians, colWeightedSds, colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet, rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods, rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps, rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins, rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks, rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars, rowWeightedMads, rowWeightedMeans, rowWeightedMedians, rowWeightedSds, rowWeightedVars Loading required package: GenomicRanges Loading required package: stats4 Loading required package: BiocGenerics Attaching package: ‘BiocGenerics’ The following objects are masked from ‘package:stats’: IQR, mad, sd, var, xtabs The following objects are masked from ‘package:base’: anyDuplicated, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which.max, which.min Loading required package: S4Vectors Attaching package: ‘S4Vectors’ The following objects are masked from ‘package:base’: expand.grid, I, unname Loading required package: IRanges Attaching 
package: ‘IRanges’
The following object is masked from ‘package:grDevices’: windows
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'.
Attaching package: ‘Biobase’
The following object is masked from ‘package:MatrixGenerics’: rowMedians
The following objects are masked from ‘package:matrixStats’: anyMissing, rowMedians
Error: BiocParallel errors
  2 remote errors, element index: 1, 2
  0 unevaluated and other errors
  first remote error: argument of length 0
In addition: Warning messages:
1: In serialize(data, node$con) : 'package:stats' may not be available when loading
2: In serialize(data, node$con) :
Error: BiocParallel errors
  2 remote errors, element index: 1, 2
  0 unevaluated and other errors
  first remote error: argument of length 0
```

Follow-up Questions:

  1. It seems the remote cluster setup can run Test 4 without an explicit call to library(SingleCellExperiment). Based on the output, the worker processes load the package themselves, whereas they do not on the local setup. @Jiefei-Wang Is this due to what you were saying about the newest version of BiocParallel having better export ability to the workers? Or is it more likely due to non-BiocParallel differences between the two setups (R version, OS, etc.)?
  2. Since I was using the newest version of BiocParallel on the remote setup, I expected Test 3 to work, thanks to better exporting to the workers of arguments whose values are found in the global environment. Or did I misunderstand what you meant by that statement?

Takeaways(?): From what I'm seeing, the best practice for me is to 1) explicitly pass function arguments to the FUN in bplapply and 2) call library() within FUN when working with non-base, special object classes such as SingleCellExperiment or SeuratObject. I will be sharing this code with other users in my lab and know that their setups will be different than mine, so I'm looking for the most robust way of guaranteeing equivalent behavior. Sorry for the long post and thanks again for helping!

Best, David Samvelian

mtmorgan commented 1 year ago

This is the advice in section 4.1.2 of the vignette.

DavoSam commented 1 year ago

@mtmorgan Thank you for this link (and for the reproducible example, btw)! I must've missed that detail when I first combed through the vignette, but it makes a lot of sense now.

Jiefei-Wang commented 1 year ago

Hi, @DavoSam ,

The problem with your Test 3 is actually non-trivial. It is related to R's lazy evaluation. BiocParallel does export the object ob, but as an unevaluated promise. When that promise is evaluated on a worker, the worker looks for the object sce in its own global environment. The worker cannot find it because sce was never defined there, hence the error you saw.
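
To illustrate the lazy-evaluation point, here is a minimal sketch of one possible workaround on the user side (it is not the change proposed for BiocParallel itself): force the arguments inside the wrapper before calling bplapply(), so the closure sent to the workers carries evaluated values rather than promises that point back at the manager's global environment. The name subsample_idx is hypothetical, and a plain data.frame stands in for sce so that the workers need no extra packages.

```r
library(BiocParallel)

# Hypothetical helper (not from the thread): same shape as Test 3's dummyFUN2,
# but the arguments are forced before the closure is shipped to the workers.
subsample_idx <- function(ob, n, p, n_cores = 2) {
  # Evaluate the promises now, on the manager. Without this, `ob` would still
  # be the unevaluated expression from the caller, to be looked up on a worker.
  force(ob)
  force(p)
  bplapply(seq_len(n), function(y) {
    len <- ncol(ob)
    sample(seq_len(len), size = floor(p * len), replace = FALSE)
  }, BPPARAM = SnowParam(workers = n_cores))
}

# Stand-in for `sce`: a data.frame with 10 columns.
test_df <- data.frame(matrix(0, nrow = 1, ncol = 10))
res <- subsample_idx(test_df, n = 2, p = 0.2)
```

With a SingleCellExperiment, FUN would likely still need library(SingleCellExperiment) (as in Test 2) so that ncol() dispatches correctly on each worker.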

I think this is a place where we can improve; I'll make a pull request to fix this issue.

Best, Jiefei