YeoLab / skipper

Skip the peaks and expose RNA-binding in CLIP data

R scripts appear incompatible with some R packages #6

Closed · byee4 closed 1 year ago

byee4 commented 1 year ago

Just tried updating the Skipper module, which broke the pipeline:

── Attaching packages ──────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.1     ✔ purrr   1.0.1
✔ tibble  3.1.8     ✔ dplyr   1.1.0
✔ tidyr   1.3.0     ✔ stringr 1.5.0
✔ readr   2.1.4     ✔ forcats 1.0.0
── Conflicts ─────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
Rows: 42 Columns: 30
── Column specification ─────────────────────────────────────────────────
Delimiter: "\t"
chr (13): chr, strand, gc_bin, chrom, feature_type_top, feature_types, gene_...
dbl (17): start, end, name, score, gc, feature_id, feature_bin, input_sum, c...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 19377309 Columns: 17
── Column specification ─────────────────────────────────────────────────
Delimiter: "\t"
chr (11): chrom, strand, feature_type_top, feature_types, gene_name, gene_id...
dbl  (6): start, end, name, score, feature_id, feature_bin

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining with `by = join_by(sampling_group)`
Error in `summarize()`:
ℹ In argument: `tibble(...)`.
ℹ In group 1: `sampling_group = "EXON_MRNA"`.
Caused by error in `sample.int()`:
! invalid 'size' argument
Backtrace:
     ▆
  1. ├─... %>% inner_join(all_windows, .)
  2. ├─dplyr::inner_join(all_windows, .)
  3. ├─dplyr:::inner_join.data.frame(all_windows, .)
  4. │ └─dplyr::auto_copy(x, y, copy = copy)
  5. │   ├─dplyr::same_src(x, y)
  6. │   └─dplyr:::same_src.data.frame(x, y)
  7. │     └─base::is.data.frame(y)
  8. ├─dplyr::select(., name)
  9. ├─dplyr::ungroup(.)
 10. ├─dplyr::summarize(...)
 11. ├─dplyr:::summarise.grouped_df(...)
 12. │ └─dplyr:::summarise_cols(.data, dplyr_quosures(...), by, "summarise")
 13. │   ├─base::withCallingHandlers(...)
 14. │   └─dplyr:::map(quosures, summarise_eval_one, mask = mask)
 15. │     └─base::lapply(.x, .f, ...)
 16. │       └─dplyr (local) FUN(X[[i]], ...)
 17. │         └─mask$eval_all_summarise(quo)
 18. │           └─dplyr (local) eval()
 19. ├─tibble::tibble(...)
 20. │ └─tibble:::tibble_quos(xs, .rows, .name_repair)
 21. │   └─rlang::eval_tidy(xs[[j]], mask)
 22. ├─base::sample(...)
 23. │ └─base::sample.int(length(x), size, replace, prob)
 24. └─base::.handleSimpleError(...)
 25.   └─dplyr (local) h(simpleError(msg, call))
 26.     └─dplyr (local) handler(cnd)
 27.       └─rlang::abort(message, class = error_class, parent = parent, call = error_call)

The R scripts do appear to work using the following R packages:

── Attaching packages ──────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ─────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
Rows: 42 Columns: 30
── Column specification ─────────────────────────────────────────────────
Delimiter: "\t"
chr (13): chr, strand, gc_bin, chrom, feature_type_top, feature_types, gene_...
dbl (17): start, end, name, score, gc, feature_id, feature_bin, input_sum, c...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 19377309 Columns: 17
── Column specification ─────────────────────────────────────────────────
Delimiter: "\t"
chr (11): chrom, strand, feature_type_top, feature_types, gene_name, gene_id...
dbl  (6): start, end, name, score, feature_id, feature_bin

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining, by = "sampling_group"
`summarise()` has grouped output by 'sampling_group'. You can override using the `.groups` argument.
Joining, by = "name"

byee4 commented 1 year ago

For what it's worth, this is the sessionInfo() that appears to be compatible with Skipper:

R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] forcats_0.5.2   stringr_1.4.1   dplyr_1.0.10    purrr_0.3.5    
[5] readr_2.1.3     tidyr_1.2.1     tibble_3.1.8    ggplot2_3.3.6  
[9] tidyverse_1.3.2

loaded via a namespace (and not attached):
 [1] pillar_1.8.1        compiler_4.1.3      cellranger_1.1.0   
 [4] dbplyr_2.2.1        tools_4.1.3         timechange_0.2.0   
 [7] lubridate_1.9.2     googledrive_2.0.0   jsonlite_1.8.4     
[10] lifecycle_1.0.3     gargle_1.2.1        gtable_0.3.1       
[13] pkgconfig_2.0.3     rlang_1.0.6         reprex_2.0.2       
[16] DBI_1.1.3           cli_3.4.1           haven_2.5.2        
[19] xml2_1.3.3          withr_2.5.0         httr_1.4.5         
[22] generics_0.1.3      vctrs_0.4.2         fs_1.6.1           
[25] hms_1.1.2           googlesheets4_1.0.1 grid_4.1.3         
[28] tidyselect_1.2.0    glue_1.6.2          R6_2.5.1           
[31] fansi_1.0.4         readxl_1.4.2        tzdb_0.3.0         
[34] modelr_0.1.9        magrittr_2.0.3      backports_1.4.1    
[37] scales_1.2.1        ellipsis_0.3.2      rvest_1.0.3        
[40] assertthat_0.2.1    colorspace_2.0-3    utf8_1.2.3         
[43] stringi_1.7.8       munsell_0.5.0       broom_1.0.3        
[46] crayon_1.5.2

augustboyle commented 1 year ago

The failure is caused by the following change to the sample function in R version 4.2.0:

sample() and sample.int() have additional sanity checks on their size and n arguments.

The sampling script has been updated so that sample() now receives a single integer for its size argument, rather than a vector that repeats the same number.
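
For anyone hitting the same error in their own code, here is a minimal sketch of the pattern described above. It is not the actual Skipper script: the data frame, the `n_to_sample` column, and the `sample_group()` helper are all hypothetical stand-ins. It assumes each group carries its target sample size as a column, so the value is repeated on every row, and it collapses that column to one integer before sampling. It uses base R only, so it does not depend on a particular dplyr version.

```r
# Hypothetical stand-in for per-group data: every row in a group carries
# the same target sample size in `n_to_sample`.
windows <- data.frame(
  sampling_group = rep(c("EXON_MRNA", "INTRON"), times = c(5, 4)),
  name = 1:9,
  n_to_sample = rep(c(3L, 2L), times = c(5, 4))
)

# Hypothetical helper: draw `n_to_sample` names from one group.
sample_group <- function(group) {
  # Collapse the repeated column to a single integer; passing the whole
  # column as `size` is the kind of input the stricter checks reject.
  size <- unique(group$n_to_sample)
  stopifnot(length(size) == 1)
  # Sample row indices rather than `name` itself, so a one-row group is
  # not interpreted by sample() as the range 1:name.
  group$name[sample.int(nrow(group), size = size)]
}

set.seed(1)
sampled <- lapply(split(windows, windows$sampling_group), sample_group)
```

The `stopifnot()` guard is a design choice for the sketch: if a group ever carried more than one distinct size, it fails loudly instead of quietly picking one.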