YeoLab / skipper

Skip the peaks and expose RNA-binding in CLIP data

Errors when empty reproducible windows file is used as input to consult_term_reference.R #8

Closed: byee4 closed this issue 4 months ago.

byee4 commented 1 year ago

Skipper fails to finish when a single sample is bad (e.g., no reproducible windows were found). Could there be a check here to skip this rule?

Rscript --vanilla \
skipper/tools/consult_term_reference.R \
output/reproducible_enriched_windows/RBP_CELL.reproducible_enriched_windows.tsv.gz \
skipper/annotations/c5.go.v7.5.1.symbols.gmt skipper/annotations/encode3_go_terms.reference.tsv.gz \
skipper/annotations/encode3_go_terms.jaccard_index.rds RBP_CELL

── Attaching packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
Loading required package: viridisLite
Rows: 0 Columns: 20
── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (20): chr, start, end, name, score, strand, gc, gc_bin, chrom, feature_i...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 5067 Columns: 3
── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): term
dbl (2): n_term_windows, f_term_windows

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Error in `summarize()`:
! Problem while computing `..1 = ... %>% as_tibble`.
Caused by error in `names(df) <- repaired_names(c(names2(dimnames(x)), n), repair_hint = TRUE,
  .name_repair = .name_repair)`:
! 'names' attribute [2] must be the same length as the vector [1]
Backtrace:
     ▆
  1. ├─... %>% rename(n_windows_enriched = n)
  2. ├─dplyr::rename(., n_windows_enriched = n)
  3. ├─dplyr::summarize(...)
  4. ├─dplyr:::summarise.data.frame(...)
  5. │ └─dplyr:::summarise_cols(.data, dplyr_quosures(...), caller_env = caller_env())
  6. │   ├─base::withCallingHandlers(...)
  7. │   └─dplyr:::map(quosures, summarise_eval_one, mask = mask)
  8. │     └─base::lapply(.x, .f, ...)
  9. │       └─dplyr (local) FUN(X[[i]], ...)
 10. │         └─mask$eval_all_summarise(quo)
 11. ├─... %>% as_tibble
 12. ├─tibble::as_tibble(.)
 13. ├─tibble:::as_tibble.table(.)
 14. └─base::.handleSimpleError(...)
 15.   └─dplyr (local) h(simpleError(msg, call))
 16.     └─rlang::abort(bullets, call = error_call, parent = skip_internal_condition(e))
Execution halted
-rw-r--r-- 1 bay001 yeo-group 113 Jun  7 22:41 output/reproducible_enriched_windows/RBP_CELL.reproducible_enriched_windows.tsv.gz
[bay001@tscc-1-4 processing]$ zcat output/reproducible_enriched_windows/RBP_CELL.reproducible_enriched_windows.tsv.gz
chr     start   end     name    score   strand  gc      gc_bin  chrom   feature_id      feature_bin     feature_type_top        feature_types   gene_name       gene_id transcript_ids  gene_type_top   transcript_type_top      gene_types      transcript_types
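
The check requested above could be sketched as a small shell guard. It assumes the only case to detect is a file like this one, which holds the header line and no data rows. `has_data_rows` is a hypothetical helper, not part of skipper, and the commented usage simply wraps the command from the report. If the rule runs under Snakemake, skipping the `Rscript` call would presumably still require creating whatever output files the rule declares, which this sketch does not do.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of skipper): succeeds only if a gzipped TSV
# has at least one data row after its header line.
has_data_rows() {
    local tsv_gz="$1"
    local first_row
    # Decompress, drop the header (line 1), and keep at most one following line.
    # A missing or unreadable file yields an empty string and counts as "no rows".
    first_row=$(gzip -dc -- "$tsv_gz" | tail -n +2 | head -n 1)
    [ -n "$first_row" ]
}

# Example guard around the failing command (arguments copied from the report):
# windows=output/reproducible_enriched_windows/RBP_CELL.reproducible_enriched_windows.tsv.gz
# if has_data_rows "$windows"; then
#     Rscript --vanilla skipper/tools/consult_term_reference.R "$windows" \
#         skipper/annotations/c5.go.v7.5.1.symbols.gmt \
#         skipper/annotations/encode3_go_terms.reference.tsv.gz \
#         skipper/annotations/encode3_go_terms.jaccard_index.rds RBP_CELL
# else
#     echo "No reproducible windows in $windows; skipping term reference." >&2
# fi
```

Reading only the first data row keeps the check cheap on large window files. Because a missing file is also treated as empty, a real guard might want to test for the file's existence separately so that genuine upstream failures are not silently skipped.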

augustboyle commented 1 year ago

Is the problem that, when several samples are run together in the same manifest, this failure blocks the other jobs from running?

It doesn't seem like a problem per se: there is no output in this case, the output files aren't loaded by any other rule, and one might in fact expect an error when running on zero windows.