bjmt / universalmotif

Motif manipulation functions for R.
GNU General Public License v3.0
25 stars 8 forks source link

R session aborted / fatal error when running read_homer from a list of motifs #23

Closed aspepin closed 2 years ago

aspepin commented 2 years ago

Hi, I ran homer on a list of genomic regions, and my output generated 202 motifs. Next, I ran I try to read the output using the read_homer function, but my R session aborts after 1-3 minutes. I do not get the same problem when I run the same code but when there are only 8 motifs in the directory (rather than 202). I'm not sure if the problem has something to do with the error I get (see below) about 'treeio'.

setwd("/Volumes/BINF1_Raid/home/aspepin/projects/H3K4me3_McMasterU/exp2/")
library(universalmotif)

# Bioconductor version 3.11 (BiocManager 1.30.12),
# ?BiocManager::install for help
# Bioconductor version '3.11' is out-of-date; the
# current release version '3.14' is available with R
# version '4.1'; see https://bioconductor.org/install
# Registered S3 method overwritten by 'treeio':
#   method     from
# root.phylo ape 

#the directory below contains the output of homer i.e. 202 motifs
up.dir <- "./output/homer_vs_whole_genome/CDvsHFD_up/knownResults/" 
up.motif.list <- list.files(up.dir,
                         pattern="*.motif",
                         full.names = TRUE,
                         recursive = TRUE)
up.motifs <- lapply(up.motif.list, function(x) read_homer(x))

Whether I run the above, or if I replace the last line (with lapply function) with the following:

up.motifs <- list()
up.motifs <- lapply(1:length(up.motif.list), function(i){
  motifs <- read_homer(up.motif.list[i])
  return(motifs)
})

My R session aborts and it says "R session aborted, encountered fatal error".

Any idea how I can solve this?

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.5

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] universalmotif_1.6.4

loaded via a namespace (and not attached):
 [1] treeio_1.12.0       tinytex_0.31       
 [3] tidyselect_1.1.1    xfun_0.29          
 [5] purrr_0.3.4         lattice_0.20-41    
 [7] ggfun_0.0.5         colorspace_2.0-2   
 [9] vctrs_0.3.8         generics_0.1.1     
[11] stats4_4.0.2        yaml_2.2.1         
[13] utf8_1.2.2          gridGraphics_0.5-1 
[15] rlang_0.4.12        pillar_1.6.4       
[17] glue_1.6.0          DBI_1.1.2          
[19] BiocGenerics_0.34.0 rvcheck_0.1.8      
[21] lifecycle_1.0.1     ggseqlogo_0.1      
[23] zlibbioc_1.34.0     Biostrings_2.56.0  
[25] munsell_0.5.0       gtable_0.3.0       
[27] IRanges_2.22.2      ps_1.6.0           
[29] parallel_4.0.2      fansi_0.5.0        
[31] Rcpp_1.0.7          scales_1.1.1       
[33] BiocManager_1.30.12 S4Vectors_0.26.1   
[35] jsonlite_1.7.2      XVector_0.28.0     
[37] ggplot2_3.3.5       aplot_0.1.2        
[39] processx_3.5.2      dplyr_1.0.7        
[41] rbibutils_2.2.7     grid_4.0.2         
[43] ggtree_2.2.4        Rdpack_2.1.4       
[45] tools_4.0.2         yulab.utils_0.0.4  
[47] magrittr_2.0.1      lazyeval_0.2.2     
[49] patchwork_1.1.1     tibble_3.1.6       
[51] crayon_1.4.2        ape_5.6-2          
[53] tidyr_1.1.4         pkgconfig_2.0.3    
[55] MASS_7.3-53.1       ellipsis_0.3.2     
[57] tidytree_0.3.9      ggplotify_0.0.6    
[59] assertthat_0.2.1    rstudioapi_0.13    
[61] R6_2.5.1            nlme_3.1-152       
[63] compiler_4.0.2 
bjmt commented 2 years ago

Hi,

Thanks for reporting this. My guess would be an error happening somewhere during the process of generating the universalmotif-class motif within read_homer() (that portion is done using C++ code, which is the most likely to cause a sudden crash out of R). Unfortunately it's not something I can more accurately diagnose without a bit more information.

First, if you can, please update to Bioconductor v3.14 (this would also require updating to R v4.1). This is because I can only issue bug fixes to the current Bioconductor version, and there is also a small chance whatever this bug is has been fixed since then.

Second, if it's not too much trouble, please try and narrow down which motif(s) are causing the problem -- i.e. in your loop just also include a line to print which file is being read or some such. If you're able to narrow this down to a single motif which is causing trouble, please just paste it here and I'm certain I could easily fix the problem.

Hopefully this isn't too onerous, I'd be glad to figure out what's causing this and fix it.

(Edit: Unless I'm misunderstanding and actually the crash is happening some time after having read all the motifs?)

aspepin commented 2 years ago

Hi! I was able to narrow down which files from 2 different lists are causing the problem. Here are the 2 motif files: File 1:

>VNNGGATTADNN   bcd(Homeobox)/Embryo-Bcd-ChIP-Seq(GSE86966)/Homer   6.298101    -6.564805   0   T:195.0(17.52%),B:6222.5(14.26%),P:1e-2
0.343   0.229   0.252   0.176
0.302   0.189   0.292   0.217
0.217   0.298   0.279   0.206
0.205   0.080   0.629   0.086
0.001   0.001   0.997   0.001
0.696   0.302   0.001   0.001
0.001   0.001   0.001   0.997
0.001   0.068   0.001   0.930
0.816   0.001   0.001   0.182
0.313   0.120   0.326   0.240
0.227   0.281   0.298   0.195
0.239   0.246   0.246   0.269

File 2:

>NNRCCTAACT BOS1(MYB)/col-BOS1-DAP-Seq(GSE60143)/Homer  5.846626    -6.199610   0   T:228.0(15.21%),B:5785.3(12.65%),P:1e-2
0.195   0.331   0.209   0.265
0.234   0.204   0.240   0.322
0.447   0.047   0.482   0.023
0.101   0.641   0.148   0.110
0.027   0.901   0.032   0.040
0.246   0.100   0.210   0.445
0.888   0.025   0.041   0.046
0.714   0.232   0.027   0.027
0.118   0.574   0.081   0.227
0.257   0.239   0.054   0.450

Thank you so much for your time and help!!

Anne-Sophie

bjmt commented 2 years ago

Thanks.

Unfortunately no matter what I do I can't get it to crash trying to read the motifs, so I'm afraid it could be a setup-specific bug. I've tried both on macOS and Linux. Just to confirm, did you try this again with the newest version of universalmotif? (I see you are on macOS yourself. While I was able to test the same version as you are using on Linux, this is not something I can do easily on macOS, so I could only test using more recent versions of the package.)

Also, just in case: does trying to read in the motifs using read_matrix() instead also crash? E.g.:

read_matrix("motif.homer", header = ">", positions = "rows")
aspepin commented 2 years ago

Hi, it looks like the version I'm using of universalmotif is 1.6.4, I'm assuming that's the latest version available for the Bioconductor and R versions I'm using as there does not seem to be newer versions available when I try to update it. I can't update Bioconductor/R for now (I will definitely do as soon as my papers are published).

I tried using read_matrix, and I do not get any error, R does not crash, and it successfully generates a formal class 'universalmotif' object, even for the motifs that were causing R to crash. I will use this function instead.... Not sure what caused the problem. Thank you so much again for your help, I really appreciate!!

Anne-Sophie

bjmt commented 2 years ago

I understand -- no worries about not having the option to update. Sorry I couldn't figure out what was wrong in the end. Glad to hear read_matrix() didn't crash for you (which puzzles me slightly, but alas).

Should you ever update to the newest version and still encounter this kind of bug, feel free to reopen this issue and I'd be happy to try and investigate again.