Closed DarioS closed 3 years ago
Can you make the offending data available?
I'll do one better and post a minimal, reproducible example. Copy and paste away!
library(stringi)
library(Biostrings)
reads <- DNAStringSet(stri_rand_strings(40000000, 150, pattern = "[GATC]"))
trinucleotideFrequency(reads) # *** caught segfault ** address 0x7f498a1c4040, cause 'memory not mapped'
Crash even faster with:
library(Biostrings)
reads <- rep(DNAStringSet("GGACGTCC"), 5e7)
res <- trinucleotideFrequency(reads)
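A quick size check on the two examples above, offered as a guess at the mechanism rather than a diagnosis: trinucleotideFrequency returns one row per read and one column per trinucleotide (4^3 = 64), so both inputs ask for a count matrix with more cells than a 32-bit signed integer can index. The variable names below are only for illustration.

```r
# Number of columns in the result: one per possible trinucleotide.
n_cols <- 4^3                      # 64

# Total cells in the count matrix for each example (kept as doubles on purpose,
# since the products do not fit in an R integer).
cells_random <- 40000000 * n_cols  # 2.56e9
cells_rep    <- 5e7 * n_cols       # 3.2e9

# Largest value a 32-bit signed integer can hold.
int_max <- .Machine$integer.max    # 2147483647

cells_random > int_max             # TRUE
cells_rep > int_max                # TRUE
```

Whether this is the actual cause is an assumption here, but it matches the pattern of smaller inputs working and larger ones crashing.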
Should be fixed in Biostrings 2.61.2 (BioC 3.14) and Biostrings 2.60.2 (BioC 3.13). See commit 050d5722d82f950c7c5dcb138726604e1446810d
H.
My aim is to compute the entropy of all of the unmapped reads for each patient sample of whole genome sequencing data and remove low-complexity sequences before doing a microbiome analysis. Filtering works for the smaller samples, but it segfaults on the larger ones. I use R 4.1.0 and Biostrings 2.60.1. I have narrowed the crash down to trinucleotideFrequency.
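For anyone who cannot upgrade Biostrings right away, here is a minimal sketch of a possible workaround: compute the per-read entropy in chunks, so that no single call to trinucleotideFrequency has to build a huge count matrix. The helper name trinucleotide_entropy and the chunk_size default are hypothetical, and defining entropy as Shannon entropy (in bits) over trinucleotide counts is an assumption about the filtering step, not something stated in this thread.

```r
library(Biostrings)

# Hypothetical helper: Shannon entropy (bits) of each read's trinucleotide
# composition, computed chunk by chunk to keep each count matrix small.
trinucleotide_entropy <- function(reads, chunk_size = 1e6) {
  n <- length(reads)
  if (n == 0) return(numeric(0))
  entropy <- numeric(n)
  for (s in seq(1, n, by = chunk_size)) {
    idx <- s:min(s + chunk_size - 1, n)
    # length(idx) x 64 integer matrix of trinucleotide counts
    counts <- trinucleotideFrequency(reads[idx])
    # Row-wise proportions; reads with no countable trinucleotide give NaN here
    p <- counts / rowSums(counts)
    # Treat 0 * log2(0) as 0; NaN proportions propagate as NA
    terms <- ifelse(p > 0, p * log2(p), 0)
    entropy[idx] <- -rowSums(terms)
  }
  entropy
}

# Small usage example: a mixed read and a homopolymer read.
reads <- DNAStringSet(c("GGACGTCC", "AAAAAAAA"))
e <- trinucleotide_entropy(reads, chunk_size = 1)
e
```

The first read has six distinct trinucleotides, each seen once, so its entropy is log2(6); the homopolymer has entropy 0. Low-complexity reads could then be dropped with something like reads[e >= threshold], where the threshold is a choice to be made for the data at hand.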