Bioconductor / Biostrings

Efficient manipulation of biological strings
https://bioconductor.org/packages/Biostrings

Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'raw' #95

Open gevro opened 1 year ago

gevro commented 1 year ago

Hi,

I'm encountering a very strange bug/error.

  1. The error occurs only for a specific input dataset that has the same properties as hundreds of other input datasets that run without any issues.
  2. The error occurs only when the code is run as an R script (e.g. a script.R file starting with #!/usr/bin/env Rscript). It does NOT occur when running the exact same code in an interactive R session, which is the strangest aspect of this issue and makes it harder to debug.

Code causing the error:

# bam is loaded from a BAM file with scanBam (Rsamtools package)

# Logical vector specifying reads from which to subset a sequence
includereads <- sample(c(TRUE,FALSE),length(bam$seq),replace=TRUE)

bam$layered.seq <- sequenceLayer(bam$seq,bam$cigar)

bam$subseq <- rep(NA,length(bam$seq))
bam$subseq[includereads] <- subseq(bam$layered.seq[includereads],start=bam$layered.start[includereads],end=bam$layered.end[includereads])

# Any code that subsets a range of items from bam$subseq causes the error. For example:
bam$subseq[includereads]

Error:

Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'raw'
Execution halted
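
One detail of the snippet above: bam$subseq starts as a logical vector (rep(NA, ...)) and part of it is then assigned an XStringSet. The output printed later in this thread shows plain strings, so a conversion to character happens implicitly somewhere along the way. Below is a minimal sketch of a more explicit variant that converts with as.character() before assigning. It assumes Biostrings is installed, and the toy sequences, coordinates and the name subseq_chr are invented for illustration. Whether this avoids the error is not established in this thread.

```r
library(Biostrings)

# Toy stand-ins for bam$layered.seq and the start/end coordinates (illustrative only)
seqs <- DNAStringSet(c("ACGTACGT", "TTGGCCAA", "GATTACAG"))
includereads <- c(TRUE, FALSE, TRUE)
starts <- c(2L, 1L, 3L)
ends <- c(4L, 1L, 6L)

# Pre-allocate a character vector rather than a logical one
subseq_chr <- rep(NA_character_, length(seqs))

# Convert the XStringSet to character explicitly before the sub-assignment
subseq_chr[includereads] <- as.character(
  subseq(seqs[includereads],
         start = starts[includereads],
         end = ends[includereads])
)

print(subseq_chr)
```

With this layout the target vector only ever holds ordinary R strings, so nothing has to be coerced during the assignment itself.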

I have upgraded to the latest versions of Biostrings, GenomicAlignments, Rsamtools, etc., current as of yesterday.

Since I cannot share the input data here, I can share example data off-line with anyone who has a sense of what this might be; just share your e-mail address.

gevro commented 1 year ago

I traced the bug to something even weirder. A different element of the bam object is modified on an earlier line of code, following a recommendation from issue #65: as(as(as.raw(x),"XRaw"),"BString")

Somehow this code changes bam$subseq, which does not appear anywhere in this as(as(as.raw(x),"XRaw"),"BString") code.

It is a bit hard to explain without sharing the raw data and exact code, which I can do offline due to the large data size. Somehow, modifying one element of a list with the above string conversion causes access to another element of that list to throw the error Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'raw'. And this only happens in Rscript mode, not in interactive mode.

There must be some really deep bug somewhere in Biostrings or one of its dependencies, or maybe in R base code itself?

I will also keep trying to narrow down the exact source to try and generate a minimal reproducible example, if you'd like to wait for that.

hpages commented 1 year ago

Hi @gevro ,

The error occurs only when the code is run as an R script (e.g. a script.R file starting with #!/usr/bin/env Rscript).

Maybe this is not using the same R as the one you use interactively?

Please show us the output of:

Rscript -e 'suppressMessages(library(Rsamtools));sessionInfo()'

Also show us the output of sessionInfo() obtained from within an interactive R session (after loading Rsamtools).

Thanks, H.
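
Beyond sessionInfo(), a few base R values can differ between the two modes even when the package versions match. The sketch below collects them so the interactive and Rscript outputs can be compared side by side. The helper name env_report is made up for illustration.

```r
# Collect settings that may differ between interactive R and Rscript
env_report <- function() {
  list(
    interactive = interactive(),      # FALSE under Rscript
    r_version   = R.version.string,   # version banner of the running R
    r_home      = R.home(),           # installation this process was started from
    lib_paths   = .libPaths(),        # library search path for packages
    args        = commandArgs()       # how the process was launched
  )
}

str(env_report())
```

Running the same file once via source() in a session and once via Rscript should show at a glance whether both use the same R installation and library paths.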

gevro commented 1 year ago

Interactive and Rscript both seem the same:

Interactive:

R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
[1] C

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] Rsamtools_2.12.0     Biostrings_2.64.1    XVector_0.36.0      
[4] GenomicRanges_1.48.0 GenomeInfoDb_1.32.2  IRanges_2.30.1      
[7] S4Vectors_0.34.0     BiocGenerics_0.42.0 

loaded via a namespace (and not attached):
 [1] codetools_0.2-18       crayon_1.5.1           bitops_1.0-7          
 [4] zlibbioc_1.42.0        BiocParallel_1.30.3    tools_4.2.0           
 [7] RCurl_1.98-1.7         parallel_4.2.0         compiler_4.2.0        
[10] GenomeInfoDbData_1.2.8

Rscript:

R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
[1] C

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] Rsamtools_2.12.0     Biostrings_2.64.1    XVector_0.36.0      
[4] GenomicRanges_1.48.0 GenomeInfoDb_1.32.2  IRanges_2.30.1      
[7] S4Vectors_0.34.0     BiocGenerics_0.42.0 

loaded via a namespace (and not attached):
 [1] codetools_0.2-18       crayon_1.5.1           bitops_1.0-7          
 [4] zlibbioc_1.42.0        BiocParallel_1.30.3    tools_4.2.0           
 [7] RCurl_1.98-1.7         parallel_4.2.0         compiler_4.2.0        
[10] GenomeInfoDbData_1.2.8

Also, overnight, I found even stranger things suggesting some kind of memory leak across variables. So this bug is really weird. As mentioned above, I traced the issue to a specific line that I wrote based on your prior suggestion in issue #65. Somehow that line is corrupting a different variable that I'm not even accessing.

#Convert ip/pw codec to seconds
codecv1toseconds <- function(x,secperframe){
        x[x>63] <- (x[x>63]-64)*2+64
        x[x>190] <- (x[x>190]-192)*2+192
        x[x>444] <- (x[x>444]-448)*2+448
        return(x*secperframe)
}

head(bam$subseq[includereads])

blah <- lapply(sequenceLayer(BStringSet(lapply(bam$tag$ip[includereads],function(x){as(as(as.raw(x),"XRaw"),"BString")})),bam$cigar[includesubreads]),function(x){codecv1toseconds(as.integer(x),secondsperframe)})

head(bam$subseq[includereads])

OUTPUT:
First head(bam$subseq[includereads]) command:
[1] "AAGGATTTCTTCCTG-GTGAACA" "AAGGATTTCTTCC--TGTGAACA"
[3] "AAGGATTTCTTCCTGTGAGAACA" "AAGGATTTCTTCCTGGGTGAACA"
[5] "AAGATTTTCTACCTGTGTGAACA" "AAGGATATCTTCCTGTGTGAACA"

Second head(bam$subseq[includereads]) command:
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'head': Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'list'
Calls: head -> .handleSimpleError -> h
Execution halted

Notice that the command in between the two head(bam$subseq[includereads]) commands doesn’t even modify the bam variable I am querying with the head(bam$subseq[includereads]) commands.

Also, when I remove the outermost lapply from the problematic line, I get a different error:

blah <- sequenceLayer(BStringSet(lapply(bam$tag$ip[includereads],function(x){as(as(as.raw(x),"XRaw"),"BString")})),bam$cigar[includereads],D.letter="\376")

head(blah)

BStringSet object of length 6:
    width seq
Error in XVector:::extract_character_from_XRaw_by_ranges(x, start, width,  : 
  embedded nul in string: '\006\005`\0\034\037\021\005\376\f\003\024$.\034\027\r\004\n\006\004\023\a\026\017'\005\b\aD\005\376\034\t'

This also happens without the D.letter option:

blah <- sequenceLayer(BStringSet(lapply(bam$tag$ip[includereads],function(x){as(as(as.raw(x),"XRaw"),"BString")})),bam$cigar[includereads])

head(blah)

BStringSet object of length 6:
    width seq
Error in XVector:::extract_character_from_XRaw_by_ranges(x, start, width,  : 
  embedded nul in string: '\006\005`\0\034\037\021\005-\f\003\024$.\034\027\r\004\n\006\004\023\a\026\017'\005\b\aD\005-\034\t'
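
The "embedded nul" message is the same one base R raises whenever bytes containing a 0 are turned into a character string, and a code of 0 is a legal value in these tags. A minimal base R sketch follows, using the first five bytes shown in the error message above (6, 5, 96, 0, 28); it does not involve Biostrings at all.

```r
# First bytes from the error message: \006 \005 ` \0 \034
ip <- c(6L, 5L, 96L, 0L, 28L)
bytes <- as.raw(ip)

# Converting to a string fails because of the 0 byte
msg <- tryCatch(rawToChar(bytes), error = function(e) conditionMessage(e))
print(msg)

# The values themselves are intact as long as they stay raw/integer
back <- as.integer(bytes)
print(back)
```

This suggests the display step (which builds strings for printing) is what trips over the 0 byte, while integer access to the same data can still work.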

Note that this line of code works fine for hundreds of other input files, and there is something specific about this input file that is causing this error. But this code should be robust. I don't understand what might be happening. And how assignment to one variable somehow causes a completely different variable to throw an error, not to mention the difference between interactive vs Rscript.

Note also that what this line is trying to do in the first place is convert the PacBio BAM file ip and pw tags to a BStringSet and then layer them with the read's cigar.

The ip and pw tag specs are defined here: https://pacbiofileformats.readthedocs.io/en/latest/BAM.html


Example tag:

ip:B:C,20,32,42,50,57,33,34,31,39,30,29,30,40,34,20,32,28,31,35,14,19,30,24,38,20,31,55,27,20,26,26,26,20,19,23,32,37,42,29,27,27,25,19,22,23,29,30,31,33,36,29,21,26,20,44,30,25,22,21,28,33,61,26,29,17,33,32,14,28,24,27,21,23,18,30,18,34,23,35,36,23,30,21,29,32,44,25,35,44,37,22,24,23,21,32,30,25,26,18,23,24,18,20,21,29,76,28,16,35,47,45,54,24,20,30,23,52,51,18,30,21,25,26,23,20,21,24,18,27,25,16,22,30,27,36,49,26,26,30,14,20,22,21,19,26,27,37,30,26,20,24,34,24,21,22,33,34,27,23,21,18,16,32,23,28,26,20,34,37,18,16,23,26,35,36,20,27,43,28,34,22,34,22,18,31,30,31,39,38,32,22,19,14,23,33,36,42,40,18,16,16,17,21,21,20,38,28,17,25,44,33,30,23,20,25,62,20,33,32,35,17,23,26,34,13,30,29,48,32,27,27,20,14,17,11,30,28,22,21,41,62,59,31,44,19,19,15,24,25,25,34,24,21,26,28,26,34,21,22,24,21,28,26,26,24,28,22,19,26,24,23,49,32,19,28,30,25,22,29,25,16,40,27,43,51,26,36,28,18,24,19,20,14,21,26,18,19,29,22,21,18,26,51,19,33,25,25,23,25,32,30,38,54,25,26,34,23,21,28,36,27,35,51,28,28,39,46,24,22,25,17,24,17,27,99,19,34,40,62,145,199,131,123,95,108,101,103,99,88,95,101,108,111,99,103,98,125,122,131,128,117,122,122,103,94,90,94,92,85,93,131,93,97,87,93,92,90,81,83,84,88,99,91,110,104,103,107,82,71,69,74,98,80,78,80,82,70,109,59,49,33,32,16,16,20,23,64,29,71,47,17,29,25,17,66,55,32,20,25,18,61,39
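
As a worked example of the decoding step, here is the codecv1toseconds function from above applied to a few codes that occur in this tag (20, 62, 76, 145, 199). Passing secperframe = 1 is an assumption made purely so that the result reads as frame counts; the real seconds-per-frame value depends on the instrument and is not given in this thread.

```r
# Same logic as the function posted above, reformatted
codecv1toseconds <- function(x, secperframe) {
  x[x > 63]  <- (x[x > 63]  - 64)  * 2 + 64
  x[x > 190] <- (x[x > 190] - 192) * 2 + 192
  x[x > 444] <- (x[x > 444] - 448) * 2 + 448
  x * secperframe
}

codes <- c(20, 62, 76, 145, 199)   # values taken from the example tag
frames <- codecv1toseconds(codes, secperframe = 1)
print(frames)
# [1]  20  62  88 260 504
```

Codes up to 63 map to themselves, and each higher band of codes doubles the step size, which is why 199 expands to 504 frames.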

gevro commented 1 year ago

Update: I found a workaround, though I still have no idea what causes the bug.

Before this step:

bam$subseq[includereads] <- subseq(bam$layered.seq[includereads],start=bam$layered.start[includereads],end=bam$layered.end[includereads])

Do this:

saveRDS(bam$subseq,"bam.subseq.RDS")
bam$subseq <- readRDS("bam.subseq.RDS")

i.e. write to disk and read back the bam$subseq objects.

Why this works is mysterious.
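
For anyone who wants to try the same workaround without leaving files behind, here is a small sketch that does the round trip through a temporary file. The helper name roundtrip_rds is invented for illustration. One possible reading (an assumption, not confirmed in this thread) is that the round trip helps because readRDS() returns a freshly allocated object.

```r
# Serialize an object to a temporary file and read it straight back
roundtrip_rds <- function(x) {
  path <- tempfile(fileext = ".rds")
  on.exit(unlink(path), add = TRUE)   # remove the temporary file afterwards
  saveRDS(x, path)
  readRDS(path)
}

x <- c("AAGGATTTCTTCCTG-GTGAACA", NA, "AAGGATTTCTTCC--TGTGAACA")
y <- roundtrip_rds(x)
print(identical(x, y))
```

Usage in the script would then be bam$subseq <- roundtrip_rds(bam$subseq) in place of the two saveRDS/readRDS lines.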

gevro commented 3 months ago

Hi, this bug is happening again. I have created reproducible code and input data. Can someone connect with me so I can transfer the code and data needed to reproduce this? This is a very bizarre bug, likely some kind of data/buffer overflow bug.

Just schematically, I'm seeing weird things like this:

data <- list()
data$A <- sourcedataA  # made with Biostrings functions
data$B <- sourcedataB  # made with Biostrings functions
saveRDS(data$A,"dataA.RDS")
data$A <- readRDS("dataA.RDS")

=> The readRDS line causes the bug: Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'list'

However, with this code, the bug does NOT occur. Somehow, saving data$A is affected by a prior step of loading data$B, which doesn't make sense.

data <- list()
data$A <- sourcedataA
saveRDS(data$A,"dataA.RDS")
data$A <- readRDS("dataA.RDS")

Also, with this code, the bug does NOT occur. Somehow, saving data$A and reading it back in before the step of loading data$B prevents the bug from occurring.

data <- list()
data$A <- sourcedataA
saveRDS(data$A,"dataA.RDS")
data$A <- readRDS("dataA.RDS")
data$B <- sourcedataB
saveRDS(data$A,"dataA.RDS")
data$A <- readRDS("dataA.RDS")

vjcitn commented 3 months ago

please post output of sessionInfo() in the session in which bug is triggered, after the error erupts.

vjcitn commented 3 months ago

please ensure that BiocManager::valid() returns TRUE

vjcitn commented 3 months ago

I think it would also be illuminating for you to set options(error=recover) before running the code that triggers the bug. you will get a stack trace that will help pin down where the bug is. post the whole trace.

gevro commented 3 months ago

Thanks.

  1. sessionInfo, after loading all the libraries used by the script:

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
[1] C

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] lubridate_1.9.2             forcats_1.0.0
 [3] stringr_1.5.0               dplyr_1.1.2
 [5] purrr_1.0.1                 readr_2.1.4
 [7] tidyr_1.3.0                 tibble_3.2.1
 [9] ggplot2_3.4.2               tidyverse_2.0.0
[11] configr_0.3.5               GenomicAlignments_1.36.0
[13] SummarizedExperiment_1.30.2 Biobase_2.60.0
[15] MatrixGenerics_1.12.2       matrixStats_1.0.0
[17] Rsamtools_2.16.0            Biostrings_2.68.1
[19] XVector_0.40.0              GenomicRanges_1.52.0
[21] GenomeInfoDb_1.36.1         IRanges_2.34.1
[23] S4Vectors_0.38.1            BiocGenerics_0.46.0

loaded via a namespace (and not attached):
 [1] gtable_0.3.3            RcppTOML_0.2.2          lattice_0.21-8
 [4] tzdb_0.4.0              vctrs_0.6.3             tools_4.3.1
 [7] bitops_1.0-7            generics_0.1.3          parallel_4.3.1
[10] fansi_1.0.4             pkgconfig_2.0.3         Matrix_1.5-4.1
[13] lifecycle_1.0.3         GenomeInfoDbData_1.2.10 ini_0.3.1
[16] compiler_4.3.1          munsell_0.5.0           codetools_0.2-19
[19] RCurl_1.98-1.12         yaml_2.3.7              pillar_1.9.0
[22] crayon_1.5.2            BiocParallel_1.34.2     DelayedArray_0.26.6
[25] tidyselect_1.2.0        stringi_1.7.12          grid_4.3.1
[28] colorspace_2.1-0        cli_3.6.1               magrittr_2.0.3
[31] S4Arrays_1.0.4          utf8_1.2.3              withr_2.5.0
[34] scales_1.2.1            timechange_0.2.0        hms_1.1.3
[37] rlang_1.1.1             Rcpp_1.0.11             glue_1.6.2
[40] jsonlite_1.8.7          R6_2.5.1                zlibbioc_1.46.0


  2. BiocManager::valid() doesn't seem to return a simple logical result. It shows this:

BiocManager::valid() == TRUE
Error: 'list' object cannot be coerced to type 'logical'
In addition: Warning message:
469 packages out-of-date; 0 packages too new
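
The comparison fails because the result is a list whenever the installation is not valid, so == TRUE cannot be applied to it. isTRUE() handles both cases. The sketch below assumes, going by the BiocManager documentation, that the non-TRUE result carries elements named out_of_date and too_new; the helper name summarise_valid is invented for illustration.

```r
# Turn the result of BiocManager::valid() into a one-line summary.
# `v` is either TRUE or a list-like object (element names assumed, see above).
summarise_valid <- function(v) {
  if (isTRUE(v)) {
    return("installation is valid")
  }
  sprintf("%d out-of-date, %d too new",
          NROW(v$out_of_date), NROW(v$too_new))
}

# Mock results, so the helper can be tried without touching the network
print(summarise_valid(TRUE))
print(summarise_valid(list(out_of_date = matrix(0, 3, 2), too_new = NULL)))
```

In a real session the call would be summarise_valid(BiocManager::valid()).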

  3. options(error=recover) doesn't really help, because the bug only happens during Rscript runs, and does NOT happen during interactive sessions. That's why this bug is so bizarre:

Error:
Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'character'
recover called non-interactively; frames dumped, use debugger() to view

Available environments had calls:
1: readRDS("file.RDS")
2: (function () { if (.isMethodsDispatchOn()) { tState <- tracingSta
3: try(dump.frames())
4: tryCatch(expr, error = function(e) { call <- conditionCall(e) if (!is
5: tryCatchList(expr, classes, parentenv, handlers)
6: tryCatchOne(expr, names, parentenv, handlers[[1]])
7: doTryCatch(return(expr), name, parentenv, handler)

Enter an environment number, or 0 to exit
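
Since recover() needs an interactive prompt, one alternative for Rscript runs is to record the call stack from inside a calling handler and print it. This is a minimal base R sketch; the helper name capture_trace and the toy functions f and g are invented for illustration and are not part of the reporter's script.

```r
# Evaluate `expr`; on error, record the call stack while the failing
# frames are still live, then return both the error and the stack.
capture_trace <- function(expr) {
  trace <- NULL
  result <- tryCatch(
    withCallingHandlers(
      expr,
      error = function(e) {
        # One line of text per active call, outermost first
        trace <<- vapply(sys.calls(),
                         function(cl) deparse(cl, nlines = 1L)[1],
                         character(1))
      }
    ),
    error = function(e) e
  )
  list(result = result, trace = trace)
}

# Toy failure to show the shape of the output
f <- function() g()
g <- function() stop("boom")
out <- capture_trace(f())
cat(out$trace, sep = "\n")
```

Wrapping the failing statement, for example capture_trace(readRDS("file.RDS")), would print the same kind of stack in a non-interactive run.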



I have the ~2 GB input data and code that should be reproducible if you or anyone would like to examine this.

hpages commented 3 months ago

You're using Biostrings 2.68.1 which belongs to Bioconductor 3.17. Please update your installation to the latest version of Bioconductor (3.19, needs R 4.4) and reinstall Biostrings. This will get you Biostrings 2.72.1, which is the most current version of the package. See if you can still reproduce the problem with that version. Thanks!

gevro commented 3 months ago

Interesting - looks like this error is no longer happening after the upgrade! I wonder what the issue was... This was the strangest bug I've ever seen in R.

gevro commented 3 months ago

Update: The error happened again with the latest Biostrings and R versions. It happened this time with a different chunk of the data. Fortunately it is happening with an even smaller set of code lines so it helps me narrow down more what might be happening. I will let you know once I have a minimal reproducible code.

gevro commented 3 months ago

@hpages : Ok I have a minimal reproducible example, ~200 lines of code and an 845 Mb input file. Would you like me to share this with you? It is a very bizarre bug. Changing most lines in this 200 lines of code, even unrelated to the object whose manipulation triggers the error, causes the error.

hpages commented 3 months ago

@gevro Thanks for your hard work tracking and narrowing down this nasty bug. Much appreciated. Can you please attach the file containing the code to your next comment? For the data, it would be great if you could put it on a file sharing service like Dropbox or similar. Thanks again!

gevro commented 3 months ago

Thanks. Since I need to protect the input data (I couldn't make a reproducible version with dummy data), I can send you by e-mail.

hpages commented 3 months ago

Unfortunately if the data is 845Mb, email is not going to work (generally speaking email attachments cannot/should not exceed 10Mb or 20Mb).

gevro commented 3 months ago

Oh I meant I will send a dropbox link with the file.

hpages commented 3 months ago

Ah, of course, that makes sense. Please send at hpages dot on dot github at gmail.com (same address as in Biostrings's DESCRIPTION file). Thanks!

gevro commented 1 month ago

Hi, I sent the dropbox link with script/files to reproduce the bug. Thanks!

gevro commented 1 month ago

Hi, I also e-mailed @hpages a docker that reproduces the bug. It seems like a memory overflow bug.

gevro commented 1 month ago

Hi, Just checking if someone from the developer team is able to reproduce and check this bug? Thank you!

vjcitn commented 1 month ago

hpages is on vacation. Send the dropbox link to carey dot vj at gmail and I will have a look.

gevro commented 1 month ago

Thanks - sent!

gevro commented 4 weeks ago

Hi all, I managed to reproduce the bug inside the docker, while running with valgrind.

Looks like the bug might be at XStringSet_class.c:141, or at least that is the line triggering the memory leak, but this is the first time I'm running valgrind, so to be honest, I don't really know how to interpret this.

I previously sent @vjcitn and @hpages the docker to reproduce this. I have now updated the docker to also have valgrind installed.

Output below:

==1109==    at 0x4A6E3FC: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A6F0F0: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A71B02: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A74DB1: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B6154: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8AF0: Rf_ReplIteration (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8E6F: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8F2F: run_Rmainloop (in /usr/lib/R/lib/libR.so)
==1109==    by 0x10907E: main (in /usr/lib/R/bin/exec/R)
==1109==  Address 0xa5d97be0 is 0 bytes inside a block of size 10,096 free'd
==1109==    at 0x48451EF: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1109==    by 0x49F1D99: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49F527E: Rf_allocVector3 (in /usr/lib/R/lib/libR.so)
==1109==    by 0x491C196: Rf_coerceVector (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A6E60F: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A6F0F0: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A71B02: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A74DB1: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B6154: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8AF0: Rf_ReplIteration (in /usr/lib/R/lib/libR.so)
==1109==  Block was alloc'd at
==1109==    at 0x4842808: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1109==    by 0x49F5718: Rf_allocVector3 (in /usr/lib/R/lib/libR.so)
==1109==    by 0x7DC9EB48: new_CHARACTER_from_XStringSet (XStringSet_class.c:141)
==1109==    by 0x4959A69: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4959FF7: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49A45C9: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B11F9: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B156A: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B378E: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B45D6: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B16A5: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B67F3: ??? (in /usr/lib/R/lib/libR.so)
==1109== 
==1109== Invalid read of size 1
==1109==    at 0x4A6EB25: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A71B02: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A74DB1: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B6154: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8AF0: Rf_ReplIteration (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8E6F: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8F2F: run_Rmainloop (in /usr/lib/R/lib/libR.so)
==1109==    by 0x10907E: main (in /usr/lib/R/bin/exec/R)
==1109==  Address 0xa5d97be0 is 0 bytes inside a block of size 10,096 free'd
==1109==    at 0x48451EF: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1109==    by 0x49F1D99: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49F527E: Rf_allocVector3 (in /usr/lib/R/lib/libR.so)
==1109==    by 0x491C196: Rf_coerceVector (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A6E60F: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A6F0F0: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A71B02: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A74DB1: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B6154: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8AF0: Rf_ReplIteration (in /usr/lib/R/lib/libR.so)
==1109==  Block was alloc'd at
==1109==    at 0x4842808: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1109==    by 0x49F5718: Rf_allocVector3 (in /usr/lib/R/lib/libR.so)
==1109==    by 0x7DC9EB48: new_CHARACTER_from_XStringSet (XStringSet_class.c:141)
==1109==    by 0x4959A69: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4959FF7: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49A45C9: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B11F9: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B156A: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B378E: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B45D6: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B16A5: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B67F3: ??? (in /usr/lib/R/lib/libR.so)
==1109== 
==1109== Invalid read of size 1
==1109==    at 0x4A6F3C0: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A71B02: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A74DB1: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B6154: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8AF0: Rf_ReplIteration (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8E6F: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8F2F: run_Rmainloop (in /usr/lib/R/lib/libR.so)
==1109==    by 0x10907E: main (in /usr/lib/R/bin/exec/R)
==1109==  Address 0xa5d97be0 is 0 bytes inside a block of size 10,096 free'd
==1109==    at 0x48451EF: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1109==    by 0x49F1D99: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49F527E: Rf_allocVector3 (in /usr/lib/R/lib/libR.so)
==1109==    by 0x491C196: Rf_coerceVector (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A6E60F: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A6F0F0: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A71B02: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A74DB1: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B6154: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8AF0: Rf_ReplIteration (in /usr/lib/R/lib/libR.so)
==1109==  Block was alloc'd at
==1109==    at 0x4842808: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1109==    by 0x49F5718: Rf_allocVector3 (in /usr/lib/R/lib/libR.so)
==1109==    by 0x7DC9EB48: new_CHARACTER_from_XStringSet (XStringSet_class.c:141)
==1109==    by 0x4959A69: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4959FF7: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49A45C9: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B11F9: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B156A: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B378E: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B45D6: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B16A5: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B67F3: ??? (in /usr/lib/R/lib/libR.so)
==1109== 
==1109== Invalid read of size 8
==1109==    at 0x4A6F3C9: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A71B02: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A74DB1: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B6154: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8AF0: Rf_ReplIteration (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8E6F: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8F2F: run_Rmainloop (in /usr/lib/R/lib/libR.so)
==1109==    by 0x10907E: main (in /usr/lib/R/bin/exec/R)
==1109==  Address 0xa5d97c00 is 32 bytes inside a block of size 10,096 free'd
==1109==    at 0x48451EF: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1109==    by 0x49F1D99: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49F527E: Rf_allocVector3 (in /usr/lib/R/lib/libR.so)
==1109==    by 0x491C196: Rf_coerceVector (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A6E60F: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A6F0F0: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A71B02: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A74DB1: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B6154: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8AF0: Rf_ReplIteration (in /usr/lib/R/lib/libR.so)
==1109==  Block was alloc'd at
==1109==    at 0x4842808: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1109==    by 0x49F5718: Rf_allocVector3 (in /usr/lib/R/lib/libR.so)
==1109==    by 0x7DC9EB48: new_CHARACTER_from_XStringSet (XStringSet_class.c:141)
==1109==    by 0x4959A69: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4959FF7: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49A45C9: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B11F9: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B156A: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B378E: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B45D6: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B16A5: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B67F3: ??? (in /usr/lib/R/lib/libR.so)
==1109== 
==1109== Invalid read of size 1
==1109==    at 0x4A70FA5: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A71B02: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A74DB1: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B6154: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8AF0: Rf_ReplIteration (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8E6F: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8F2F: run_Rmainloop (in /usr/lib/R/lib/libR.so)
==1109==    by 0x10907E: main (in /usr/lib/R/bin/exec/R)
==1109==  Address 0xa5d97be0 is 0 bytes inside a block of size 10,096 free'd
==1109==    at 0x48451EF: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1109==    by 0x49F1D99: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49F527E: Rf_allocVector3 (in /usr/lib/R/lib/libR.so)
==1109==    by 0x491C196: Rf_coerceVector (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A6E60F: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A6F0F0: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A71B02: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A74DB1: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B6154: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8AF0: Rf_ReplIteration (in /usr/lib/R/lib/libR.so)
==1109==  Block was alloc'd at
==1109==    at 0x4842808: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1109==    by 0x49F5718: Rf_allocVector3 (in /usr/lib/R/lib/libR.so)
==1109==    by 0x7DC9EB48: new_CHARACTER_from_XStringSet (XStringSet_class.c:141)
==1109==    by 0x4959A69: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4959FF7: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49A45C9: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B11F9: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B156A: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B378E: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B45D6: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B16A5: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B67F3: ??? (in /usr/lib/R/lib/libR.so)
==1109== 
==1109== Invalid read of size 8
==1109==    at 0x4A70F60: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A71B02: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A74DB1: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B6154: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8AF0: Rf_ReplIteration (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8E6F: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8F2F: run_Rmainloop (in /usr/lib/R/lib/libR.so)
==1109==    by 0x10907E: main (in /usr/lib/R/bin/exec/R)
==1109==  Address 0xa5d97c10 is 48 bytes inside a block of size 10,096 free'd
==1109==    at 0x48451EF: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1109==    by 0x49F1D99: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49F527E: Rf_allocVector3 (in /usr/lib/R/lib/libR.so)
==1109==    by 0x491C196: Rf_coerceVector (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A6E60F: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A6F0F0: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A71B02: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4A74DB1: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B6154: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B199F: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49E8AF0: Rf_ReplIteration (in /usr/lib/R/lib/libR.so)
==1109==  Block was alloc'd at
==1109==    at 0x4842808: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1109==    by 0x49F5718: Rf_allocVector3 (in /usr/lib/R/lib/libR.so)
==1109==    by 0x7DC9EB48: new_CHARACTER_from_XStringSet (XStringSet_class.c:141)
==1109==    by 0x4959A69: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x4959FF7: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49A45C9: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B11F9: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B156A: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B378E: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B45D6: ??? (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B16A5: Rf_eval (in /usr/lib/R/lib/libR.so)
==1109==    by 0x49B67F3: ??? (in /usr/lib/R/lib/libR.so)
==1109== 
DONE

# Writing output file...Error in data.frame(readtype = rep("c", length(which(includezmws))), zm = ccsbam$tag$zm[includezmws],  : 
  Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'double'
Execution halted
==1109== 
==1109== HEAP SUMMARY:
==1109==     in use at exit: 828,280,600 bytes in 258,431 blocks
==1109==   total heap usage: 802,599 allocs, 544,168 frees, 3,688,722,083 bytes allocated
==1109== 
==1109== LEAK SUMMARY:
==1109==    definitely lost: 0 bytes in 0 blocks
==1109==    indirectly lost: 0 bytes in 0 blocks
==1109==      possibly lost: 0 bytes in 0 blocks
==1109==    still reachable: 828,280,600 bytes in 258,431 blocks
==1109==                       of which reachable via heuristic:
==1109==                         newarray           : 4,264 bytes in 1 blocks
==1109==         suppressed: 0 bytes in 0 blocks
==1109== Rerun with --leak-check=full to see details of leaked memory
==1109== 
==1109== For lists of detected and suppressed errors, rerun with: -s
==1109== ERROR SUMMARY: 2516 errors from 6 contexts (suppressed: 0 from 0)
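
Read literally, the trace above reports invalid reads from a block that was allocated in new_CHARACTER_from_XStringSet (XStringSet_class.c:141) and had already been freed during a later allocation, while the leak summary shows 0 bytes definitely lost. That pattern looks more like a use-after-free than a leak, though nothing in this thread confirms the cause. If garbage-collection timing is involved (an assumption), base R's gctorture() forces a collection at every allocation, which can make such failures reproducible with far smaller inputs. A minimal sketch, with the helper name with_gctorture invented for illustration:

```r
# Evaluate `expr` with a garbage collection forced on every allocation.
# Very slow, so intended for small test cases only.
with_gctorture <- function(expr) {
  old <- gctorture(TRUE)               # returns the previous setting
  on.exit(gctorture(old), add = TRUE)  # always restore it
  force(expr)                          # evaluated while torture mode is on
}

# Harmless placeholder; the suspect statement would go here instead
res <- with_gctorture(paste0("A", "B"))
print(res)
```

If the suspect sub-assignment fails consistently under this wrapper on a tiny input, in both interactive and Rscript mode, that would point toward an object being collected while still in use.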