segfault with high depth samples

Hi,

Thanks for your tool. We've been using it successfully for a while now. However, we have recently started sequencing samples at higher depth (~400k merged read pairs), and Rbec fails in two distinct ways on these samples. As a test, I downsampled one such file in 100k increments, giving files with 100k, 200k, 300k, and 400k amplicons. Rbec runs fine on the 100k and 200k files. On the 300k file I get this error message:

Error in toupper(seqs) : invalid input 'T<AF>U' in 'utf8towcs'
Calls: Rbec -> consis_err -> toupper

Running the 400k sample, I get the error message:

 *** caught segfault ***
address 0x7f708b7d513f, cause 'invalid permissions'

Traceback:
 1: vroom_(file, delim = "\001", col_names = "V1", col_types = cols(col_character()),     id = NULL, skip = skip, col_select = col_select, name_repair = "minimal",     na = na, quote = "", trim_ws = FALSE, escape_double = FALSE,     escape_backslash = FALSE, comment = "", skip_empty_rows = skip_empty_rows,     locale = locale, guess_max = 0, n_max = n_max, altrep = vroom_altrep(altrep),     num_threads = num_threads, progress = progress)

Both errors are rather cryptic, but I think they occur in the "calculation of error generating probabilities" step.

One workaround for others who hit similar errors is to downsample your amplicons. In my test, files with up to 200k merged amplicons ran without problems, so the ceiling is somewhere between 200k and 300k.
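
For anyone who wants to try the downsampling workaround, here is a minimal sketch in Python using reservoir sampling. It assumes the merged amplicons are in an uncompressed four-line FASTQ file; the function names and the file names in the usage note are just illustrative and are not part of Rbec.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: (header, sequence, separator, qualities).
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from four-line FASTQ text (assumed, not wrapped FASTQ)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between or after records
        try:
            seq = next(it).rstrip("\n")
            plus = next(it).rstrip("\n")
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError("truncated FASTQ record: " + header) from None
        yield (header, seq, plus, qual)


def downsample(records: Iterable[Record], n: int, seed: int = 0) -> List[Record]:
    """Return a uniform random sample of at most n records (reservoir sampling)."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    reservoir: List[Record] = []
    for i, rec in enumerate(records):
        if i < n:
            reservoir.append(rec)  # fill the reservoir with the first n records
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = rec  # replace a kept record with probability n/(i+1)
    return reservoir


def downsample_file(src: str, dst: str, n: int, seed: int = 0) -> int:
    """Write a subsample of src to dst and return the number of records written."""
    with open(src, "r") as fin:
        kept = downsample(read_fastq(fin), n, seed)
    with open(dst, "w") as fout:
        for rec in kept:
            fout.write("\n".join(rec) + "\n")
    return len(kept)
```

Calling something like `downsample_file("merged.fastq", "merged_200k.fastq", 200_000)` (hypothetical file names) would keep at most 200k reads, which is the largest size that worked in my test. The file is read in a single pass and only the kept records are held in memory. Gzipped input is not handled here.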

Happy to share the problematic files if that helps.

Thanks! -shane

PengfanZhang / Rbec

segfault with high depth samples #6