Hi,
Thanks for your tool. We've been using it successfully for a while now. However, now we have started to sequence samples with higher depth (~400k merged read pairs), and Rbec seems to fail in two distinct ways with these samples. As a test, I downsampled this file in 100k increments, resulting in files with 100k, 200k, 300k, and 400k amplicons. Rbec runs fine until the 300k sample, where I get the error message:
Error in toupper(seqs) : invalid input 'T<AF>U' in 'utf8towcs'
Calls: Rbec -> consis_err -> toupper
Running the 400k sample, I get the error message:
The errors are rather cryptic, but they seem to occur in the "calculation of error generating probabilities" step.
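Since the first error complains about an invalid byte reaching `toupper(seqs)`, one thing worth ruling out is whether the input file itself contains unexpected bytes. Below is a minimal sketch of such a check, not anything taken from Rbec. It assumes the input is an uncompressed FASTQ with exactly four lines per record and that sequences should contain only A, C, G, T, or N; the function name `find_unexpected_bytes` is hypothetical.

```python
from itertools import islice

# Assumed alphabet for sequence lines; adjust if your data uses other IUPAC codes.
EXPECTED = set(b"ACGTNacgtn")


def find_unexpected_bytes(path, limit=10):
    """Return up to `limit` tuples (record_number, header, bad_bytes) for
    sequence lines containing bytes outside EXPECTED."""
    hits = []
    # Binary mode, so invalid UTF-8 bytes are reported instead of raising.
    with open(path, "rb") as handle:
        record_no = 0
        while True:
            record = list(islice(handle, 4))
            if len(record) < 4:  # end of file (or a truncated last record)
                break
            record_no += 1
            seq = record[1].rstrip(b"\r\n")
            bad = sorted(set(seq) - EXPECTED)
            if bad:
                header = record[0].rstrip(b"\r\n")
                hits.append((record_no, header, bytes(bad)))
                if len(hits) >= limit:
                    break
    return hits
```

If this returns an empty list for a file that still triggers the error, the odd byte is probably not coming from the input sequences themselves, though that is only a hint and not a diagnosis.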
One workaround for others encountering a similar error is to downsample your amplicons when you start seeing these kinds of errors. In my test, the 200k file ran fine and the 300k file failed, so the ceiling is somewhere between 200k and 300k merged amplicons.
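For anyone who wants to try the downsampling workaround, here is a minimal sketch using reservoir sampling, so the whole file never has to be loaded to pick a uniform random subset. It is one possible approach and not necessarily how the files above were produced. It assumes an uncompressed FASTQ with exactly four lines per record; the function names and the file names in the usage note are hypothetical.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield each FASTQ record as a tuple of its 4 lines (bytes)."""
    while True:
        # Normalize line endings so a missing final newline cannot fuse records.
        record = tuple(line.rstrip(b"\r\n") + b"\n" for line in islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record at end of file")
        yield record


def downsample_fastq(in_path, out_path, n_reads, seed=0):
    """Write a uniform random sample of at most n_reads records.

    Returns the number of records written.
    """
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    reservoir = []
    with open(in_path, "rb") as handle:
        for i, record in enumerate(read_fastq(handle)):
            if i < n_reads:
                reservoir.append(record)
            else:
                # Keep the new record with probability n_reads / (i + 1).
                j = rng.randint(0, i)
                if j < n_reads:
                    reservoir[j] = record
    with open(out_path, "wb") as out:
        for record in reservoir:
            out.writelines(record)
    return len(reservoir)
```

A call such as `downsample_fastq("merged.fastq", "merged_200k.fastq", 200_000)` would write a 200k-read subset. Note that the sampled records are not written in their original file order.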
Happy to share problematic files if it helps.
Thanks! -shane