DormanLab / AmpliCI

AmpliCI, a model-based algorithm for denoising Illumina amplicon data.
BSD 3-Clause "New" or "Revised" License

Variable sequence length? #8

Open shump2 opened 1 year ago

shump2 commented 1 year ago

Hi, based on the error estimation, it seems to require the same sequence length. Can the model be adjusted to account for variable lengths, as many molecular markers will amplify variable lengths, e.g., indels, insertions, etc.? P

xiyupeng commented 1 year ago

Hello, thanks for your interest and the suggestion!

Currently, we are still working on a new version that can be applied to datasets with variable lengths. I just tested the current version, AmpliCI (v2.1), and it works well on our simple simulated datasets with variable sequence lengths. However, a systematic evaluation on real datasets, which are usually more complex, is still needed to finalize the software. If you run into any problems, please let us know. It would be great if you could suggest any public amplicon datasets for molecular markers with variable lengths that we can test on.

The error estimation does not require the same sequence length. Once the true haplotypes are selected, the number of errors is counted and fitted with LOESS regression to generate the error profile.
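
As a rough illustration of the idea above (a minimal sketch, not AmpliCI's actual code), the following Python tallies mismatches per quality score and smooths the observed rates with a simple local linear fit. Everything here is an assumption made for illustration: the function names `count_errors`, `loess_at`, and `error_profile`, the `(read, quals, haplotype)` input layout, the choice to pool errors by quality score only, and the simplification that a read and its haplotype differ by substitutions only. The small hand-written LOESS (tricube kernel, local linear fit) stands in for a proper library routine.

```python
from collections import defaultdict


def count_errors(assignments):
    """Tally mismatches per quality score.

    `assignments` is a hypothetical list of (read, quals, haplotype) tuples,
    each read already assigned to its presumed true haplotype.
    Returns {quality: [n_errors, n_bases]}.
    """
    counts = defaultdict(lambda: [0, 0])
    for read, quals, hap in assignments:
        # zip() stops at the shortest input, so reads of different lengths
        # simply contribute however many positions they have.
        # Simplifying assumption: no indels between read and haplotype.
        for base, q, true_base in zip(read, quals, hap):
            counts[q][1] += 1
            if base != true_base:
                counts[q][0] += 1
    return dict(counts)


def loess_at(xs, ys, weights, x0, span=0.75):
    """Locally weighted linear fit evaluated at x0 (tricube kernel)."""
    k = min(len(xs), max(2, round(span * len(xs))))
    dists = sorted(abs(x - x0) for x in xs)
    # Bandwidth: slightly beyond the k-th nearest point; fall back to 1.0
    # when that distance is zero.
    h = dists[k - 1] * 1.001 or 1.0
    sw = swx = swy = swxx = swxy = 0.0
    for x, y, w0 in zip(xs, ys, weights):
        u = abs(x - x0) / h
        if u >= 1.0:
            continue
        w = w0 * (1.0 - u ** 3) ** 3  # prior weight times tricube weight
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    if sw == 0.0:
        raise ValueError("no points with positive weight near x0")
    denom = sw * swxx - swx * swx
    if denom <= 1e-12:
        return swy / sw  # degenerate neighbourhood: use the weighted mean
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept + slope * x0


def error_profile(counts, span=0.75):
    """Smooth observed error rates across quality scores, clamped to [0, 1]."""
    qs = sorted(counts)
    rates = [counts[q][0] / counts[q][1] for q in qs]
    totals = [counts[q][1] for q in qs]  # more observations, more weight
    return {q: min(1.0, max(0.0, loess_at(qs, rates, totals, q, span)))
            for q in qs}


# Toy input: two reads of different lengths, each paired with its haplotype.
toy = [
    ("ACGT", [30, 30, 20, 20], "ACGA"),
    ("ACGTTT", [30] * 6, "ACGTTT"),
]
profile = error_profile(count_errors(toy))
```

Because the counts are pooled by quality score rather than by read position, nothing in this sketch depends on all reads having the same length.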

Thanks, Xiyu