cfe-lab / MiCall

Pipeline for processing FASTQ data from an Illumina MiSeq to genotype human RNA viruses like HIV and hepatitis C
https://cfe-lab.github.io/MiCall
GNU Affero General Public License v3.0

Replace mapping with de novo assembly #442

Closed donkirkby closed 4 years ago

donkirkby commented 6 years ago

We currently use bowtie2 to map reads to a large set of reference sequences. For most samples, it works well. However, we have had some problems with reference drift (#290), calling HCV subtypes (#436), insertion and deletion positions (#398), and samples that produce different results when you rerun the mapping (#405).

We'd like to experiment with using de novo assembly instead of mapping.

donkirkby commented 6 years ago

@jeff-k has been using de novo assembly to look at several samples that got strange results with the current MiCall pipeline. It looks like one advantage of the technique will be that we can distinguish between these two scenarios:

With the current MiCall pipeline, both of those scenarios just look like lousy mapping with gaps in coverage.

We propose this plan for a full MiCall pipeline that includes de novo assembly:

As you can see, this just affects the prelim_map and remap steps. Because those steps would only map reads against a small number of contigs, instead of the large set of reference sequences, they should be much faster.
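As a rough sketch of what mapping against contigs could look like, the helper below builds the two bowtie2 command lines for one sample. The function name, the file names, and the assumption that an assembler has already written the contigs to a FASTA file are all hypothetical and not taken from MiCall's code. Only the basic `bowtie2-build` and `bowtie2` options (`-x`, `-1`, `-2`, `-S`) are standard.

```python
from pathlib import Path


def build_contig_mapping_commands(contigs_fasta, fastq1, fastq2, work_dir):
    """Build the commands to map paired reads against assembled contigs.

    Hypothetical helper: all paths and names are illustrative assumptions.
    Returns (build_cmd, map_cmd) as argument lists for subprocess.run().
    """
    work_dir = Path(work_dir)
    index_prefix = work_dir / "contigs_index"  # assumed index name
    sam_path = work_dir / "contig_map.sam"     # assumed output name

    # Index only the contigs from this sample, not the full reference set.
    build_cmd = ["bowtie2-build", str(contigs_fasta), str(index_prefix)]

    # Map the paired-end reads against that small index.
    map_cmd = ["bowtie2",
               "-x", str(index_prefix),
               "-1", str(fastq1),
               "-2", str(fastq2),
               "-S", str(sam_path)]
    return build_cmd, map_cmd
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes it possible to inspect the commands without bowtie2 installed.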

One risk is that the de novo assembly step might be much slower in some cases than the remap step is currently.
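One way to check that risk would be to time the assembly and remap steps on the same samples. This is only a sketch: `time_step` is a hypothetical helper, not part of MiCall, and it wraps whatever callable runs a step.

```python
import time


def time_step(step, *args, **kwargs):
    """Run one pipeline step and return (result, elapsed_seconds).

    Hypothetical helper for comparing step run times.
    """
    start = time.perf_counter()
    result = step(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

For example, calling `time_step(run_assembly, sample)` and `time_step(run_remap, sample)` on the same sample would give two durations to compare, where `run_assembly` and `run_remap` stand in for whatever functions run those steps.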