andersen-lab / ivar

iVar is a computational package that contains functions broadly useful for viral amplicon-based sequencing.
https://andersen-lab.github.io/ivar/html/
GNU General Public License v3.0
118 stars 40 forks source link

ivar trim only processes the 1st ref region of bam? #69

Open tomSimonet opened 4 years ago

tomSimonet commented 4 years ago

Hello,

I've been trying ivar-1.2.3 trim, but in my hands only the first reference region of my bam file is processed (eg chr1, from a human genome) ; so in the output bam, only reads from this chomosome remain (even with the -e option). Is it the normal behaviour? Have I missed some instruction? I had to make a loop over the chomosomes, changing the bam header at each step, and finally merging the trimmed bams...

./progr/ivar-1.2.3/src/ivar trim -i sample_input.bam -b dna_panel_primers_hg19.bed -m 1 -q 0 -p sample_output.bam > sample_trimP.log 2>&1

also a minor question/request:

as you can see from the command line, I don't want to trim by quality, so I wrote -q 0 ; is it correct? could it be done in a more explicit way (eg "-1 to disable this feature", or with no default value implying nothing is done by default)

many thanks

gkarthik commented 4 years ago

Hello, thank you for raising this issue. You are correct in that ivar only trims primers from the first region. We will try to fix this within the next 2 weeks. Currently, specifying -q 0 is the only way to disable quality trimming. We could try to add a flag to disable this.

adrhelix commented 3 years ago

I recently ran into this issue as well with ivar-1.3.1: I found that only the first reference sequence can be trimmed with ivar trim.

The logic in ivar.cpp and trim_primer_quality.cpp seems a bit strange: the region_ parameter is important to trim_bam_qual_primer(), but (as far as I can tell) there is no way to set it from the command line so it always defaults to the first reference sequence in the header.

I can work around this issue given our current needs, but trimming only the first reference sequence is a surprising and unfortunate limitation for an otherwise useful tool. Ideally, ivar trim would be able to trim reads from more than one reference sequence, regardless of order in the header. Short of that, it would be great if this limitation were clearly documented.

In any case, thank you very much for writing the tool!

sam-baird commented 3 months ago

I'm encountering a slightly different but probably related issue in iVar v1.4.2 when performing primer clipping on a multi-chromosome reference genome (influenza gene segments). While ivar trim is primer trimming all references in the BAM file (not just the first reference in my case), it is ignoring the first column in the BED file and clipping the same primer coordinates in all references. The effect of this is that iVar is "over-clipping" for primers that don't exist in a particular reference. When I delete primers for all references except one in the BED file, it clips the same coordinates in all references in the BAM file.

cmaceves commented 3 months ago

hi @sam-baird would you mind sharing an example bam and bed file so I can take a look?

sam-baird commented 3 months ago

I attached a ZIP with BED files and downsampled BAM files. flu.bam is the original BAM before primer trimming. flu.trimmed.bam is from primer trimming using the full BED file:

ivar trim -e -i flu.bam -b AVRL_H5N1_250bpAmpWGS_v2.bed -p flu.trimmed

flu.trimmed_expected.bam is from primer trimming with a subset of the BED file with only the PP599468-M reference.

Here is an IGV screenshot showing the difference between the three BAM files at the M gene segment coordinates PP599468-M:213-252. The visible letters in the reads indicate soft-clipped bases. The top shows pre-primer trimming where no reads are soft-clipped. The middle shows where extra clipping has occurred using the full BED file. The bottom shows what the clipping should look like (using the BED file with only the PP599468-M reference). In the middle alignment, clipping has expanded beyond the expected end coordinate of 224 of HPAI-MP-250_2_LEFT for most reads, instead ending at position 241, the end coordinate of the NP gene segment primer HPAI-NP-250_2_LEFT which has overlapping coordinates with the MP gene segment primer.

igv_snapshot

Thanks for taking a look! flu_example.zip