andersen-lab / ivar

iVar is a computational package that contains functions broadly useful for viral amplicon-based sequencing.
https://andersen-lab.github.io/ivar/html/
GNU General Public License v3.0
115 stars 39 forks source link

Incorrect amino acid numbering #180

Closed proychou closed 3 months ago

proychou commented 4 months ago

Description Numbering of amino acids in output of ivar variants doesn't account for strand and therefore returns incorrect numbering for genes on - strand.

Command to reproduce behavior For example this occurs with gp045/OPG057 of mpox samtools mpileup --ignore-overlaps --count-orphans --no-BAQ --max-depth 0 --min-BQ 20 --reference ON563414.fa --region ON563414.3:40212-39094 test.sorted.bam | ivar variants -t 0.01 -q 20 -m 10 -g ON563414.gff -r ON563414.fa -p test

Expected behavior We have a sample with mutation G40916A, which should produce S6L but instead returns R368Q

We think it could be related to this section not considering strand from the ref gff. https://github.com/andersen-lab/ivar/blob/08aac334d9499d789803e6b2da2321a32282d255/src/ref_seq.cpp#L14

asereewit commented 4 months ago

noticed a typo. original command should be: samtools mpileup --ignore-overlaps --count-orphans --no-BAQ --max-depth 0 --min-BQ 20 --reference ON563414.fa --region ON563414.3:39094-40212 test.sorted.bam | ivar variants -t 0.01 -q 20 -m 10 -g ON563414.gff -r ON563414.fa -p test

incorrect amino acid numbering still occurs

asereewit commented 4 months ago

Potential solution provided in PR #181

gkarthik commented 4 months ago

@asereewit as described also in https://github.com/andersen-lab/ivar/pull/181,

Thank you for submitting this pull request! While taking the reverse complement works, I made additional changes to take strand into account while getting the right codon position. Closing this PR in favor of this branch. Please let me know if that branch is working as expected for you.

@proychou Thank you for raising this issue. I pushed a fix for this on this branch. If you could test that branch or alternatively share a test file with me, I can test it on my end too.

I tested it with mpox genome ON563414.3 where G40926A is correctly translated on the negative strand as Q92E with ref GAA to alternate codon CAA. image

asereewit commented 3 months ago

Hi @gkarthik, @proychou and I work together. I think there might be a typo in your comment. G40926A would lead to a codon change of CAA -> TAA, and a amino acid change of Q92*?

But I tested your pr181 branch and it works as expected. Here is the ivar variants output comparison between master branch and pr181 branch. master:

ivar_variants_master

pr181:

ivar_variants_pr181

Thank you for working on this!

gkarthik commented 3 months ago

Oh I didn't know you were working together 😅 . Glad to hear its working as expected.

You are correct, I made a mistake there. It should be,

G40926C is correctly translated on the negative strand as Q92E with ref codon CAA to alternate codon GAA. image

asereewit commented 3 months ago

Yes we do haha. We use ivar in a couple of our pipelines so thank you!

gkarthik commented 3 months ago

Glad to hear that, happy to help! 🙂