andersen-lab / ivar

iVar is a computational package that contains functions broadly useful for viral amplicon-based sequencing.
https://andersen-lab.github.io/ivar/html/
GNU General Public License v3.0

REF_AA not consistent with actual AA at the position from reference protein fasta file #172

Open Shawn-X-Zhang opened 6 months ago

Shawn-X-Zhang commented 6 months ago

Hello, I used samtools mpileup and ivar variants to identify codon and amino acid changes in assembled genomes, using a reference genome and a .gff3 file. It turned out that the codon and amino acid listed in the .tsv file don't match the actual codon and amino acid in the reference CDS and protein FASTA files. Below is the command I used:

mpi_cmd_str = f'samtools mpileup -aa -A -d 20000 -B -Q 0 {sample}.sorted.bam '
ivar_cmd_str = f'ivar variants -p mutations -q 30 -t 0.03 -r {ref_file} -g {gff_file}'
cmd_str = mpi_cmd_str + " | " + ivar_cmd_str
os.system(cmd_str)
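
For anyone reproducing this, the same pipeline can be run without building a shell string, which makes a failure in either tool easier to notice. This is only a sketch: the flags are copied from the command above, and the helper names `build_commands` and `run_pipeline` are made up for illustration.

```python
import subprocess


def build_commands(sample, ref_file, gff_file, prefix="mutations"):
    """Return the two argument lists, using the same flags as the command above."""
    mpileup_cmd = [
        "samtools", "mpileup", "-aa", "-A", "-d", "20000", "-B", "-Q", "0",
        f"{sample}.sorted.bam",
    ]
    ivar_cmd = [
        "ivar", "variants", "-p", prefix, "-q", "30", "-t", "0.03",
        "-r", ref_file, "-g", gff_file,
    ]
    return mpileup_cmd, ivar_cmd


def run_pipeline(sample, ref_file, gff_file):
    """Pipe samtools mpileup into ivar variants and fail loudly on errors."""
    mpileup_cmd, ivar_cmd = build_commands(sample, ref_file, gff_file)
    mpileup = subprocess.Popen(mpileup_cmd, stdout=subprocess.PIPE)
    ivar = subprocess.Popen(ivar_cmd, stdin=mpileup.stdout)
    # Close our copy of the pipe so mpileup sees SIGPIPE if ivar exits early.
    mpileup.stdout.close()
    ivar_rc = ivar.wait()
    mpileup_rc = mpileup.wait()
    if mpileup_rc != 0 or ivar_rc != 0:
        raise RuntimeError(
            f"pipeline failed: samtools={mpileup_rc}, ivar={ivar_rc}"
        )
```

Unlike `os.system`, this surfaces a non-zero exit status from either step instead of silently continuing.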

As an example, the Excel screenshot below shows the sequence validation for SARS-CoV-2 ORF1ab. [screenshot] Any suggestions? Thank you very much!

cmaceves commented 6 months ago

Hi, would you mind supplying a sample bam, reference, and gff file so I can take a look?

Shawn-X-Zhang commented 6 months ago

Thanks for your quick reply. GitHub does not allow uploading files over 25 MB, so I uploaded the files to Google Drive: https://drive.google.com/drive/folders/1yytG0_DnAr_mvBTCTZKdlMKa2iHOT_4p?usp=sharing

For Staphylococcus aureus, I compared REF_AA with the actual AA at the same position for many proteins. Some are consistent, some are not. I uploaded those files as well; could you please take a look at them too? https://drive.google.com/drive/folders/1z8ag7921A5s6Bw9AxNESLCANcMeWtSd0?usp=sharing
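
A comparison like the one described above can be scripted so that the expected reference codon and amino acid are derived directly from the reference sequence. The sketch below is a minimal illustration, not iVar's own logic. It assumes a single contiguous CDS on the forward strand with 1-based coordinates (as in a GFF file) and the standard codon assignments. The function name `ref_codon_at` and the toy sequence are hypothetical. CDS features on the reverse strand, or features split into several segments, would need extra handling.

```python
from itertools import product

# Standard codon assignments, listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def ref_codon_at(genome, cds_start, pos):
    """Return (codon, amino_acid, aa_position) for a 1-based genome position.

    Assumes `pos` lies in a contiguous forward-strand CDS that begins at the
    1-based coordinate `cds_start`.
    """
    offset = pos - cds_start
    if offset < 0:
        raise ValueError("position lies before the CDS start")
    codon_index = offset // 3                       # 0-based codon number
    codon_start = (cds_start - 1) + codon_index * 3  # 0-based string index
    codon = genome[codon_start:codon_start + 3].upper()
    if len(codon) < 3:
        raise ValueError("codon runs past the end of the sequence")
    # Codons with ambiguous bases are reported as 'X'.
    return codon, CODON_TABLE.get(codon, "X"), codon_index + 1


# Toy example: the CDS "ATG GCT TAA" starts at position 3.
toy_genome = "GGATGGCTTAA"
print(ref_codon_at(toy_genome, cds_start=3, pos=7))  # ('GCT', 'A', 2)
```

Running this for each row's position against the CDS start taken from the GFF gives an independent value to compare with the REF_CODON and REF_AA reported in the .tsv file.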