Open Shawn-X-Zhang opened 6 months ago
Hi, would you mind supplying a sample bam, reference, and gff file so I can take a look?
Thanks for your quick reply. GitHub does not allow uploading files over 25 MB, so I uploaded the files to Google Drive: https://drive.google.com/drive/folders/1yytG0_DnAr_mvBTCTZKdlMKa2iHOT_4p?usp=sharing
For Staphylococcus aureus, I compared REF_AA with the actual amino acid at the same position for many proteins. Some are consistent, some are not. I uploaded those files as well; could you please also take a look? https://drive.google.com/drive/folders/1z8ag7921A5s6Bw9AxNESLCANcMeWtSd0?usp=sharing
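For anyone who wants to reproduce this kind of REF_AA comparison without a spreadsheet, here is a minimal sketch of how the check could be done. It is not iVar's own logic. It assumes a single contiguous CDS with no frameshift or spliced segments, 1-based inclusive coordinates as in GFF3, and the standard codon assignments. The helper names `reverse_complement` and `ref_codon_at` are made up for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (NCBI layout).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def ref_codon_at(ref_seq, cds_start, cds_end, strand, pos):
    """Return (codon, amino_acid) covering genomic position `pos`.

    Hypothetical helper: cds_start/cds_end/pos are 1-based inclusive
    coordinates on the forward strand, and the CDS is assumed to be one
    contiguous interval that starts in frame.
    """
    if not cds_start <= pos <= cds_end:
        raise ValueError("position lies outside the CDS")
    cds = ref_seq[cds_start - 1:cds_end].upper()
    if strand == "-":
        # On the minus strand the CDS is read from cds_end backwards.
        cds = reverse_complement(cds)
        offset = cds_end - pos
    else:
        offset = pos - cds_start
    codon_start = (offset // 3) * 3
    codon = cds[codon_start:codon_start + 3]
    # Incomplete or ambiguous codons are reported as "X".
    return codon, CODON_TABLE.get(codon, "X")


# Toy reference: two flanking bases, then ATG GCT TAA on the plus strand.
toy_ref = "CC" + "ATGGCTTAA" + "GG"
print(ref_codon_at(toy_ref, 3, 11, "+", 6))
```

The codon and amino acid returned for a position can then be compared with the REF_CODON and REF_AA columns for the same POS in the .tsv. If they disagree only for some genes, it may be worth checking whether those features are on the minus strand or are split across several GFF lines, which this sketch does not handle.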
Hello, I used samtools mpileup and ivar variants to identify codon and amino acid changes in assembled genomes, using a reference genome and .gff3 files. It turned out that the codons and amino acids listed in the .tsv file don't match the actual codons and amino acids in the reference CDS and protein FASTA files. Below is the command I used:

mpi_cmd_str = f'samtools mpileup -aa -A -d 20000 -B -Q 0 {sample}.sorted.bam '
ivar_cmd_str = f'ivar variants -p mutations -q 30 -t 0.03 -r {ref_file} -g {gff_file}'
cmd_str = mpi_cmd_str + " | " + ivar_cmd_str
os.system(cmd_str)
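As a side note, the same pipe can be run without going through a shell string, which makes a failure in either tool visible. The following is only a sketch: it reuses exactly the flags from the command above, and the function names `build_commands` and `run_pipeline` are hypothetical. It does not change what iVar reports.

```python
import subprocess


def build_commands(sample, ref_file, gff_file):
    """Build argument lists for the two tools, using the flags from the post."""
    mpileup = ["samtools", "mpileup", "-aa", "-A", "-d", "20000",
               "-B", "-Q", "0", f"{sample}.sorted.bam"]
    ivar = ["ivar", "variants", "-p", "mutations", "-q", "30",
            "-t", "0.03", "-r", ref_file, "-g", gff_file]
    return mpileup, ivar


def run_pipeline(sample, ref_file, gff_file):
    """Pipe samtools mpileup into ivar variants and check both exit codes."""
    mpileup, ivar = build_commands(sample, ref_file, gff_file)
    producer = subprocess.Popen(mpileup, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(ivar, stdin=producer.stdout)
    # Close our copy of the pipe so samtools stops if ivar exits early.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    if producer_rc != 0 or consumer_rc != 0:
        raise RuntimeError(
            f"pipeline failed: samtools={producer_rc}, ivar={consumer_rc}"
        )
```

Calling `run_pipeline(sample, ref_file, gff_file)` with the same variables as in the post should write the same mutations.tsv, assuming both tools are on the PATH.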
As an example, the Excel screenshot below shows the sequence validation for SARS-CoV-2 ORF1ab.
Any suggestions?
Thank you very much!