Open PAMorin opened 5 years ago
Dear PAMorin
ANGSD cannot generate a FASTA file based on the reference genome that is provided. So unfortunately it cannot include indels in the FASTA file.
-Anders
I believe you can use bcftools consensus, which also handles indels.
-Erik Garrison
Thanks. I am starting to use the bcftools approach, as described at https://samtools.github.io/bcftools/howtos/consensus-sequence.html. It appears to do a good job of calling indels around variable repeats.
Phil
--
Phillip A. Morin, Ph.D. Southwest Fisheries Science Center 8901 La Jolla Shores Dr. La Jolla, CA 92037, USA Phone: 858-546-7165 Fax: 858-546-7003 phillip.morin@noaa.gov http://swfsc.noaa.gov/mmtd-mmgenetics
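For anyone following the same route, the call-then-consensus workflow from that how-to page can be sketched roughly as below. This is only an illustration, not the exact commands used in this thread: the file names, the `consensus_commands` and `run_consensus` helpers, and the `--ploidy 1` setting (assumed here because the data are mitochondrial, so haploid) are all assumptions. It also expects bcftools to be on the PATH and the reference to be usable by `bcftools mpileup`.

```python
import subprocess


def consensus_commands(ref, bam, prefix):
    """Build the bcftools argv lists for a haploid call-then-consensus run.

    ref, bam and prefix are hypothetical paths chosen by the caller.
    """
    vcf = f"{prefix}.vcf.gz"
    return {
        # Pileup of the alignments against the reference, uncompressed BCF to stdout.
        "mpileup": ["bcftools", "mpileup", "-Ou", "-f", ref, bam],
        # Multiallelic caller, variant sites only; ploidy 1 is an assumption (mitogenome).
        "call": ["bcftools", "call", "--ploidy", "1", "-mv", "-Oz", "-o", vcf],
        # Index the compressed VCF so that consensus can read it.
        "index": ["bcftools", "index", vcf],
        # Apply the called SNPs and indels to the reference sequence.
        "consensus": ["bcftools", "consensus", "-f", ref, "-o", f"{prefix}.fa", vcf],
    }


def run_consensus(ref, bam, prefix):
    """Run the four steps in order, piping mpileup into call."""
    cmds = consensus_commands(ref, bam, prefix)
    pileup = subprocess.Popen(cmds["mpileup"], stdout=subprocess.PIPE)
    subprocess.run(cmds["call"], stdin=pileup.stdout, check=True)
    pileup.stdout.close()
    if pileup.wait() != 0:
        raise RuntimeError("bcftools mpileup failed")
    subprocess.run(cmds["index"], check=True)
    subprocess.run(cmds["consensus"], check=True)
```

Calling `run_consensus("reference.fa", "sample.bam", "sample")` would write `sample.vcf.gz` and `sample.fa`. The how-to page also describes normalizing and filtering the calls before building the consensus, which this sketch leaves out.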
As for calling indels, I'm not sure how good bcftools is, but it does have a tool to generate the consensus.
I'm using -doFasta 2 to extract the consensus sequence from bwa alignment (BAM) files for mitochondrial genomes. I've found that if there is a variable-length repeat, ANGSD always generates the longer sequence, even if most of the reads have the shorter repeat (e.g., if most are CCC but a few are CCCC, the consensus is always CCCC). I think this is because ANGSD only counts the occurrences of A, C, G, and T at each position and ignores the indel "-" in the BAM file. Is there a way to refine the BAM alignment, or otherwise get ANGSD -doFasta to recognize indels?
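To make the suspected mechanism concrete, here is a toy sketch of a per-column majority vote that either ignores or counts the gap symbol. It is not ANGSD's code and only models the hypothesis described above; the `column_consensus` function and the gap-padded reads are made up for illustration.

```python
from collections import Counter


def column_consensus(aligned_reads, count_gaps):
    """Majority-vote consensus over equal-length, gap-padded read strings."""
    consensus = []
    for column in zip(*aligned_reads):
        # When count_gaps is False, drop "-" before voting, as a base-only counter would.
        votes = Counter(b for b in column if count_gaps or b != "-")
        if not votes:
            continue  # column contained only gaps and gaps are ignored
        base, _ = votes.most_common(1)[0]
        if base != "-":
            consensus.append(base)  # a winning gap emits nothing
    return "".join(consensus)


# Hypothetical pileup: four reads carry CCC, one carries CCCC.
reads = ["ACCC-T", "ACCC-T", "ACCC-T", "ACCC-T", "ACCCCT"]

print(column_consensus(reads, count_gaps=False))  # ACCCCT
print(column_consensus(reads, count_gaps=True))   # ACCCT
```

With gaps ignored, the single C in the fifth column wins unopposed and the longer repeat is reported. Once the four gaps are allowed to vote, the shorter repeat comes out.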