Open astewart-twist opened 7 years ago
Hi Andrew, I'm not sure I understand what you mean. Could you elaborate?
On Wednesday, January 18, 2017, Andrew Stewart notifications@github.com wrote:
Is it currently possible in pysamstats to generate consensus variation stats for paired end reads? As far as I can tell if a mismatch is on both reads for a given position, it gets counted twice.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/alimanfoo/pysamstats/issues/60, or mute the thread https://github.com/notifications/unsubscribe-auth/AAq8Qlk8DNEWuPvAZ1xJsOutKaQzhNmaks5rToGrgaJpZM4LnZ1E .
-- Alistair Miles Head of Epidemiological Informatics Centre for Genomics and Global Health http://cggh.org The Wellcome Trust Centre for Human Genetics Roosevelt Drive Oxford OX3 7BN United Kingdom Email: alimanfoo@googlemail.com Web: http://purl.org/net/aliman Twitter: https://twitter.com/alimanfoo Tel: +44 (0)1865 287721
Sure. As far as I can tell, it looks like pysamstats considers the same insertion/deletion/mismatch observed on both forward and reverse reads at the same position as 2 separate observations. For example, in the following alignment...
fwd: ====C====
rev: ====G====
ref: ACGTATGCA
...pysamstats would report:
mismatches
: 2mismatches_fwd
: 1mismatches_rev
: 1Biologically, one might want to interpret this as a single event. I'm just wondering if there exists an option in pysamstats to do that.
Hi Andrew, no there is no option currently to generate a consensus sequence from variation stats. The variation stats count the total number of reads supporting a mismatch, indel etc. at each base. Is it just a consensus sequence you'd like to generate?
On Mon, Jan 23, 2017 at 7:30 PM, Andrew Stewart notifications@github.com wrote:
Sure. As far as I can tell, it looks like pysamstats considers the same insertion/deletion/mismatch observed on both forward and reverse reads at the same position as 2 separate observations. For example, in the following alignment...
fwd: ====C==== rev: ====G====
ref: ACGTATGCA
...pysamstats would report:
- mismatches: 2
- mismatches_fwd: 1
- mismatches_rev: 1
Biologically, one might want to interpret this as a single event. I'm just wondering if there exists an option in pysamstats to do that.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/alimanfoo/pysamstats/issues/60#issuecomment-274591997, or mute the thread https://github.com/notifications/unsubscribe-auth/AAq8QjTx30mbrSvasdNQDSmzTImV1X4Sks5rVP_ngaJpZM4LnZ1E .
-- Alistair Miles Head of Epidemiological Informatics Centre for Genomics and Global Health http://cggh.org The Wellcome Trust Centre for Human Genetics Roosevelt Drive Oxford OX3 7BN United Kingdom Email: alimanfoo@googlemail.com Web: http://purl.org/net/aliman Twitter: https://twitter.com/alimanfoo Tel: +44 (0)1865 287721
Not the consensus sequence, Alistair, just the consensus alignment stats. If you want to point to the relevant lines of code where something like that might best fit, I can try to take a stab at adding it as a new output (variation_consensus?).
On Tue, Jan 24, 2017 at 6:58 AM Alistair Miles notifications@github.com wrote:
Hi Andrew, no there is no option currently to generate a consensus sequence from variation stats. The variation stats count the total number of reads supporting a mismatch, indel etc. at each base. Is it just a consensus sequence you'd like to generate?
On Mon, Jan 23, 2017 at 7:30 PM, Andrew Stewart notifications@github.com wrote:
Sure. As far as I can tell, it looks like pysamstats considers the same insertion/deletion/mismatch observed on both forward and reverse reads at the same position as 2 separate observations. For example, in the following alignment...
fwd: ====C==== rev: ====G====
ref: ACGTATGCA
...pysamstats would report:
- mismatches: 2
- mismatches_fwd: 1
- mismatches_rev: 1
Biologically, one might want to interpret this as a single event. I'm just wondering if there exists an option in pysamstats to do that.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub < https://github.com/alimanfoo/pysamstats/issues/60#issuecomment-274591997>, or mute the thread < https://github.com/notifications/unsubscribe-auth/AAq8QjTx30mbrSvasdNQDSmzTImV1X4Sks5rVP_ngaJpZM4LnZ1E
.
-- Alistair Miles Head of Epidemiological Informatics Centre for Genomics and Global Health http://cggh.org The Wellcome Trust Centre for Human Genetics Roosevelt Drive Oxford OX3 7BN United Kingdom Email: alimanfoo@googlemail.com Web: http://purl.org/net/aliman Twitter: https://twitter.com/alimanfoo Tel: +44 (0)1865 287721 <+44%201865%20287721>
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/alimanfoo/pysamstats/issues/60#issuecomment-274827608, or mute the thread https://github.com/notifications/unsubscribe-auth/ANnwFISuDw8Q24fns4P2jPOPCt7-9YFQks5rVhGvgaJpZM4LnZ1E .
Hi Andrew,
Apologies, I am being dim and still not getting it. Could you walk me through a complete example? E.g., say a site has reference nucleotide "A", in the pileup there are 2 reads supporting the reference nucleotide, 3 reads supporting a mismatching nucleotide "C", 4 reads supporting a 2 bp insertion, and 5 reads supporting a 3 bp deletion (a bit crazy but not impossible). What would be the output from variation_consensus at this site?
The implementation of variation statistics starts from here: https://github.com/alimanfoo/pysamstats/blob/master/pysamstats.pyx#L684. Please feel free to submit a PR, and shout if you need any info on compiling or running the tests.
Cheers, Alistair
On Tuesday, January 31, 2017, Andrew Stewart notifications@github.com wrote:
Not the consensus sequence, Alistair, just the consensus alignment stats. If you want to point to the relevant lines of code where something like that might best fit, I can try to take a stab at adding it as a new output (variation_consensus?).
On Tue, Jan 24, 2017 at 6:58 AM Alistair Miles <notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:
Hi Andrew, no there is no option currently to generate a consensus sequence from variation stats. The variation stats count the total number of reads supporting a mismatch, indel etc. at each base. Is it just a consensus sequence you'd like to generate?
On Mon, Jan 23, 2017 at 7:30 PM, Andrew Stewart < notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:
Sure. As far as I can tell, it looks like pysamstats considers the same insertion/deletion/mismatch observed on both forward and reverse reads at the same position as 2 separate observations. For example, in the following alignment...
fwd: ====C==== rev: ====G====
ref: ACGTATGCA
...pysamstats would report:
- mismatches: 2
- mismatches_fwd: 1
- mismatches_rev: 1
Biologically, one might want to interpret this as a single event. I'm just wondering if there exists an option in pysamstats to do that.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub < https://github.com/alimanfoo/pysamstats/issues/60#issuecomment-274591997 , or mute the thread < https://github.com/notifications/unsubscribe-auth/ AAq8QjTx30mbrSvasdNQDSmzTImV1X4Sks5rVP_ngaJpZM4LnZ1E
.
-- Alistair Miles Head of Epidemiological Informatics Centre for Genomics and Global Health http://cggh.org The Wellcome Trust Centre for Human Genetics Roosevelt Drive Oxford OX3 7BN United Kingdom Email: alimanfoo@googlemail.com javascript:_e(%7B%7D,'cvml','alimanfoo@googlemail.com'); Web: http://purl.org/net/aliman Twitter: https://twitter.com/alimanfoo Tel: +44 (0)1865 287721 <+44%201865%20287721>
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/alimanfoo/pysamstats/issues/60# issuecomment-274827608, or mute the thread https://github.com/notifications/unsubscribe-auth/ ANnwFISuDw8Q24fns4P2jPOPCt7-9YFQks5rVhGvgaJpZM4LnZ1E .
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/alimanfoo/pysamstats/issues/60#issuecomment-276519282, or mute the thread https://github.com/notifications/unsubscribe-auth/AAq8Qrg8xaedZGGYduw13w4XWfmkbfDEks5rX7rhgaJpZM4LnZ1E .
-- Alistair Miles Head of Epidemiological Informatics Centre for Genomics and Global Health http://cggh.org The Wellcome Trust Centre for Human Genetics Roosevelt Drive Oxford OX3 7BN United Kingdom Email: alimanfoo@googlemail.com Web: http://purl.org/net/aliman Twitter: https://twitter.com/alimanfoo Tel: +44 (0)1865 287721
Is it currently possible in pysamstats to generate consensus variation stats for paired end reads? As far as I can tell if a mismatch is on both reads for a given position, it gets counted twice.