MDU-PHL / ngmaster

In silico multi-antigen sequence typing for Neisseria gonorrhoeae (NG-MAST)
GNU General Public License v3.0

multiple alleles missed if one is novel #22

Closed simonrharris closed 7 years ago

simonrharris commented 7 years ago

Hi. We've been using ngmaster on some large gonococcal datasets and noticed a small issue with the CSV output. If an allele is found twice in a genome and one of the two copies is not in the existing database (i.e. is "new"), both alleles are written to the FASTA output, but the CSV file reports only the known allele, and the ST is not called as multiple.

The issue is here in the code:

                        alleleSEQS.append(tbpbRECR)
                        # Search trimmed sequence against database dictionary

                        try:
                            tbpbRESULT = (tbpbDICT[str(tbpbSEQR)])
                            tbpb = tbpbRESULT.split('PB')[1]
                        except KeyError:
                            tbpb = 'new'
                            continue
                        if tbpb not in tbpbCOUNT:
                            tbpbCOUNT.add(tbpb)
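For illustration, here is a minimal sketch of one way the lookup could be changed so that a novel allele is still counted. In the snippet above, the `continue` in the `except` branch skips the `tbpbCOUNT.add(tbpb)` step, so 'new' never reaches the set. This is not the fix that was actually committed to ngmaster; the function names `call_alleles` and `summarise`, the example sequences, and the `TBPB33` database label are all hypothetical.

```python
def call_alleles(trimmed_seqs, allele_dict, prefix="PB"):
    """Return the set of allele labels seen across all trimmed sequences.

    Hypothetical helper: `allele_dict` maps a sequence string to a label
    such as 'TBPB33', mirroring the tbpbDICT lookup in the snippet above.
    """
    found = set()
    for seq in trimmed_seqs:
        try:
            label = allele_dict[str(seq)].split(prefix)[1]
        except KeyError:
            # No `continue` here: the novel allele falls through and is counted.
            label = "new"
        found.add(label)
    return found


def summarise(found):
    """Collapse the set of labels into a single reportable value (assumed convention)."""
    if not found:
        return "-"
    if len(found) > 1:
        return "multiple"
    return next(iter(found))


# One known and one unknown sequence in the same genome (made-up data).
example_dict = {"ACGTACGT": "TBPB33"}
example_hits = ["ACGTACGT", "TTTTCCCC"]
example_found = call_alleles(example_hits, example_dict)
example_call = summarise(example_found)
```

With the made-up data above, the set holds both the known allele and 'new', so the genome would be flagged as carrying multiple alleles rather than silently reported as the known one.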

Thanks Simon

kwongj commented 7 years ago

Hi Simon,

Thanks for the bug report!

Think I've fixed this now. Let me know if it is still an issue.

Note that if there are duplicate copies of the porB or tbpB alleles in the genome (i.e. multiple identical alleles), they will just be reported as a single allele - though I don't think I've ever seen this in a real NG isolate. I imagine that doing traditional NG-MAST by PCR would also only report it as a single NG-MAST type.
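As a small illustration of why identical copies collapse: when allele labels are collected in a set, as `tbpbCOUNT` is in the snippet quoted above, two hits with the same label leave only one entry. The sketch below uses made-up sequences and a hypothetical `TBPB33` label. The `Counter` is shown only for contrast and is not something ngmaster is known to do.

```python
from collections import Counter

# Hypothetical database entry and two identical hits in one genome.
allele_dict = {"ACGTACGT": "TBPB33"}
hits = ["ACGTACGT", "ACGTACGT"]

labels = [allele_dict[seq].split("PB")[1] for seq in hits]

unique_alleles = set(labels)     # a set keeps one entry per distinct allele
copy_numbers = Counter(labels)   # a Counter would retain the copy number
```

Here the set ends up with a single allele, so the genome is reported with one allele, while the counter still records that it was seen twice.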

simonrharris commented 7 years ago

Thanks Jason!



kwongj commented 7 years ago

Closed #22.