denglab / SeqSero2

Reporting inter-serotype contamination in a sample (bug?) #45

Open AleSR13 opened 2 years ago

AleSR13 commented 2 years ago

Hello! First, thanks for making this very nice package. We use it a lot in our institute. I wanted to report what I think is a small bug. We had a sample that was determined to have the O-9,46 allele and the serotype Ouakam. However, we got a note on it saying:

Co-existence of multiple serotypes detected, indicating potential inter-serotype contamination. See 'Extracted_antigen_alleles.fasta' for detected serotype determinant alleles.

If I look at the extracted alleles, I see that three alleles were found:

NODE_7_length_440_cov_32.036364 O-9,46_wzy_partial; blast score: 399.997 identity%: 100.0%; alignment from 1 to 215 of antigen
NODE_2_length_1260_cov_49.892946 O-9,46_wbaV; blast score: 1779.44 identity%: 98.7%; alignment from 1 to 1001 of antigen
NODE_1_length_1749_cov_35.202479 fliC z29; blast score: 2693.54 identity%: 98.68%; alignment from 1 to 1517 of antigen
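For anyone comparing their own output against this one, here is a minimal sketch that pulls the fields out of header text like the above. It assumes the header layout is exactly what is shown in this issue; the names `HEADER_RE` and `parse_allele_headers` are made up for illustration and are not part of SeqSero2.

```python
import re

# Assumption: headers follow the layout quoted in this issue, i.e.
# "<contig> <allele>; blast score: <x> identity%: <y>%; alignment from <a> to <b> of antigen"
HEADER_RE = re.compile(
    r"(?P<contig>NODE_\S+)\s+(?P<allele>.+?);\s*blast score:\s*(?P<score>[\d.]+)\s+"
    r"identity%:\s*(?P<identity>[\d.]+)%;\s*"
    r"alignment from\s+(?P<start>\d+)\s+to\s+(?P<end>\d+)\s+of antigen"
)


def parse_allele_headers(text):
    """Return one dict per allele header found in `text` (hypothetical helper)."""
    rows = []
    for match in HEADER_RE.finditer(text):
        rows.append(
            {
                "contig": match.group("contig"),
                "allele": match.group("allele"),
                "blast_score": float(match.group("score")),
                "identity": float(match.group("identity")),
                "start": int(match.group("start")),
                "end": int(match.group("end")),
            }
        )
    return rows


if __name__ == "__main__":
    example = (
        "NODE_7_length_440_cov_32.036364 O-9,46_wzy_partial; blast score: 399.997 "
        "identity%: 100.0%; alignment from 1 to 215 of antigen"
    )
    for row in parse_allele_headers(example):
        print(row["allele"], row["identity"], row["start"], row["end"])
```

Because the pattern is searched rather than anchored to line starts, it also works when several headers end up on one line or carry a leading `>`.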

I was reading about it and it has been reported that the only difference between O-antigen O-9 and O-9,46 is this wbaV allele. Looking at your code, it indeed seems that if a sample has the O-9,46_wzy allele plus the O-9,46_wbaV allele, you do not consider it 'contamination' but just a normal O-9,46 allele. However, I think you do not do the same when wzy is found only as a partial allele (O-9,46_wzy_partial), and therefore my example is marked as a contaminated sample. I have not tested whether this is the issue; I am assuming it from reading your code. I am also assuming that this is not deliberate, but it could be that you have a good reason not to give the partial allele the same treatment. If you need more information, or would prefer that I change the code myself and open a pull request, I am happy to do it.
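To make the suggestion concrete, here is a rough sketch of the behaviour I would expect. This is not SeqSero2's actual code: the function names, the `_partial` suffix handling, and the rule for what counts as "mixed" are all my assumptions, written only to illustrate the idea of normalizing the partial allele before the check.

```python
# Hypothetical illustration only; none of these names exist in SeqSero2.

# Assumption: these two alleles together describe a single O-9,46 antigen,
# so finding both should not count as inter-serotype contamination.
COMPATIBLE_O_ALLELES = {"O-9,46_wzy", "O-9,46_wbaV"}

PARTIAL_SUFFIX = "_partial"


def normalize_allele(name):
    """Drop a trailing '_partial' so partial and full alleles compare equal."""
    if name.endswith(PARTIAL_SUFFIX):
        return name[: -len(PARTIAL_SUFFIX)]
    return name


def looks_like_mixed_o_antigens(allele_names):
    """Return True if more than one O-antigen allele is present and the set
    is not just the known-compatible O-9,46 pair (simplified rule)."""
    o_alleles = {normalize_allele(n) for n in allele_names if n.startswith("O-")}
    if o_alleles <= COMPATIBLE_O_ALLELES:
        return False
    return len(o_alleles) > 1


if __name__ == "__main__":
    sample = ["O-9,46_wzy_partial", "O-9,46_wbaV", "fliC z29"]
    print(looks_like_mixed_o_antigens(sample))
```

With the normalization step, the three alleles from my sample are treated the same way as the full `O-9,46_wzy` + `O-9,46_wbaV` combination.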

Some info: Using SeqSero2 v1.1.1 through a conda environment. This yaml file was used to make the environment:

name: seqsero2
channels:
  - bioconda
  - defaults
dependencies:
  - bedtools=2.17.0
  - biopython=1.73
  - blas=1.0
  - blast=2.9.0
  - bwa=0.7.17
  - ncbi-ngs-sdk=2.10.0
  - pip=20.0.2
  - python=3.6.9
  - salmid=0.1.23
  - samtools=1.9
  - spades=3.13.1
  - sra-tools=2.10.0
  - seqsero2=1.1.1
  - pandas==1.1.0

michellescribner commented 1 year ago

Hello @AleSR13! I also just encountered a sample that was determined to have the O-9,46 allele and the serotype Ouakam and was also flagged for contamination. The three alleles detected are completely identical to what you shared above.

Is this result a bug? I'm curious whether you ever identified evidence of contamination in your sample.

AleSR13 commented 11 months ago

Hi @michellescribner! Sorry for the late reply. I never heard back from them, and sadly I changed jobs (and even fields), so I don't know what the solution was. Sorry that I cannot be of more help.