denglab / SeqSero2


tiny bug: output of k-mer mode only lists first of R1 and R2 files as input files #14

Status: Closed (kapsakcj closed this issue 3 years ago)

kapsakcj commented 5 years ago

When I run SeqSero2 v1.0.0 with the options -m k -t 2 -i R1.fastq.gz R2.fastq.gz, the output lists only the first file of the read pair for the isolate. Allele mode (-m a) does not show this behavior; it lists both input files in the output.

$ SeqSero2_package.py -m k -t 2 -i SRR1258439_1.fastq.gz SRR1258439_2.fastq.gz

Output_directory:SeqSero_result_06_25_2019_14_42_088815533
Input files:    SRR1258439_1.fastq.gz
O antigen prediction:   7
H1 antigen prediction(fliC):    y
H2 antigen prediction(fljB):    1,5
Predicted subspecies:   I
Predicted antigenic profile:    7:y:1,5
Predicted serotype:     Bareilly

And with the reads given in reverse order:

$ SeqSero2_package.py -m k -t 2 -i SRR1258439_2.fastq.gz SRR1258439_1.fastq.gz

Output_directory:SeqSero_result_06_25_2019_14_45_301893729
Input files:    SRR1258439_2.fastq.gz
O antigen prediction:   7
H1 antigen prediction(fliC):    y
H2 antigen prediction(fljB):    1,5
Predicted subspecies:   I
Predicted antigenic profile:    7:y:1,5
Predicted serotype:     Bareilly
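
The two k-mer-mode reports above differ only in their run-specific fields. The predictions are identical whichever read file comes first. A small Python sketch can check this by parsing each report into key/value pairs. It assumes the one-field-per-line "key: value" layout seen in this issue (not a documented SeqSero2 output format), and the helper names parse_report and differing_fields are hypothetical.

```python
from typing import Dict, List

# The two k-mer-mode reports exactly as quoted in this issue.
REPORT_R1_FIRST = """\
Output_directory:SeqSero_result_06_25_2019_14_42_088815533
Input files:    SRR1258439_1.fastq.gz
O antigen prediction:   7
H1 antigen prediction(fliC):    y
H2 antigen prediction(fljB):    1,5
Predicted subspecies:   I
Predicted antigenic profile:    7:y:1,5
Predicted serotype:     Bareilly
"""

REPORT_R2_FIRST = """\
Output_directory:SeqSero_result_06_25_2019_14_45_301893729
Input files:    SRR1258439_2.fastq.gz
O antigen prediction:   7
H1 antigen prediction(fliC):    y
H2 antigen prediction(fljB):    1,5
Predicted subspecies:   I
Predicted antigenic profile:    7:y:1,5
Predicted serotype:     Bareilly
"""


def parse_report(text: str) -> Dict[str, str]:
    """Turn 'key: value' lines into a dict (layout assumed from this issue)."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # partition() splits on the FIRST colon only, so "7:y:1,5" stays intact.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def differing_fields(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Return sorted keys whose values differ or that appear in only one report."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


if __name__ == "__main__":
    r1_first = parse_report(REPORT_R1_FIRST)
    r2_first = parse_report(REPORT_R2_FIRST)
    print(differing_fields(r1_first, r2_first))
    # ['Input files', 'Output_directory']
    print(r1_first["Predicted serotype"], r2_first["Predicted serotype"])
    # Bareilly Bareilly
```

Splitting on the first colon only matters here because the antigenic profile value (7:y:1,5) itself contains colons.
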

denglab commented 5 years ago

This behavior is intentional. The k-mer raw-reads workflow includes a sub-sampling step: instead of using all of the input raw reads, SeqSero2 uses only a sub-sample of the reads for rapid serotype prediction, and normally only the forward reads are analyzed. Listing only the first file of the read pair as the input file is meant to indicate that not all reads were used in the analysis.
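
The maintainer's description can be illustrated with a short Python sketch. It is not SeqSero2's code: every function name here (open_fastq, read_fastq, reservoir_sample, kmer_mode_subsample, kmer_mode_inputs, allele_mode_inputs) is hypothetical, and the thread does not say how SeqSero2 sub-samples reads or how many it keeps, so seeded reservoir sampling stands in for that step. The sketch shows the two ideas from the reply: the k-mer workflow draws a sub-sample from the first read file only, and the report lists only that file as the input, while allele mode lists every file.

```python
import gzip
import io
import random
from typing import Iterable, Iterator, List, Sequence, TextIO, Tuple

# (header, sequence, separator, quality)
FastqRecord = Tuple[str, str, str, str]


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records until the input is exhausted."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq, plus, qual = handle.readline(), handle.readline(), handle.readline()
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               plus.rstrip("\n"), qual.rstrip("\n"))


def reservoir_sample(records: Iterable[FastqRecord], k: int,
                     seed: int = 0) -> List[FastqRecord]:
    """Keep a uniform random sample of at most k records (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[FastqRecord] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec
    return sample


def kmer_mode_subsample(input_files: Sequence[str], k: int,
                        seed: int = 0) -> List[FastqRecord]:
    """Hypothetical k-mer step: sub-sample reads from the FIRST file only."""
    with open_fastq(input_files[0]) as handle:
        return reservoir_sample(read_fastq(handle), k, seed)


def kmer_mode_inputs(input_files: Sequence[str]) -> List[str]:
    """Hypothetical report field: only the file actually sampled is listed."""
    return list(input_files[:1])


def allele_mode_inputs(input_files: Sequence[str]) -> List[str]:
    """Hypothetical report field: allele mode lists every input file."""
    return list(input_files)


if __name__ == "__main__":
    files = ["SRR1258439_1.fastq.gz", "SRR1258439_2.fastq.gz"]
    print("k-mer mode lists:", kmer_mode_inputs(files))
    print("allele mode lists:", allele_mode_inputs(files))
    # In-memory stand-in for a FASTQ file with 10 reads.
    demo = io.StringIO("".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(10)))
    kept = reservoir_sample(read_fastq(demo), k=3)
    print(len(kept), "of 10 reads kept")
```

Under these assumptions, swapping the order of the -i arguments changes which file is sampled and listed, which matches the two reports in the issue.
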

kapsakcj commented 5 years ago

Thanks for the explanation; I was unaware that it did that!