When comparing with and without bbnorm:
==> cecret/samtools_coverage/3528365.cov.hist <==
MN908947.3 (29.9Kbp)
> 90.00% │▁███████████████████████████████████▇██████████▇█ │ Number of reads: 199978
> 80.00% │██████████████████████████████████████████████████│
> 70.00% │██████████████████████████████████████████████████│ Covered bases: 29.8Kbp
> 60.00% │██████████████████████████████████████████████████│ Percent covered: 99.59%
> 50.00% │██████████████████████████████████████████████████│ Mean coverage: 862x
> 40.00% │██████████████████████████████████████████████████│ Mean baseQ: 32.8
> 30.00% │██████████████████████████████████████████████████│ Mean mapQ: 60
> 20.00% │██████████████████████████████████████████████████│
> 10.00% │██████████████████████████████████████████████████│ Histo bin width: 598bp
> 0.00% │██████████████████████████████████████████████████│ Histo max bin: 100%
1 6.0K 12.0K 17.9K 23.9K 29.9K
==> cecret/samtools_coverage/3540826-UT-A01290-240207.cov.hist <==
MN908947.3 (29.9Kbp)
> 90.00% │▇███████████████████████████████████▇██████████▇█▁│ Number of reads: 71398
> 80.00% │██████████████████████████████████████████████████│
> 70.00% │██████████████████████████████████████████████████│ Covered bases: 29.8Kbp
> 60.00% │██████████████████████████████████████████████████│ Percent covered: 99.72%
> 50.00% │██████████████████████████████████████████████████│ Mean coverage: 325x
> 40.00% │██████████████████████████████████████████████████│ Mean baseQ: 36.1
> 30.00% │██████████████████████████████████████████████████│ Mean mapQ: 60
> 20.00% │██████████████████████████████████████████████████│
> 10.00% │██████████████████████████████████████████████████│ Histo bin width: 598bp
> 0.00% │██████████████████████████████████████████████████│ Histo max bin: 100%
1 6.0K 12.0K 17.9K 23.9K 29.9K
==> nonorm/samtools_coverage/3528365.cov.hist <==
MN908947.3 (29.9Kbp)
> 90.00% │▇███████████████████████████████████▇██████████▇█▁│ Number of reads: 565714
> 80.00% │██████████████████████████████████████████████████│
> 70.00% │██████████████████████████████████████████████████│ Covered bases: 29.8Kbp
> 60.00% │██████████████████████████████████████████████████│ Percent covered: 99.73%
> 50.00% │██████████████████████████████████████████████████│ Mean coverage: 2.44e+03x
> 40.00% │██████████████████████████████████████████████████│ Mean baseQ: 32.7
> 30.00% │██████████████████████████████████████████████████│ Mean mapQ: 60
> 20.00% │██████████████████████████████████████████████████│
> 10.00% │██████████████████████████████████████████████████│ Histo bin width: 598bp
> 0.00% │██████████████████████████████████████████████████│ Histo max bin: 100%
1 6.0K 12.0K 17.9K 23.9K 29.9K
==> nonorm/samtools_coverage/3540826-UT-A01290-240207.cov.hist <==
MN908947.3 (29.9Kbp)
> 90.00% │████████████████████████████████████▇████████████▁│ Number of reads: 22334310
> 80.00% │██████████████████████████████████████████████████│
> 70.00% │██████████████████████████████████████████████████│ Covered bases: 29.8Kbp
> 60.00% │██████████████████████████████████████████████████│ Percent covered: 99.79%
> 50.00% │██████████████████████████████████████████████████│ Mean coverage: 1.02e+05x
> 40.00% │██████████████████████████████████████████████████│ Mean baseQ: 36.2
> 30.00% │██████████████████████████████████████████████████│ Mean mapQ: 60
> 20.00% │██████████████████████████████████████████████████│
> 10.00% │██████████████████████████████████████████████████│ Histo bin width: 598bp
> 0.00% │██████████████████████████████████████████████████│ Histo max bin: 100%
1 6.0K 12.0K 17.9K 23.9K 29.9K
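To put numbers on how much bbnorm thins the data, the summary fields printed beside each histogram can be pulled out and compared. The sketch below is a minimal illustration, not part of Cecret. It assumes only the "Number of reads:" and "Mean coverage: ...x" labels visible in the cov.hist output above. The helper names `parse_cov_hist` and `fraction_kept` are hypothetical.

```python
import re

# Labels copied from the summary printed next to each cov.hist histogram above.
READS_RE = re.compile(r"Number of reads:\s*(\d+)")
# Mean coverage may be printed as "862x" or in scientific notation like "2.44e+03x".
DEPTH_RE = re.compile(r"Mean coverage:\s*([0-9.eE+-]+)x")


def parse_cov_hist(text):
    """Return (number_of_reads, mean_coverage) from one cov.hist block (hypothetical helper)."""
    reads = READS_RE.search(text)
    depth = DEPTH_RE.search(text)
    if reads is None or depth is None:
        raise ValueError("summary fields not found in cov.hist text")
    # float() accepts the scientific notation samtools prints, e.g. "1.02e+05".
    return int(reads.group(1)), float(depth.group(1))


def fraction_kept(norm_text, raw_text):
    """Fraction of reads and of mean depth retained after normalization."""
    norm_reads, norm_depth = parse_cov_hist(norm_text)
    raw_reads, raw_depth = parse_cov_hist(raw_text)
    return norm_reads / raw_reads, norm_depth / raw_depth
```

Applied to the numbers above, normalization keeps about 35% of the reads for 3528365 (199978 of 565714) and only about 0.3% for 3540826 (71398 of 22334310). Mean depth shrinks by a similar proportion, yet percent covered stays above 99.5% in every case.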
And the final summary files:
==> cecret/cecret_results.txt <==
sample_id sample pangolin_lineage nextclade_clade vadr_p/f fasta_line fastqc_raw_reads_1 fastqc_raw_reads_2 num_N num_total seqyclean_PairsKept seqyclean_Perc_Kept num_pos_100X insert_size_after_trimming bcftools_variants_identified samtools_meandepth_after_trimming samtools_per_1X_coverage_after_trimming vadr_model vadr_alerts nextclade_clade_who nextclade_qc_overallscore nextclade_qc_overallstatus pangolin_conflict pangolin_ambiguity_score pangolin_scorpio_call pangolin_scorpio_support pangolin_scorpio_conflict pangolin_scorpio_notes pangolin_version pangolin_pangolin_version pangolin_scorpio_version pangolin_constellation_version pangolin_is_designated pangolin_qc_status pangolin_qc_notes pangolin_note pangocollapse_lineage pangocollapse_Lineage_full pangocollapse_Lineage_expanded pangocollapse_Lineage_family freyja_summarized Cecret version seqyclean bwa ivar ivar consensus
3528365 3528365 XCR recombinant PASS 3528365 325979.0 325979.0 718 29759 109728.0 97.3586 29040 171.0 122 861.881 99.5887 NC_045512 - recombinant 3.396763 good 0.0 Omicron (XBB.1.5-like) 0.94 0.01 scorpio call: Alt alleles 82; Ref alleles 1; Amb alleles 1; Oth alleles 3 PUSHER-v1.25.1 4.3.1 0.3.19 v0.1.12 False pass Ambiguous content: 4% Usher placements: XCR(1/1); scorpio lineage XBB.1.5 conflicts with inference lineage XCR XCR XCR XCR Recombinant [('Other' 0.9999999999996719)] v3.12.20240221 seqyclean : Version: 1.10.09 (2018-10-16) bwa : Version: 0.7.17-r1188 ivar : iVar version 1.4.2 iVar version 1.4.2
3540826-UT-A01290-240207 3540826-UT-A01290-240207 JN.1.1 23I PASS 3540826-UT-A01290-240207 12181621.0 12181621.0 111 29796 48071.0 87.1799 29685 187.6 132 325.301 99.7224 NC_045512 - Omicron 0.0 good 0.0 Omicron (BA.2-like) 0.92 0.03 scorpio call: Alt alleles 57; Ref alleles 2; Amb alleles 0; Oth alleles 3 PUSHER-v1.25.1 4.3.1 0.3.19 v0.1.12 False pass Ambiguous content: 2% Usher placements: JN.1.1(1/1) JN.1.1 B.1.1.529.2.86.1.1.1 B.1.1.529:BA.2.86.1:JN.1.1 BA.2 [('BA.2.86* (BA.2.86X)' 0.999999999994306)] v3.12.20240221 seqyclean : Version: 1.10.09 (2018-10-16) bwa : Version: 0.7.17-r1188 ivar : iVar version 1.4.2 iVar version 1.4.2
bbnorm_test bbnorm JN.1.1 23I PASS bbnorm_test 55140.0 55140.0 111 29796 44283.0 87.4588 29685 188.2 132 304.238 99.7224 NC_045512 - Omicron 0.0 good 0.0 Omicron (BA.2-like) 0.92 0.03 scorpio call: Alt alleles 57; Ref alleles 2; Amb alleles 0; Oth alleles 3 PUSHER-v1.25.1 4.3.1 0.3.19 v0.1.12 False pass Ambiguous content: 2% Usher placements: JN.1.1(1/1) JN.1.1 B.1.1.529.2.86.1.1.1 B.1.1.529:BA.2.86.1:JN.1.1 BA.2 [('BA.2.86* (BA.2.86X)' 0.9999999999906976)] v3.12.20240221 seqyclean : Version: 1.10.09 (2018-10-16) bwa : Version: 0.7.17-r1188 ivar : iVar version 1.4.2 iVar version 1.4.2
==> nonorm/cecret_results.txt <==
sample_id sample pangolin_lineage nextclade_clade vadr_p/f fasta_line fastqc_raw_reads_1 fastqc_raw_reads_2 num_N num_total seqyclean_PairsKept seqyclean_Perc_Kept num_pos_100X insert_size_after_trimming bcftools_variants_identified samtools_meandepth_after_trimming samtools_per_1X_coverage_after_trimming vadr_model vadr_alerts nextclade_clade_who nextclade_qc_overallscore nextclade_qc_overallstatus pangolin_conflict pangolin_ambiguity_score pangolin_scorpio_call pangolin_scorpio_support pangolin_scorpio_conflict pangolin_scorpio_notes pangolin_version pangolin_pangolin_version pangolin_scorpio_version pangolin_constellation_version pangolin_is_designated pangolin_qc_status pangolin_qc_notes pangolin_note pangocollapse_lineage pangocollapse_Lineage_full pangocollapse_Lineage_expanded pangocollapse_Lineage_family freyja_summarized Cecret version seqyclean bwa ivar ivar consensus
3528365 3528365 XCR recombinant PASS 3528365 325979.0 325979.0 758 29801 316176.0 96.9928 29043 170.8 125 2442.1 99.7325 NC_045512 - recombinant 3.877421 good 0.0 Omicron (XBB.1.5-like) 0.94 0.01 scorpio call: Alt alleles 82; Ref alleles 1; Amb alleles 1; Oth alleles 3 PUSHER-v1.25.1 4.3.1 0.3.19 v0.1.12 False pass Ambiguous content: 4% Usher placements: XCR(1/1); scorpio lineage XBB.1.5 conflicts with inference lineage XCR XCR XCR XCR Recombinant [('Other' 0.9999999999965108)] v3.12.20240221 seqyclean : Version: 1.10.09 (2018-10-16) bwa : Version: 0.7.17-r1188 ivar : iVar version 1.4.2 iVar version 1.4.2
3540826-UT-A01290-240207 3540826-UT-A01290-240207 JN.1.1 23I PASS 3540826-UT-A01290-240207 12181621.0 12181621.0 14 29805 11326208.0 92.9778 29815 186.5 137 101836.0 99.7927 NC_045512 - Omicron 0.0 good 0.0 Omicron (BA.2-like) 0.92 0.03 scorpio call: Alt alleles 57; Ref alleles 2; Amb alleles 0; Oth alleles 3 PUSHER-v1.25.1 4.3.1 0.3.19 v0.1.12 False pass Ambiguous content: 2% Usher placements: JN.1.1(1/1) JN.1.1 B.1.1.529.2.86.1.1.1 B.1.1.529:BA.2.86.1:JN.1.1 BA.2 [('BA.2.86* (BA.2.86X)' 0.9977042473353005)] v3.12.20240221 seqyclean : Version: 1.10.09 (2018-10-16) bwa : Version: 0.7.17-r1188 ivar : iVar version 1.4.2 iVar version 1.4.2
bbnorm_test bbnorm JN.1.1 23I PASS bbnorm_test 55140.0 55140.0 111 29796 48071.0 87.1799 29685 187.6 132 325.301 99.7224 NC_045512 - Omicron 0.0 good 0.0 Omicron (BA.2-like) 0.92 0.03 scorpio call: Alt alleles 57; Ref alleles 2; Amb alleles 0; Oth alleles 3 PUSHER-v1.25.1 4.3.1 0.3.19 v0.1.12 False pass Ambiguous content: 2% Usher placements: JN.1.1(1/1) JN.1.1 B.1.1.529.2.86.1.1.1 B.1.1.529:BA.2.86.1:JN.1.1 BA.2 [('BA.2.86* (BA.2.86X)' 0.999999999994306)] v3.12.20240221 seqyclean : Version: 1.10.09 (2018-10-16) bwa : Version: 0.7.17-r1188 ivar : iVar version 1.4.2 iVar version 1.4.2
Notably, normalization should not be used on wastewater or mixed samples.
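bbnorm can increase the number of "N"s in the consensus sequence (14 -> 111 for 3540826), which reduces the number of variants identified (137 -> 132 for 3540826). This is not consistent across samples. For 3528365 the N count actually dropped slightly (758 -> 718), although the variant count still fell (125 -> 122). It does not seem to change the overall Freyja or Pangolin results (the same lineages and top Freyja calls appear either way), but some key variants may end up missing.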
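The N and variant comparison above can be repeated for other runs by joining two cecret_results.txt files on sample_id. This is a minimal sketch that assumes the file is tab-separated with the header shown above. The column names are copied from that header, and `load_results` and `compare` are hypothetical helpers, not part of Cecret.

```python
import csv

# Columns discussed above; names copied from the cecret_results.txt header.
COLUMNS = ["num_N", "bcftools_variants_identified", "pangolin_lineage"]


def load_results(lines):
    """Map sample_id -> row dict. Assumes cecret_results.txt is tab-separated."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["sample_id"]: row for row in reader}


def compare(norm_lines, raw_lines, columns=COLUMNS):
    """Return {sample_id: {column: (normalized, not_normalized)}} for samples in both runs."""
    norm = load_results(norm_lines)
    raw = load_results(raw_lines)
    result = {}
    for sample in sorted(norm.keys() & raw.keys()):
        result[sample] = {col: (norm[sample][col], raw[sample][col]) for col in columns}
    return result
```

Usage would look something like `compare(open("cecret/cecret_results.txt"), open("nonorm/cecret_results.txt"))`, using the directory names from this comparison. Values come back as strings, exactly as written in the file, so lineage columns and numeric columns can be compared the same way.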
It DOES speed up runtime. By... a lot for samples with a lot of reads (3540826 has over 12 million raw read pairs).
These three samples without normalization: 1 h 44 m 5 s
These three samples with normalization: 21 m 7 s
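To express the runtime difference as a ratio, the two wall times can be converted to seconds. This is a small sketch that assumes durations are written as in the timings above ("1 h 44 m 5 s"). The helpers `to_seconds` and `speedup` are hypothetical.

```python
import re

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(duration):
    """Convert a duration like '1 h 44 m 5 s' into total seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)\s*([hms])\b", duration):
        total += int(value) * UNIT_SECONDS[unit]
    return total


def speedup(without_norm, with_norm):
    """Ratio of wall time without normalization to wall time with it."""
    return to_seconds(without_norm) / to_seconds(with_norm)
```

For these runs, 1 h 44 m 5 s is 6245 s and 21 m 7 s is 1267 s. That is roughly a 4.9x speedup, or about 80% less wall time.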