UCSF-Costello-Lab / LG3_Pipeline

The original LG3 pipeline
https://github.com/UCSF-Costello-Lab/LG3_Pipeline

REPRODUCIBILITY: Comparing lg3 2019-07-22 to previous versions #143

Open HenrikBengtsson opened 4 years ago

HenrikBengtsson commented 4 years ago

Comparing the Patient157t10 results of lg3 2019-07-22 to those of lg3 2019-03-23 gives:

[cbctest2@n17 lg3-demo-2019-07-22]$ module load lg3/2019-07-22
[cbctest2@n17 lg3-demo-2019-07-22]$ LG3_TEST_TRUTH=../lg3-demo-2019-03-23/truth PATIENT=Patient157t10 lg3 test validate
Sourced: /home/shared/cbc/software_cbc/LG3_Pipeline-2019-07-22/lg3.conf
*** Configuration
[OK] PROJECT=LG3
[OK] PATIENT=Patient157t10
[OK] CONV=patient_ID_conversions.tsv
[OK] LG3_TEST_TRUTH=../lg3-demo-2019-03-23/truth

*** Trimming of FASTQ Files
[OK] file tree ('output/LG3/trim/Z00*-trim')
[OK] file sizes ('output/LG3/trim/Z00*-trim/*')

*** BWA Alignment of FASTQ Files
[OK] file tree ('output/LG3/exomes')
[OK] file sizes ('output/LG3/exomes/Z00*/*')

*** Recalibration of BAM Files
[OK] file tree ('output/LG3/exomes_recal/Patient157t10')
[WARN] unexpected file sizes ('../lg3-demo-2019-03-23/truth/Patient157t10/output/LG3/exomes_recal/Patient157t10/*' != 'output/LG3/exomes_recal/Patient157t10/*')
@@ -11 +11 @@
-5.3K   output/LG3/exomes_recal/Patient157t10/Z00599t10.bwa.realigned.rmDups.recal.quality_distribution.pdf
+5.4K   output/LG3/exomes_recal/Patient157t10/Z00599t10.bwa.realigned.rmDups.recal.quality_distribution.pdf
@@ -25 +25 @@
-182M   output/LG3/exomes_recal/Patient157t10/Z00601t10.bwa.realigned.rmDups.recal.bam
+183M   output/LG3/exomes_recal/Patient157t10/Z00601t10.bwa.realigned.rmDups.recal.bam
@@ -35 +35 @@
-2.1M   output/LG3/exomes_recal/Patient157t10/germline
+5.2M   output/LG3/exomes_recal/Patient157t10/germline
[WARN] unexpected file sizes ('../lg3-demo-2019-03-23/truth/Patient157t10/output/LG3/exomes_recal/Patient157t10/germline/*' != 'output/LG3/exomes_recal/Patient157t10/germline/*')
@@ -1,4 +1,4 @@
-126    output/LG3/exomes_recal/Patient157t10/germline/NOR-Z00599t10_vs_Z00600t10.germline
-126    output/LG3/exomes_recal/Patient157t10/germline/NOR-Z00599t10_vs_Z00601t10.germline
-2.1M   output/LG3/exomes_recal/Patient157t10/germline/Patient157t10.UG.snps.vcf
-11K    output/LG3/exomes_recal/Patient157t10/germline/Patient157t10.UG.snps.vcf.idx
+128    output/LG3/exomes_recal/Patient157t10/germline/NOR-Z00599t10_vs_Z00600t10.germline
+128    output/LG3/exomes_recal/Patient157t10/germline/NOR-Z00599t10_vs_Z00601t10.germline
+5.0M   output/LG3/exomes_recal/Patient157t10/germline/Patient157t10.UG.snps.vcf
+217K   output/LG3/exomes_recal/Patient157t10/germline/Patient157t10.UG.snps.vcf.idx
[OK] file sizes ('output/LG3/exomes_recal/Patient157t10/*.bai')

*** Pindel Processing
[OK] file tree ('output/LG3/pindel')
[OK] file rows ('output/LG3/pindel/Patient157t10.pindel.cfg')
[OK] file sizes ('output/LG3/pindel/Patient157t10_pindel/*')

*** MutDet Processing
[OK] file tree ('output/LG3/mutations/Patient157t10_mutect')
[WARN] unexpected file sizes ('../lg3-demo-2019-03-23/truth/Patient157t10/output/LG3/mutations/Patient157t10_mutect/*' != 'output/LG3/mutations/Patient157t10_mutect/*')
@@ -1 +1 @@
-210K   output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__REC1-Z00601t10.indels.annotated.vcf
+211K   output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__REC1-Z00601t10.indels.annotated.vcf
@@ -3 +3 @@
-156K   output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__REC1-Z00601t10.indels.raw.vcf
+157K   output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__REC1-Z00601t10.indels.raw.vcf
@@ -5 +5 @@
-6.6K   output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__REC1-Z00601t10.mutations
+6.9K   output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__REC1-Z00601t10.mutations
@@ -7 +7 @@
-622M   output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__REC1-Z00601t10.snvs.coverage.mutect.wig
+623M   output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__REC1-Z00601t10.snvs.coverage.mutect.wig
@@ -9 +9 @@
-261K   output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__TUM-Z00600t10.indels.annotated.vcf
+262K   output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__TUM-Z00600t10.indels.annotated.vcf
@@ -11 +11 @@
-194K   output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__TUM-Z00600t10.indels.raw.vcf
+195K   output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__TUM-Z00600t10.indels.raw.vcf
@@ -13 +13 @@
-5.9K   output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__TUM-Z00600t10.mutations
+5.1K   output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__TUM-Z00600t10.mutations
@@ -15 +15 @@
-626M   output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__TUM-Z00600t10.snvs.coverage.mutect.wig
+628M   output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__TUM-Z00600t10.snvs.coverage.mutect.wig
@@ -17,2 +17,2 @@
-12K    output/LG3/mutations/Patient157t10_mutect/Patient157t10.NOR-Z00599t10__REC1-Z00601t10.annotated.mutations
-8.4K   output/LG3/mutations/Patient157t10_mutect/Patient157t10.NOR-Z00599t10__TUM-Z00600t10.annotated.mutations
+13K    output/LG3/mutations/Patient157t10_mutect/Patient157t10.NOR-Z00599t10__REC1-Z00601t10.annotated.mutations
+7.3K   output/LG3/mutations/Patient157t10_mutect/Patient157t10.NOR-Z00599t10__TUM-Z00600t10.annotated.mutations

*** Post-MutDet Processing
[OK] file tree ('output/LG3/MAF')
[WARN] unexpected file sizes ('../lg3-demo-2019-03-23/truth/Patient157t10/output/LG3/MAF/Patient157t10_MAF/*' != 'output/LG3/MAF/Patient157t10_MAF/*')
@@ -1,3 +1,3 @@
-78K    output/LG3/MAF/Patient157t10_MAF/Patient157t10.Normal.MAF.txt
-78K    output/LG3/MAF/Patient157t10_MAF/Patient157t10.Primary.MAF.txt
-68K    output/LG3/MAF/Patient157t10_MAF/Patient157t10.Recurrence1.MAF.txt
+123K   output/LG3/MAF/Patient157t10_MAF/Patient157t10.Normal.MAF.txt
+120K   output/LG3/MAF/Patient157t10_MAF/Patient157t10.Primary.MAF.txt
+98K    output/LG3/MAF/Patient157t10_MAF/Patient157t10.Recurrence1.MAF.txt
[WARN] unexpected file sizes ('../lg3-demo-2019-03-23/truth/Patient157t10/output/LG3/MAF/Patient157t10_plots/*' != 'output/LG3/MAF/Patient157t10_plots/*')
@@ -1,4 +1,4 @@
-24K    output/LG3/MAF/Patient157t10_plots/Patient157t10.LOH.chr17.pdf
-30K    output/LG3/MAF/Patient157t10_plots/Patient157t10.LOH.chr19.pdf
-31K    output/LG3/MAF/Patient157t10_plots/Patient157t10.LOH.grid.chr17.pdf
-38K    output/LG3/MAF/Patient157t10_plots/Patient157t10.LOH.grid.chr19.pdf
+32K    output/LG3/MAF/Patient157t10_plots/Patient157t10.LOH.chr17.pdf
+41K    output/LG3/MAF/Patient157t10_plots/Patient157t10.LOH.chr19.pdf
+41K    output/LG3/MAF/Patient157t10_plots/Patient157t10.LOH.grid.chr17.pdf
+54K    output/LG3/MAF/Patient157t10_plots/Patient157t10.LOH.grid.chr19.pdf
[OK] file tree ('output/LG3/MutInDel')
[WARN] unexpected file sizes ('../lg3-demo-2019-03-23/truth/Patient157t10/output/LG3/MutInDel/*' != 'output/LG3/MutInDel/*')
@@ -1,7 +1,7 @@
-1.5K   output/LG3/MutInDel/Patient157t10.R.mutations
-20K    output/LG3/MutInDel/Patient157t10.snvs
-3.2K   output/LG3/MutInDel/Patient157t10.snvs.anno.pat.filt.txt
-22K    output/LG3/MutInDel/Patient157t10.snvs.anno.pat.txt
-22K    output/LG3/MutInDel/Patient157t10.snvs.anno.txt
-1.4K   output/LG3/MutInDel/Patient157t10.snvs.indels.filtered.overlaps.txt
-3.4K   output/LG3/MutInDel/Patient157t10.snvs.indels.filtered.txt
+1.3K   output/LG3/MutInDel/Patient157t10.R.mutations
+19K    output/LG3/MutInDel/Patient157t10.snvs
+2.7K   output/LG3/MutInDel/Patient157t10.snvs.anno.pat.filt.txt
+21K    output/LG3/MutInDel/Patient157t10.snvs.anno.pat.txt
+21K    output/LG3/MutInDel/Patient157t10.snvs.anno.txt
+1.2K   output/LG3/MutInDel/Patient157t10.snvs.indels.filtered.overlaps.txt
+2.9K   output/LG3/MutInDel/Patient157t10.snvs.indels.filtered.txt
[WARN] unexpected file content ('output/LG3/MutInDel/Patient157t10.R.mutations')
--- ../lg3-demo-2019-03-23/truth/Patient157t10/output/LG3/MutInDel/Patient157t10.R.mutations    2020-05-18 09:48:47.000000000 -0700
+++ output/LG3/MutInDel/Patient157t10.R.mutations       2020-05-18 20:25:58.000000000 -0700
@@ -2,3 +1,0 @@
-NOS2   chr17   26089902        A       G       T2722C  Y908H   exonic  Missense        covered_in_all  MuTect  16      0       0       0       5       NO      NO      21      0       21      0       25      0       25       0       14      11      14      11      True    False   Recurrence1
-CLEC4M,CLEC4M  chr19   7830800 G       A       G338A,G491A,G422A,G491A,G419A,G407A,G491A,G428A R113Q,R164Q,R141Q,R164Q,R140Q,R136Q,R164Q,R143Q exonic;splicing Missense        covered_in_all  MuTect  24      0       00       1       NO      NO      32      1       19      0       31      4       20      3       55      4       43      3       False   True    Primary
-DOCK6  chr19   11348348        C       G       G1940C  G647A   exonic  Missense        covered_in_all  MuTect  23      0       0       0       5       NO      NO      30      0       30      0       33      0       32       0       32      5       32      4       True    False   Recurrence1
@@ -6 +2,0 @@
-ELSPBP1        chr19   48525532        G       A       G620A   W207X   exonic  Nonsense        NA      MuTect  14      0       0       0       1       NO      NO      17      0       17      0       9       4       94       8       9       8       9       True    False   Recurrence1
@@ -7,0 +4,3 @@
+ELSPBP1        chr19   48525532        G       A       G620A   W207X   exonic  Nonsense        NA      MuTect  14      0       0       0       1       NO      NO      17      0       17      0       9       4       94       8       9       8       9       True    False   Recurrence1
+DOCK6  chr19   11348348        C       G       G1940C  G647A   exonic  Missense        covered_in_all  MuTect  23      0       0       0       5       NO      NO      30      0       30      0       33      0       32       0       32      5       32      4       True    False   Recurrence1
+NOS2   chr17   26089902        A       G       T2722C  Y908H   exonic  Missense        covered_in_all  MuTect  16      0       0       0       5       NO      NO      21      0       21      0       25      0       25       0       14      11      14      11      True    False   Recurrence1
[WARN] unexpected file content ('output/LG3/MutInDel/Patient157t10.R.mutations')
--- /dev/fd/63  2020-05-23 13:40:13.050293509 -0700
+++ /dev/fd/62  2020-05-23 13:40:13.052293509 -0700
@@ -2,3 +1,0 @@
-NOS2   chr17   26089902        A       G       T2722C  Y908H   exonic  Missense        covered_in_all
-CLEC4M,CLEC4M  chr19   7830800 G       A       G338A,G491A,G422A,G491A,G419A,G407A,G491A,G428A R113Q,R164Q,R141Q,R164Q,R140Q,R136Q,R164Q,R143Q exonic;splicing Missense        covered_in_all
-DOCK6  chr19   11348348        C       G       G1940C  G647A   exonic  Missense        covered_in_all
@@ -6 +2,0 @@
-ELSPBP1        chr19   48525532        G       A       G620A   W207X   exonic  Nonsense        NA
@@ -7,0 +4,3 @@
+ELSPBP1        chr19   48525532        G       A       G620A   W207X   exonic  Nonsense        NA
+DOCK6  chr19   11348348        C       G       G1940C  G647A   exonic  Missense        covered_in_all
+NOS2   chr17   26089902        A       G       T2722C  Y908H   exonic  Missense        covered_in_all
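Read hunk by hunk, the Patient157t10.R.mutations diff above is mostly a change in row order: the NOS2, DOCK6, and ELSPBP1 rows are removed and re-added unchanged, and only the CLEC4M row is missing from the 2019-07-22 output. A minimal sketch for separating row-order differences from content differences is to compare sorted copies of the two files with comm. The helper name compare_rows and its optional column limit (the second diff above appears to compare only the first 10 tab-separated columns) are assumptions for illustration, not part of lg3 test validate.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of lg3): list rows that occur in only one of
# two tab-separated files, ignoring the order in which rows appear.
set -euo pipefail

# Usage: compare_rows TRUTH OUTPUT [NCOLS]
# If NCOLS is given, only the first NCOLS tab-separated columns are compared.
compare_rows() {
  local truth=$1 output=$2 ncols=${3:-}
  local -a extract=(cat)
  if [[ -n $ncols ]]; then
    extract=(cut -f "1-$ncols")
  fi
  # comm needs sorted input; LC_ALL=C gives sort and comm the same collation.
  # -3 hides rows common to both files. Rows only in TRUTH are printed as-is;
  # rows only in OUTPUT are printed with one leading tab.
  LC_ALL=C comm -3 \
    <("${extract[@]}" "$truth" | LC_ALL=C sort) \
    <("${extract[@]}" "$output" | LC_ALL=C sort)
}
```

Going by the diff above, running compare_rows on the truth and output copies of output/LG3/MutInDel/Patient157t10.R.mutations would be expected to print only the CLEC4M row, as a row present in the truth file alone.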

HenrikBengtsson commented 4 years ago

Similarly to https://github.com/UCSF-Costello-Lab/LG3_Pipeline/issues/141#issuecomment-633708976, I've rerun lg3 2019-07-22 a second time:

[cbctest2@n17 lg3-demo-2019-07-22-002]$ lg3 --version
2019-07-22 (commit 218d7fb)

Validating the results of this second run against those of the first run, using the develop version of lg3 test validate, gives:

[cbctest2@n17 lg3-demo-2019-07-22-002]$ LG3_TEST_TRUTH=../lg3-demo-2019-07-22/truth PATIENT=Patient157t10 $LG3_HOME/bin/lg3 test validate

Sourced: /home/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-develop/lg3.conf
*** Configuration
[OK] PROJECT=LG3
[OK] PATIENT=Patient157t10
[OK] CONV=patient_ID_conversions.tsv
[OK] LG3_TEST_TRUTH=../lg3-demo-2019-07-22/truth

*** Trimming of FASTQ Files
[OK] file tree ('output/LG3/trim/Z00*-trim')
[OK] file sizes ('output/LG3/trim/Z00*-trim/*')
[OK] file md5 checksums (after gunzip) ('output/LG3/trim/Z00*-trim/*.fastq.gz')

*** BWA Alignment of FASTQ Files
[OK] file tree ('output/LG3/exomes')
[OK] file sizes ('output/LG3/exomes/Z00*/*')
[OK] file md5 checksums ('output/LG3/exomes/Z00*/*.bai')
[OK] file md5 checksums ('output/LG3/exomes/Z00*/*.bam')
[OK] file md5 checksums ('output/LG3/exomes/Z00*/*.flagstat')

*** Recalibration of BAM Files
[OK] file tree ('output/LG3/exomes_recal/Patient157t10')
[OK] file sizes ('output/LG3/exomes_recal/Patient157t10/*')
[OK] file sizes ('output/LG3/exomes_recal/Patient157t10/germline/*')
[OK] file md5 checksums ('output/LG3/exomes_recal/Patient157t10/germline/*.germline')
[OK] file sizes ('output/LG3/exomes_recal/Patient157t10/*.bai')
[OK] file md5 checksums ('output/LG3/exomes_recal/Patient157t10/*.flagstat')
[OK] file md5 checksums ('output/LG3/exomes_recal/Patient157t10/*.bai')
[OK] file md5 checksums ('output/LG3/exomes_recal/Patient157t10/*.bam')

*** Pindel Processing
[OK] file tree ('output/LG3/pindel')
[OK] file rows ('output/LG3/pindel/Patient157t10.pindel.cfg')
[OK] file sizes ('output/LG3/pindel/Patient157t10_pindel/*')
[OK] file md5 checksums ('output/LG3/pindel/Patient157t10_pindel/*')

*** MutDet Processing
[OK] file tree ('output/LG3/mutations/Patient157t10_mutect')
[WARN] unexpected file sizes ('../lg3-demo-2019-07-22/truth/Patient157t10/output/LG3/mutations/Patient157t10_mutect/*' != 'output/LG3/mutations/Patient157t10_mutect/*')
@@ -1 +1 @@
-211K   output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__REC1-Z00601t10.indels.annotated.vcf
+212K   output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__REC1-Z00601t10.indels.annotated.vcf
@@ -6,2 +6,2 @@
-70M    output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__REC1-Z00601t10.snvs.coverage.mutect.bed
-623M   output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__REC1-Z00601t10.snvs.coverage.mutect.wig
+63M    output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__REC1-Z00601t10.snvs.coverage.mutect.bed
+528M   output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__REC1-Z00601t10.snvs.coverage.mutect.wig
@@ -14,2 +14,2 @@
-88M    output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__TUM-Z00600t10.snvs.coverage.mutect.bed
-628M   output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__TUM-Z00600t10.snvs.coverage.mutect.wig
+84M    output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__TUM-Z00600t10.snvs.coverage.mutect.bed
+597M   output/LG3/mutations/Patient157t10_mutect/NOR-Z00599t10__TUM-Z00600t10.snvs.coverage.mutect.wig
[OK] file md5 checksums ('output/LG3/mutations/Patient157t10_mutect/*.mutations')
[OK] file md5 checksums ('output/LG3/mutations/Patient157t10_mutect/*.txt')
[OK] file md5 checksums ('output/LG3/mutations/Patient157t10_mutect/*.intersect.bed')

*** Post-MutDet Processing
[OK] file tree ('output/LG3/MAF')
[OK] file sizes ('output/LG3/MAF/Patient157t10_MAF/*')
[OK] file sizes ('output/LG3/MAF/Patient157t10_plots/*')
[OK] file tree ('output/LG3/MutInDel')
[OK] file sizes ('output/LG3/MutInDel/*')
[OK] file content ('output/LG3/MutInDel/Patient157t10.R.mutations')

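As in https://github.com/UCSF-Costello-Lab/LG3_Pipeline/issues/141#issuecomment-633708976, the only mismatches between these two runs of the same lg3 version are the sizes of the MuTect coverage files (*.snvs.coverage.mutect.bed and *.snvs.coverage.mutect.wig) and of one *.indels.annotated.vcf file, while the md5 checksums of the *.mutations, *.txt, and *.intersect.bed files match and all Post-MutDet checks pass. This suggests "... that (i) there is [a] random component to the ./_run_MutDet step, and (ii) the validation of the 'MutDet Processing' step does not take this into account ..."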
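If the MuTect coverage outputs do vary between runs, one option is to keep exact md5 checks for the MutDet files that matched in both runs and relax only the check on the *.snvs.coverage.mutect.bed and *.wig files. The sketch below is a hypothetical pair of helpers, not lg3's actual implementation. md5_of hashes .gz files after decompression, because the gzip header can store a timestamp and the original file name, so identical content can compress to different bytes (this may be why the FASTQ checks above are done "after gunzip"). size_within accepts a relative size difference of up to PCT percent. It assumes GNU coreutils (md5sum), as on a typical Linux cluster. Any tolerance would have to be chosen from more runs: between the two runs above, the REC1 coverage .wig file went from 623M to 528M, roughly 15%.

```bash
#!/usr/bin/env bash
# Hypothetical validation helpers (not part of lg3 test validate).
set -euo pipefail

# md5_of FILE: md5 of the file's content. .gz files are hashed after
# decompression, since the gzip header may embed a timestamp and file name.
md5_of() {
  local file=$1
  if [[ $file == *.gz ]]; then
    gzip -dc -- "$file" | md5sum | cut -d ' ' -f 1
  else
    md5sum < "$file" | cut -d ' ' -f 1
  fi
}

# same_md5 TRUTH OUTPUT: succeeds if both files have the same content.
same_md5() {
  [[ $(md5_of "$1") == "$(md5_of "$2")" ]]
}

# size_within TRUTH OUTPUT PCT: succeeds if the byte sizes differ by at most
# PCT percent of the TRUTH file's size (integer arithmetic, no rounding).
size_within() {
  local truth_size output_size diff
  truth_size=$(wc -c < "$1")
  output_size=$(wc -c < "$2")
  diff=$(( truth_size - output_size ))
  if (( diff < 0 )); then
    diff=$(( -diff ))
  fi
  (( diff * 100 <= truth_size * $3 ))
}
```

A caller could then run same_md5 on each *.mutations, *.txt, and *.intersect.bed file and size_within on each coverage file, reporting [OK] or [WARN] accordingly.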