UCSF-Costello-Lab / LG3_Pipeline

The original LG3 pipeline
https://github.com/UCSF-Costello-Lab/LG3_Pipeline

muTect: Try to switch from muTect-1.0.27783 to a newer version, e.g. muTect-1.1.1 or muTect-1.1.4 #151

Open HenrikBengtsson opened 3 years ago

HenrikBengtsson commented 3 years ago

This will make it possible to get 100% reproducible runs by using the CLI option --disableRandomization (https://github.com/UCSF-Costello-Lab/LG3_Pipeline/issues/141#issuecomment-634226939). With 100% reproducible runs, we can move forward with replacing other parts of the pipeline. If we break something, our reproducibility tests will catch it.
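
A reproducibility test of this kind can be as simple as checking that two runs on the same input produce byte-identical output files. Below is a minimal Python sketch under that assumption. `compare_runs`, `sha256_of`, and the `*.mutect.txt` glob pattern are hypothetical and not part of LG3_Pipeline. The sketch compares raw bytes, so any output lines that embed run-specific details (dates, paths, job IDs) would have to be excluded before hashing.

```python
import hashlib
from pathlib import Path

# Hypothetical helpers for a reproducibility check; not part of LG3_Pipeline.


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_runs(run_a, run_b, pattern="*.mutect.txt"):
    """Map each output file name matching `pattern` to True if the copies in
    run_a and run_b are byte-identical, and False otherwise (including files
    that exist in only one of the two runs)."""
    files_a = {p.name: p for p in Path(run_a).glob(pattern)}
    files_b = {p.name: p for p in Path(run_b).glob(pattern)}
    result = {}
    for name in sorted(files_a.keys() | files_b.keys()):
        if name in files_a and name in files_b:
            result[name] = sha256_of(files_a[name]) == sha256_of(files_b[name])
        else:
            result[name] = False
    return result
```

Any `False` entry in the returned dict flags an output file that differs between the two runs (or is missing from one of them). Hashing in chunks keeps memory use flat even for large raw call files.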

ivan108 commented 3 years ago

I tried to replace the original muTect-1.0.27783.jar with muTect-1.1.4.jar. The new muTect itself seems to work, but I'm getting an error downstream in the filtering step, FilterMutations/Filter.py, which calls MuTector:

Warning: MuTector version (## muTector v1.0.47986) not what we are expecting (## muTector v1.0.27200)...
MuTectorc olumns not the expected columns.
#Col  Actual   Expected
Traceback (most recent call last):
  File "/c4/home/jocostello/repos/LG3_Pipeline/FilterMutations/Filter.py", line 254, in <module>
    sys.exit(main())
  File "/c4/home/jocostello/repos/LG3_Pipeline/FilterMutations/Filter.py", line 82, in main
    filterPointMutations(pointMutFn, mutations)
  File "/c4/home/jocostello/repos/LG3_Pipeline/FilterMutations/Filter.py", line 135, in filterPointMutations
    numCols = validateMutectorFile(version, rawHeader.replace('\t', ' '))
  File "/c4/home/jocostello/repos/LG3_Pipeline/FilterMutations/MuTector.py", line 165, in validateMutectorFile
    for i in xrange(max(actualLen, NumMutectorColumns)):
NameError: global name 'NumMutectorColumns' is not defined
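
The NameError means that the mismatch-reporting branch of validateMutectorFile() refers to a global, NumMutectorColumns, that does not exist in MuTector.py's module namespace. As a result, a version mismatch crashes the filter instead of printing the full column comparison. Below is a minimal sketch of a report loop that derives the column count from the column lists themselves. It is written in Python 3, whereas the original uses Python 2's xrange. `validate_columns` and `EXPECTED_COLUMNS` are hypothetical, and the column names are placeholders, not the real list from MuTector.py.

```python
# Placeholder column names for illustration only; the real expected list
# lives in FilterMutations/MuTector.py and may differ.
EXPECTED_COLUMNS = ["contig", "position", "context", "ref_allele", "alt_allele"]


def validate_columns(header, expected=EXPECTED_COLUMNS):
    """Hypothetical stand-in for validateMutectorFile().

    `header` is the raw header line with tabs already replaced by spaces, as in
    Filter.py. Returns (ok, report): ok is True only on an exact match, and
    report is a '#Col Actual Expected' table covering every position.
    """
    actual = header.split()
    ok = actual == expected
    # Derive the loop bound from the lists themselves instead of relying on
    # a separate global such as NumMutectorColumns.
    n = max(len(actual), len(expected))
    rows = ["#Col\tActual\tExpected"]
    for i in range(n):
        got = actual[i] if i < len(actual) else "-"
        want = expected[i] if i < len(expected) else "-"
        rows.append("%d\t%s\t%s" % (i, got, want))
    return ok, "\n".join(rows)
```

Fixing the NameError would only make the diagnostic table print. The underlying column mismatch between muTect versions would still cause the filter to reject the file.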

HenrikBengtsson commented 3 years ago

Thanks. Unfortunately, attempting to replace muTect v1.0.27783 with muTect v1.1.1 (sic!) also failed with the same error:

$ cat _MutDet_Z00601t10.out
Sourced: /cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/lg3.conf
Sourced: /cbc2/data2/henrik/repositories/UCSF-CostelloLab/test-next-release/lg3.conf (63 bytes)
[2021-10-05 19:03:30 PDT] BEGIN: /var/spool/torque/mom_priv/jobs/2064182.cclc01.som.ucsf.edu.SC
Call: /var/spool/torque/mom_priv/jobs/2064182.cclc01.som.ucsf.edu.SC
Script: /var/spool/torque/mom_priv/jobs/2064182.cclc01.som.ucsf.edu.SC
Arguments: 
Settings:
- LG3_HOME=/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release
- LG3_INPUT_ROOT=output
- LG3_OUTPUT_ROOT=output
- EMAIL=henrik.bengtsson-gmail@fwd.braju.com
- PROJECT=LG3
- LG3_SCRATCH_ROOT=/scratch/henrik/2064182.cclc01.som.ucsf.edu
- PWD=/cbc2/data2/henrik/repositories/UCSF-CostelloLab/test-next-release
- USER=henrik
- PBS_NUM_PPN=4
- hostname=n27
Input:
- PATIENT=Patient157t10
- TUMOR=Z00601t10
- NORMAL=Z00599t10
- TYPE=REC1
- CONFIG=/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/FilterMutations/mutationConfig.cfg
- INTERVAL=/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/resources/All_exome_targets.extended_200bp.interval_list
- XMX=Xmx8g
- WORKDIR=/cbc2/data2/henrik/repositories/UCSF-CostelloLab/test-next-release/output/LG3/mutations/Patient157t10_mutect
New working directory: '/scratch/henrik/2064182.cclc01.som.ucsf.edu/Patient157t10_mutect' (was '/cbc2/data2/henrik/repositories/UCSF-CostelloLab/test-next-release')
Starting MutDet job on Tue Oct  5 19:03:30 PDT 2021
Patient = Patient157t10
Normal = /cbc2/data2/henrik/repositories/UCSF-CostelloLab/test-next-release/output/LG3/exomes_recal/Patient157t10/Z00599t10.bwa.realigned.rmDups.recal.bam
Tumor = /cbc2/data2/henrik/repositories/UCSF-CostelloLab/test-next-release/output/LG3/exomes_recal/Patient157t10/Z00601t10.bwa.realigned.rmDups.recal.bam
Tum. Type = REC1
Config = /cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/FilterMutations/mutationConfig.cfg
Interval = /home/jocostello/shared/LG3_Pipeline_HIDE/resources/All_exome_targets.extended_200bp.interval_list
Java Memory = Xmx8g
WORKDIR=/cbc2/data2/henrik/repositories/UCSF-CostelloLab/test-next-release/output/LG3/mutations/Patient157t10_mutect
SCRATCH=/scratch/henrik/2064182.cclc01.som.ucsf.edu/Patient157t10_mutect
Sourced: /cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/lg3.conf
[2021-10-05 19:03:30 PDT] BEGIN: /cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/scripts/MutDet.sh
Call: /cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/scripts/MutDet.sh
Script: /cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/scripts/MutDet.sh
Arguments: /cbc2/data2/henrik/repositories/UCSF-CostelloLab/test-next-release/output/LG3/exomes_recal/Patient157t10/Z00599t10.bwa.realigned.rmDups.recal.bam /cbc2/data2/henrik/repositories/UCSF-CostelloLab/test-next-release/output/LG3/exomes_recal/Patient157t10/Z00601t10.bwa.realigned.rmDups.recal.bam NOR-Z00599t10__REC1-Z00601t10 Patient157t10 /cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/FilterMutations/mutationConfig.cfg /home/jocostello/shared/LG3_Pipeline_HIDE/resources/All_exome_targets.extended_200bp.interval_list Xmx8g
Settings:
- LG3_HOME=/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release
- LG3_OUTPUT_ROOT=output
- LG3_SCRATCH_ROOT=/scratch/henrik/2064182.cclc01.som.ucsf.edu
- PWD=/scratch/henrik/2064182.cclc01.som.ucsf.edu/Patient157t10_mutect
- USER=henrik
- PBS_NUM_PPN=4
- hostname=n27
- ncores=4
Input:
- nbamfile=/cbc2/data2/henrik/repositories/UCSF-CostelloLab/test-next-release/output/LG3/exomes_recal/Patient157t10/Z00599t10.bwa.realigned.rmDups.recal.bam
- tbamfile=/cbc2/data2/henrik/repositories/UCSF-CostelloLab/test-next-release/output/LG3/exomes_recal/Patient157t10/Z00601t10.bwa.realigned.rmDups.recal.bam
- prefix=NOR-Z00599t10__REC1-Z00601t10
- patientID=Patient157t10
- CONFIG=/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/FilterMutations/mutationConfig.cfg
- ILIST=/home/jocostello/shared/LG3_Pipeline_HIDE/resources/All_exome_targets.extended_200bp.interval_list
- XMX=Xmx8g
Software:
- JAVA=/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/tools/java/jre1.6.0_27/bin/java
- PYTHON=/usr/bin/python
- MUTECT=/home/shared/cbc/software_cbc/mutect-1.1.1/muTect-1.1.1.jar
- FILTER=/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/FilterMutations/Filter.py
- REORDER=/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/scripts/vcf_reorder.py
References:
- REF=/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/resources/UCSC_HG19_Feb_2009/hg19.fa
- DBSNP=/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/resources/dbsnp_132.hg19.sorted.vcf
- REORDER=/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/scripts/vcf_reorder.py
- CONVERT=/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/resources/RefSeq.Entrez.txt
- KINASEDATA=/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/resources/all_human_kinases.txt
- COSMICDATA=/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/resources/CosmicMutantExport_v58_150312.tsv
- CANCERDATA=/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/resources/SangerCancerGeneCensus_2012-03-15.txt
-------------------------------------------------
[MutDet] Mutation detection Tue Oct  5 19:03:30 PDT 2021
-------------------------------------------------
[MutDet] Patient ID: Patient157t10
[MutDet] Normal Sample: Z00599t10
[MutDet] Tumor Sample: Z00601t10
[MutDet] Prefix: NOR-Z00599t10__REC1-Z00601t10
-------------------------------------------------
[MutDet] Normal bam file: /cbc2/data2/henrik/repositories/UCSF-CostelloLab/test-next-release/output/LG3/exomes_recal/Patient157t10/Z00599t10.bwa.realigned.rmDups.recal.bam
[MutDet] Tumor bam file: /cbc2/data2/henrik/repositories/UCSF-CostelloLab/test-next-release/output/LG3/exomes_recal/Patient157t10/Z00601t10.bwa.realigned.rmDups.recal.bam
[MutDet] Java Memory Xmx value: Xmx8g
[MutDet] Working directory: /scratch/henrik/2064182.cclc01.som.ucsf.edu/Patient157t10_mutect
-------------------------------------------------
[MutDet] Running muTect...
WARN  19:39:16,450 RestStorageService - Error Response: PUT '/GATK_Run_Reports/lDhttZCwmz5Ruf4shIpKQIfUdQ8uEHja.report.xml.gz' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Content-Length: 419, Content-MD5: rnYUhg6epuz4I4rUF7Wj1w==, Content-Type: application/octet-stream, x-amz-meta-md5-hash: ae7614860e9ea6ecf8238ad417b5a3d7, Date: Wed, 06 Oct 2021 02:39:15 GMT, Authorization: AWS AKIAJXU7VIHBPDW4TDSQ:ivz9hbrp8Dt2DTF4b1oUiuR1j0I=, User-Agent: JetS3t/0.8.1 (Linux/2.6.32-504.12.2.el6.664g0000.x86_64; amd64; en; JVM 1.6.0_27), Host: s3.amazonaws.com, Expect: 100-continue], Response Headers: [x-amz-request-id: 8AS8NRNHAEQGE5ZY, x-amz-id-2: 1Vf6WrvfAYoGn/p0GPVxqVSrNF38/m4IPGH1b7Gw2MJmG1qbliI2TohWu1gYkSlwXEwVhRXPu7A=, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Wed, 06 Oct 2021 02:39:15 GMT, Server: AmazonS3, Connection: close] 

real    35m45.976s
user    140m35.359s
sys     1m52.980s
Done
15200 NOR-Z00599t10__REC1-Z00601t10.snvs.raw.mutect.txt
[MutDet] Running Somatic Indel Detector...
INFO  19:39:18,393 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  19:39:18,395 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.6-5-g557da77, Compiled 2012/05/03 17:30:26 
INFO  19:39:18,395 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  19:39:18,396 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki 
INFO  19:39:18,396 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa 
INFO  19:39:18,396 HelpFormatter - Program Args: --analysis_type SomaticIndelDetector -I:normal /cbc2/data2/henrik/repositories/UCSF-CostelloLab/test-next-release/output/LG3/exomes_recal/Patient157t10/Z00599t10.bwa.realigned.rmDups.recal.bam -I:tumor /cbc2/data2/henrik/repositories/UCSF-CostelloLab/test-next-release/output/LG3/exomes_recal/Patient157t10/Z00601t10.bwa.realigned.rmDups.recal.bam --logging_level INFO --reference_sequence /cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/resources/UCSC_HG19_Feb_2009/hg19.fa --intervals /home/jocostello/shared/LG3_Pipeline_HIDE/resources/All_exome_targets.extended_200bp.interval_list -baq CALCULATE_AS_NECESSARY --maxNumberOfReads 10000 --window_size 350 --filter_expressions N_COV<8||T_COV<14||T_INDEL_F<0.1||T_INDEL_CF<0.7 --out NOR-Z00599t10__REC1-Z00601t10.indels.raw.vcf 
INFO  19:39:18,397 HelpFormatter - Date/Time: 2021/10/05 19:39:18 
INFO  19:39:18,397 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  19:39:18,397 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  19:39:18,418 GenomeAnalysisEngine - Strictness is SILENT 
INFO  19:39:18,471 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  19:39:18,495 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02 
INFO  19:39:19,809 SomaticIndelDetectorWalker - No gene annotations available 
INFO  19:39:23,893 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING] 
INFO  19:39:23,893 TraversalEngine -        Location processed.reads  runtime per.1M.reads completed total.runtime remaining 
INFO  19:40:40,874 TraversalEngine -     chr17:97210        1.00e+03   80.2 s       22.3 h     78.2%       102.6 s    22.4 s 
INFO  19:41:10,934 TraversalEngine -   chr17:7839985        1.81e+05  110.3 s       10.2 m     79.0%         2.3 m    29.4 s 
INFO  19:41:41,227 TraversalEngine -  chr17:18855823        3.31e+05    2.3 m        7.1 m     79.6%         2.9 m    36.1 s 
INFO  19:42:11,274 TraversalEngine -  chr17:34106174        4.97e+05    2.8 m        5.7 m     80.3%         3.5 m    41.9 s 
INFO  19:42:41,289 TraversalEngine -  chr17:42154106        6.81e+05    3.3 m        4.9 m     81.2%         4.1 m    46.6 s 
INFO  19:43:11,484 TraversalEngine -  chr17:56233486        8.48e+05    3.8 m        4.5 m     81.9%         4.7 m    51.1 s 
INFO  19:43:41,511 TraversalEngine -  chr17:66925656        1.00e+06    4.3 m        4.3 m     82.5%         5.3 m    55.2 s 
INFO  19:44:11,538 TraversalEngine -  chr17:79655939        1.20e+06    4.8 m        4.0 m     83.4%         5.8 m    58.0 s 
INFO  19:44:41,540 TraversalEngine -   chr19:5668357        1.45e+06    5.3 m        3.7 m     85.7%         6.2 m    53.3 s 
INFO  19:45:11,572 TraversalEngine -  chr19:13925364        1.71e+06    5.8 m        3.4 m     86.7%         6.7 m    54.1 s 
INFO  19:45:41,796 TraversalEngine -  chr19:23578160        1.89e+06    6.4 m        3.4 m     87.4%         7.3 m    54.8 s 
INFO  19:46:11,865 TraversalEngine -  chr19:40886445        2.07e+06    6.9 m        3.3 m     88.2%         7.8 m    54.9 s 
INFO  19:46:41,896 TraversalEngine -  chr19:49006234        2.25e+06    7.4 m        3.3 m     89.0%         8.3 m    54.3 s 
INFO  19:47:12,028 TraversalEngine -  chr19:56249473        2.46e+06    7.9 m        3.2 m     89.9%         8.7 m    52.7 s 
INFO  19:47:26,003 Walker - [REDUCE RESULT] Traversal result is: 2511833 
INFO  19:47:26,003 TraversalEngine - Total runtime 485.36 secs, 8.09 min, 0.13 hours 
INFO  19:47:26,055 TraversalEngine - 116043 reads were filtered out during traversal out of 2627951 total (4.42%) 
INFO  19:47:26,055 TraversalEngine -   -> 116043 reads (4.42% of total) failing MappingQualityZeroFilter 
WARN  19:47:27,078 RestStorageService - Error Response: PUT '/GATK_Run_Reports/OghmfKohj3Xndyb8pmIl1LEIynZTUkJd.report.xml.gz' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Content-Length: 334, Content-MD5: j/moAkl4naH5ZEOaaXl5gQ==, Content-Type: application/octet-stream, x-amz-meta-md5-hash: 8ff9a80249789da1f964439a69797981, Date: Wed, 06 Oct 2021 02:47:26 GMT, Authorization: AWS AKIAJXU7VIHBPDW4TDSQ:lc7M3ga8zwS/nPqL/6Egx2dTjKo=, User-Agent: JetS3t/0.8.1 (Linux/2.6.32-504.12.2.el6.664g0000.x86_64; amd64; en; JVM 1.6.0_27), Host: s3.amazonaws.com, Expect: 100-continue], Response Headers: [x-amz-request-id: A9DPSXZPEV4G3RN1, x-amz-id-2: nAE+0r6Xkyw1WsgNqRyc1AYIM+gxQ+OmSrC19b3vZq9CQyO0DdYzNAd5woqh2eX6c+FmDoUa+Js=, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Wed, 06 Oct 2021 02:47:26 GMT, Server: AmazonS3, Connection: close] 

real    8m10.609s
user    9m33.513s
sys     0m7.670s
Done
667 NOR-Z00599t10__REC1-Z00601t10.indels.raw.vcf
[MutDet] Annotating raw indel calls...
INFO  19:47:28,968 RodBindingArgumentTypeDescriptor - Dynamically determined type of NOR-Z00599t10__REC1-Z00601t10.indels.raw.vcf to be VCF 
WARN  19:47:36,850 RestStorageService - Error Response: PUT '/GATK_Run_Reports/mm8JLMSL491v4MQnd4t2xJ4G7UjxdCVl.report.xml.gz' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Content-Length: 323, Content-MD5: oOoD53mdt1wjLP0PA+bzMg==, Content-Type: application/octet-stream, x-amz-meta-md5-hash: a0ea03e7799db75c232cfd0f03e6f332, Date: Wed, 06 Oct 2021 02:47:36 GMT, Authorization: AWS AKIAJXU7VIHBPDW4TDSQ:UrricL2kbsxwy5JbnHBSN1fSow0=, User-Agent: JetS3t/0.8.1 (Linux/2.6.32-504.12.2.el6.664g0000.x86_64; amd64; en; JVM 1.6.0_27), Host: s3.amazonaws.com, Expect: 100-continue], Response Headers: [x-amz-request-id: 5DBJVMRAC93EV3BA, x-amz-id-2: PvQn/mNQk0/5+SjBtgC5uDA/IWs/wgeZOXBgJFuJ6Yi3rTvk4yA20i3OMuvlchxrnlQuLNoq8no=, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Wed, 06 Oct 2021 02:47:36 GMT, Server: AmazonS3, Connection: close] 

real    0m9.761s
user    0m17.581s
sys     0m2.042s
Done
684 NOR-Z00599t10__REC1-Z00601t10.indels.annotated.vcf
[MutDet] Reordering indel vcf...

real    0m0.321s
user    0m0.049s
sys     0m0.028s
Done
684 NOR-Z00599t10__REC1-Z00601t10.indels.annotated.temp.vcf
[MutDet] Filtering mutect and indel output...
Warning: MuTector version (## muTector v1.0.44829) not what we are expecting (## muTector v1.0.27200)...
MuTectorc olumns not the expected columns.
#Col    Actual  Expected
Traceback (most recent call last):
  File "/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/FilterMutations/Filter.py", line 254, in <module>
    sys.exit(main())
  File "/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/FilterMutations/Filter.py", line 82, in main
    filterPointMutations(pointMutFn, mutations)
  File "/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/FilterMutations/Filter.py", line 135, in filterPointMutations
    numCols = validateMutectorFile(version, rawHeader.replace('\t', ' '))
  File "/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/FilterMutations/MuTector.py", line 165, in validateMutectorFile
    for i in xrange(max(actualLen, NumMutectorColumns)):
NameError: global name 'NumMutectorColumns' is not defined

real    0m0.416s
user    0m0.060s
sys     0m0.128s
ERROR: Filtering failed
Traceback:
1: main() on line #219 in /cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/scripts/MutDet.sh
Exiting (exit 1)
ERROR: MutDet failed
Traceback:
1: main() on line #101 in /var/spool/torque/mom_priv/jobs/2064182.cclc01.som.ucsf.edu.SC
Exiting (exit 1)
[henrik@cclc01 ~/repositories/UCSF-CostelloLab/test-next-release]$
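
Both logs show that Filter.py keys its expectations on the version header line. muTect 1.1.1 reports "## muTector v1.0.44829", muTect 1.1.4 reports "## muTector v1.0.47986", and the filter expects "## muTector v1.0.27200". One option is to parse that header and look up a column layout per version, failing with an explicit message for unknown versions instead of warning and then crashing. Below is a minimal Python sketch of that idea. `parse_mutector_version`, `select_layout`, and the layout registry are hypothetical. Whether the muTect 1.1.x column layout can be mapped onto what Filter.py expects has not been established here.

```python
import re

# Matches header lines such as "## muTector v1.0.27200".
VERSION_RE = re.compile(r"^##\s*muTector\s+v(\d+)\.(\d+)\.(\d+)")


def parse_mutector_version(line):
    """Return the version as a tuple of ints, e.g. (1, 0, 27200), or None
    if the line is not a muTector version header."""
    m = VERSION_RE.match(line.strip())
    return tuple(int(x) for x in m.groups()) if m else None


def select_layout(version_line, layouts):
    """Hypothetical lookup: return the expected column list registered for
    the version in `version_line`, or raise ValueError naming the versions
    that are supported."""
    version = parse_mutector_version(version_line)
    if version is None:
        raise ValueError("not a muTector version header: %r" % version_line)
    if version not in layouts:
        known = ", ".join("v%d.%d.%d" % v for v in sorted(layouts))
        raise ValueError("unsupported muTector v%d.%d.%d (known: %s)"
                         % (version + (known,)))
    return layouts[version]
```

With a registry such as `{(1, 0, 27200): [...]}`, an unsupported build stops the filtering step with a message that names both the offending version and the supported ones. The actual column lists for each version would still have to be filled in from real muTect output.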