bioinform / somaticseq

An ensemble approach to accurately detect somatic mutations using SomaticSeq
http://bioinform.github.io/somaticseq/
BSD 2-Clause "Simplified" License

AssertionError in makeSomaticScripts.py #104

Closed Fer020707 closed 2 years ago

Fer020707 commented 2 years ago

Hi, I am trying to run makeSomaticScripts.py in a conda environment, but it only generates the .cmd scripts. I would appreciate your help; thank you in advance.

command: makeSomaticScripts.py single --bam IMSS_111_CKDL190141429-1a-DY0088-AK1680_H55VHBBXXL5.fastq.trimmed.bam --genome-reference /home/fer/Documents/ref/ucsc.hg19.fasta --output-directory PruebaVCF --dbsnp-vcf dbsnp/dbsnp_138.hg19.vcf --inclusion-region QIAGEN_NGHS-013X-Covered-modificado.bed --threads 1 --run-mutect2 --run-vardict --run-lofreq --run-scalpel --run-strelka2 --run-somaticseq --run-workflow

Error message:

PruebaVCF/logs/mutect2.2021.12.07.05.03.18.584.cmd
Traceback (most recent call last):
  File "/home/fer/anaconda3/envs/SomaticSeq/bin/makeSomaticScripts.py", line 4, in <module>
    __import__('pkg_resources').run_script('SomaticSeq==3.6.3', 'makeSomaticScripts.py')
  File "/home/fer/anaconda3/envs/SomaticSeq/lib/python3.10/site-packages/pkg_resources/__init__.py", line 651, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/fer/anaconda3/envs/SomaticSeq/lib/python3.10/site-packages/pkg_resources/__init__.py", line 1448, in run_script
    exec(code, namespace, namespace)
  File "/home/fer/anaconda3/envs/SomaticSeq/lib/python3.10/site-packages/SomaticSeq-3.6.3-py3.10.egg/EGG-INFO/scripts/makeSomaticScripts.py", line 493, in <module>
    make_workflow( args, workflowArguments )
  File "/home/fer/anaconda3/envs/SomaticSeq/lib/python3.10/site-packages/SomaticSeq-3.6.3-py3.10.egg/EGG-INFO/scripts/makeSomaticScripts.py", line 398, in make_workflow
    scalpel_job = Scalpel.tumor_only( input_arguments, args.container_tech )
  File "/home/fer/anaconda3/envs/SomaticSeq/lib/python3.10/site-packages/SomaticSeq-3.6.3-py3.10.egg/somaticseq/utilities/dockered_pipelines/somatic_mutations/Scalpel.py", line 118, in tumor_only
    assert os.path.exists( input_parameters['reference_dict'] )
AssertionError

Versions: Python 3.10.0, conda 4.10.3, bedtools v2.30.0, Docker 10.20.11, R 4.1.2

REPOSITORY                   TAG         IMAGE ID       CREATED         SIZE
broadinstitute/gatk          latest      88e2886f9e27   4 weeks ago     4.5GB
lethalfang/somaticseq        latest      05bc8973a566   4 months ago    1.92GB
lethalfang/strelka           2.9.10      a1e636617459   18 months ago   300MB
lethalfang/vardictjava       1.7.0       5e281eb80bc4   19 months ago   851MB
lethalfang/jointsnvmix2      0.7.5       503df7957382   3 years ago     484MB
lethalfang/scalpel           0.5.4       cca8678e328b   3 years ago     527MB
lethalfang/lofreq            2.1.3.1-1   21c2cd913130   3 years ago     550MB
lethalfang/somaticsniper     1.0.5.0-2   55c8228fb895   4 years ago     465MB
marghoob/muse                1.0rc_c     5e40e5758410   5 years ago     132MB
djordjeklisic/sbg-varscan2   v1          0a3d079b6bc9   6 years ago     1.17GB

litaifang commented 2 years ago

For the reference file ucsc.hg19.fasta, make sure there is also a ucsc.hg19.dict in the same directory. You can create it with picard CreateSequenceDictionary.
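A pre-flight check along these lines can catch the missing dictionary before any script is generated. This is only a sketch: it assumes the .dict file shares the FASTA's base name (ucsc.hg19.fasta and ucsc.hg19.dict, as in the reply above), and the helper names expected_dict_path and check_reference are hypothetical, not part of SomaticSeq.

```python
from pathlib import Path


def expected_dict_path(fasta: str) -> Path:
    """Return the sequence-dictionary path assumed to sit next to the FASTA.

    Assumption: the dictionary has the same base name as the FASTA, with the
    last extension swapped for .dict (ucsc.hg19.fasta -> ucsc.hg19.dict).
    """
    return Path(fasta).with_suffix(".dict")


def check_reference(fasta: str) -> Path:
    """Raise a descriptive error if the FASTA or its .dict file is missing."""
    fasta_path = Path(fasta)
    dict_path = expected_dict_path(fasta)
    if not fasta_path.exists():
        raise FileNotFoundError(f"Reference FASTA not found: {fasta_path}")
    if not dict_path.exists():
        raise FileNotFoundError(
            f"Sequence dictionary not found: {dict_path}. "
            "Create it (for example with picard CreateSequenceDictionary) "
            "in the same directory as the FASTA."
        )
    return dict_path
```

Calling check_reference("/home/fer/Documents/ref/ucsc.hg19.fasta") before makeSomaticScripts.py would then fail with a readable message instead of a bare AssertionError.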

Fer020707 commented 2 years ago

Thanks for your reply. After placing them in the same directory, I was able to run the mutect2 and vardict scripts. However, I got an error when running lofreq.

Error message:

FATAL(lofreq_call.c|main_call:1293): Cowardly refusing to overwrite file '/9a7554e99a3447159943ca6a6d61ce78/PruebaVCF/LoFreq.vcf'. Exiting...
INFO 2021-12-07 10:47:38,418 run_script FINISHED RUNNING PruebaVCF/logs/lofreq.2021.12.07.10.04.45.214.cmd in 3.653 seconds with an exit code of 1.
INFO 2021-12-07 10:47:38,441 run_script bash PruebaVCF/logs/strelka.2021.12.07.10.04.45.214.cmd
Start at 2021/12/07 10:47:38
[E::hts_idx_push] Chromosome blocks not continuous
tbx_index_build failed: /bb2f0cfafb4a4aca82b3254c59a9eda6/PruebaVCF/QIAGEN_NGHS-013X-Covered-modificado.bed.gz
INFO 2021-12-07 10:47:45,939 run_script FINISHED RUNNING PruebaVCF/logs/strelka.2021.12.07.10.04.45.214.cmd in 7.498 seconds with an exit code of 1.
INFO 2021-12-07 10:47:46,369 run_script bash PruebaVCF/SomaticSeq/logs/somaticSeq.2021.12.07.10.04.45.214.cmd
Start at 2021/12/07 10:47:46
INFO 2021-12-07 16:47:50,531 SomaticSeq SomaticSeq Input Arguments: output_directory=/937888568af14a718deb7ed8118def62/PruebaVCF/SomaticSeq, genome_reference=/07b76ad939fd4e7581aabfad8dbe0c9b/ucsc.hg19.fasta, truth_snv=None, truth_indel=None, classifier_snv=None, classifier_indel=None, pass_threshold=0.5, lowqual_threshold=0.1, algorithm=xgboost, homozygous_threshold=0.85, heterozygous_threshold=0.01, minimum_mapping_quality=1, minimum_base_quality=5, minimum_num_callers=0.5, dbsnp_vcf=None, cosmic_vcf=None, inclusion_region=/598a5e03ee5c441d9da37a55f2b54881/QIAGEN_NGHS-013X-Covered-modificado.bed, exclusion_region=None, threads=1, somaticseq_train=False, seed=0, tree_depth=12, iterations=None, features_excluded=[], keep_intermediates=False, bam_file=/1af2c713e6f4492fb973d8fea1330fdb/IMSS_111_CKDL190141429-1a-DY0088-AK1680_H55VHBBXXL5.fastq.trimmed.bam, sample_name=TUMOR, mutect_vcf=None, mutect2_vcf=/937888568af14a718deb7ed8118def62/PruebaVCF/MuTect2.vcf, varscan_vcf=None, vardict_vcf=/937888568af14a718deb7ed8118def62/PruebaVCF/VarDict.vcf, lofreq_vcf=/937888568af14a718deb7ed8118def62/PruebaVCF/LoFreq.vcf, scalpel_vcf=/937888568af14a718deb7ed8118def62/PruebaVCF/Scalpel.vcf, strelka_vcf=/937888568af14a718deb7ed8118def62/PruebaVCF/Strelka/results/variants/variants.vcf.gz, which=single
/bin/sh: 1: cannot create /937888568af14a718deb7ed8118def62/PruebaVCF/Strelka/results/variants/variants.vcfc6e1aa5895194b7f9ebc14c8fe352472.gz: Directory nonexistent
Error: Unable to open file /937888568af14a718deb7ed8118def62/PruebaVCF/Strelka/results/variants/variants.vcf.gz. Exiting.
Traceback (most recent call last):
  File "/opt/somaticseq/somaticseq/run_somaticseq.py", line 457, in <module>
    runSingle( outdir = args.output_directory, \
  File "/opt/somaticseq/somaticseq/run_somaticseq.py", line 226, in runSingle
    outSnv, outIndel, intermediateVcfs, tempFiles = combineCallers.combineSingle(outdir=outdir, ref=ref, bam=bam, inclusion=inclusion, exclusion=exclusion, mutect=mutect, mutect2=mutect2, varscan=varscan, vardict=vardict, lofreq=lofreq, scalpel=scalpel, strelka=strelka, keep_intermediates=True)
  File "/usr/local/lib/python3.8/dist-packages/SomaticSeq-3.6.3-py3.8.egg/somaticseq/combine_callers.py", line 143, in combineSingle
    strelka_in = bed_intersector(strelka, os.sep.join(( outdir, 'intersect.strelka.vcf' )), inclusion, exclusion)
  File "/usr/local/lib/python3.8/dist-packages/SomaticSeq-3.6.3-py3.8.egg/somaticseq/vcfModifier/vcfIntersector.py", line 102, in bed_intersector
    subprocess.check_call(cmd_line, shell=True)
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'intersectBed -header -a /937888568af14a718deb7ed8118def62/PruebaVCF/Strelka/results/variants/variants.vcf.gz -b /598a5e03ee5c441d9da37a55f2b54881/QIAGEN_NGHS-013X-Covered-modificado.bed | uniq > /937888568af14a718deb7ed8118def62/PruebaVCF/Strelka/results/variants/variants.vcfc6e1aa5895194b7f9ebc14c8fe352472.gz' returned non-zero exit status 2.
INFO 2021-12-07 10:47:57,328 run_script FINISHED RUNNING PruebaVCF/SomaticSeq/logs/somaticSeq.2021.12.07.10.04.45.214.cmd in 10.959 seconds with an exit code of 1.
INFO 2021-12-07 10:47:57,666 Somatic_Mutation_Workflow SomaticSeq Workflow Done. Check your results. You may remove the 1 sub_directories.

litaifang commented 2 years ago

From the first error message:

FATAL(lofreq_call.c|main_call:1293): Cowardly refusing to overwrite file '/9a7554e99a3447159943ca6a6d61ce78/PruebaVCF/LoFreq.vcf'. Exiting...

You need to delete the existing LoFreq.vcf file, because LoFreq will not overwrite a VCF file that is already there.
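If the workflow is rerun often, clearing leftover caller output first avoids this refusal. A minimal sketch, assuming the stale file is LoFreq.vcf directly inside the output directory (as in the log above); remove_stale_outputs is a hypothetical helper, not a SomaticSeq function.

```python
from pathlib import Path


def remove_stale_outputs(output_dir, filenames=("LoFreq.vcf",)):
    """Delete leftover output files from a previous run.

    Returns the list of paths that were actually removed. Files that do not
    exist are skipped silently.
    """
    removed = []
    for name in filenames:
        target = Path(output_dir) / name
        if target.is_file():
            target.unlink()  # LoFreq refuses to overwrite an existing VCF
            removed.append(target)
    return removed
```

For the run in this thread, remove_stale_outputs("PruebaVCF") would be called before relaunching the lofreq script.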

Later:

tbx_index_build failed: /bb2f0cfafb4a4aca82b3254c59a9eda6/PruebaVCF/QIAGEN_NGHS-013X-Covered-modificado.bed.gz

Make sure the input BED file is sorted. You can sort it with vcfsorter.pl hg19.dict modificado.bed > modificado_ordered.bed, then use the sorted BED file as input and see what happens.
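As an alternative to vcfsorter.pl, the same ordering can be sketched in a few lines of Python: read the contig order from the @SQ lines of the .dict file, then sort the BED records by contig and start position. This is a stand-in, not the script named above; contig_order and sort_bed are hypothetical names, the BED file is assumed to be tab-separated, and header lines (track, browser, #) are dropped rather than preserved.

```python
def contig_order(dict_lines):
    """Map contig name -> rank, taken from the @SQ lines of a .dict file."""
    order = {}
    for line in dict_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                order[field[3:]] = len(order)
    return order


def sort_bed(bed_lines, order):
    """Return BED records sorted by dictionary contig order, then start, end.

    Raises KeyError if a BED contig is absent from the dictionary, which
    usually signals a naming mismatch such as '1' versus 'chr1'.
    """
    records = []
    for line in bed_lines:
        text = line.rstrip("\n")
        if not text.strip() or text.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header lines
        chrom, start, end, *_ = text.split("\t")
        records.append((order[chrom], int(start), int(end), text))
    records.sort(key=lambda record: record[:3])
    return [record[3] for record in records]


if __name__ == "__main__":
    import sys

    # Usage: python sort_bed.py hg19.dict modificado.bed > modificado_ordered.bed
    with open(sys.argv[1]) as dict_file, open(sys.argv[2]) as bed_file:
        for row in sort_bed(bed_file, contig_order(dict_file)):
            print(row)
```

Sorting this way keeps every contig's intervals contiguous, which is the condition the "Chromosome blocks not continuous" message complains about.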

Also make sure bedtools is installed so that the intersectBed command is in your PATH.
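Whether intersectBed is reachable can be checked from Python with the standard library. A small sketch; missing_tools is a hypothetical helper name, and the default tool list is just the one command mentioned above.

```python
import shutil


def missing_tools(tools=("intersectBed",)):
    """Return the tools from the list that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Not found on PATH:", ", ".join(absent))
    else:
        print("All required tools found.")
```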

Fer020707 commented 2 years ago

Thank you very much for your reply. After removing the BED file, I got an error when running Strelka:

Error message:

[2021-12-08T06:45:03.557825Z] [d9f51a288a39] [1_1] [TaskManager] [ERROR] Failed to complete sub-workflow task: 'EstimateSeqErrorParams+Sample000' launched from sub-workflow 'EstimateSeqErrorParams', failed sub-workflow classname: 'EstimateSequenceErrorWorkflowForSample'
[2021-12-08T06:45:03.557989Z] [d9f51a288a39] [1_1] [TaskManager] [ERROR] [EstimateSeqErrorParams+Sample000] Error Message:
[2021-12-08T06:45:03.558028Z] [d9f51a288a39] [1_1] [TaskManager] [ERROR] [EstimateSeqErrorParams+Sample000] Unhandled Exception in TaskRunner-Thread-EstimateSeqErrorParams+Sample000
[2021-12-08T06:45:03.558111Z] [d9f51a288a39] [1_1] [TaskManager] [ERROR] [EstimateSeqErrorParams+Sample000] Traceback (most recent call last):
[2021-12-08T06:45:03.558329Z] [d9f51a288a39] [1_1] [TaskManager] [ERROR] [EstimateSeqErrorParams+Sample000] File "/opt/strelka/lib/python/pyflow/pyflow.py", line 1069, in run
[2021-12-08T06:45:03.558366Z] [d9f51a288a39] [1_1] [TaskManager] [ERROR] [EstimateSeqErrorParams+Sample000] (retval, retmsg) = self._run()
[2021-12-08T06:45:03.558398Z] [d9f51a288a39] [1_1] [TaskManager] [ERROR] [EstimateSeqErrorParams+Sample000] File "/opt/strelka/lib/python/pyflow/pyflow.py", line 1121, in _run
[2021-12-08T06:45:03.558429Z] [d9f51a288a39] [1_1] [TaskManager] [ERROR] [EstimateSeqErrorParams+Sample000] self.workflow.workflow()
[2021-12-08T06:45:03.558575Z] [d9f51a288a39] [1_1] [TaskManager] [ERROR] [EstimateSeqErrorParams+Sample000] File "/opt/strelka/lib/python/strelkaSequenceErrorEstimation.py", line 426, in workflow
...
[2021-12-08T06:45:11.567268Z] [d9f51a288a39] [1_1] [WorkflowRunner] [ERROR] [EstimateSeqErrorParams+Sample000] raise Exception("Task memory requirement exceeds full available resources")
[2021-12-08T06:45:11.567359Z] [d9f51a288a39] [1_1] [WorkflowRunner] [ERROR] [EstimateSeqErrorParams+Sample000] Exception: Task memory requirement exceeds full available resources
[2021-12-08T06:45:11.567428Z] [d9f51a288a39] [1_1] [WorkflowRunner] [ERROR] Failed to complete sub-workflow task: 'EstimateSeqErrorParams' launched from master workflow, failed sub-workflow classname: 'EstimateSequenceErrorWorkflow'

Thanks

litaifang commented 2 years ago

You don't have to remove the .bed file. Just make sure it is sorted. Also, how much memory does your computer have?

[2021-12-08T06:45:11.567268Z] [d9f51a288a39] [1_1] [WorkflowRunner] [ERROR] [EstimateSeqErrorParams+Sample000] raise Exception("Task memory requirement exceeds full available resources")

Fer020707 commented 2 years ago

4 GB of RAM and 250 GB of ROM. Will that be enough, or would it be better to install it on another computer?

litaifang commented 2 years ago

Yeah, 4 GB may be too little. I'd say at least 16 GB for small analyses, and more if you're doing whole genome and need more parallel threads.
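A quick way to compare a machine against that rough guideline before launching the workflow is sketched below. It is Linux-specific, since it relies on the SC_PAGE_SIZE and SC_PHYS_PAGES sysconf names, and the 16 GiB default is only the informal figure from the reply above, not a documented SomaticSeq or Strelka requirement. The function names are hypothetical.

```python
import os

GIB = 1024 ** 3


def total_memory_bytes():
    """Return total physical memory in bytes (Linux-specific sysconf names)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def has_enough_memory(total_bytes, minimum_gib=16):
    """True if total_bytes meets the rough minimum suggested in the thread."""
    return total_bytes >= minimum_gib * GIB


if __name__ == "__main__":
    total = total_memory_bytes()
    print(f"Total memory: {total / GIB:.1f} GiB")
    if not has_enough_memory(total):
        print("This is below the suggested 16 GiB; Strelka may fail.")
```

A machine sold as "16 GB" often reports slightly less than 16 GiB, so the threshold is best treated as approximate.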

Fer020707 commented 2 years ago

After correcting the BED file and increasing the memory (to 100 GB of RAM), everything works perfectly. Thank you very much for all your answers.