bcgsc / mavis

Merging, Annotation, Validation, and Illustration of Structural variants
http://mavis.bcgsc.ca
GNU General Public License v3.0
72 stars 13 forks source link

Docker Validation Error - tutorial full #332

Open husamia opened 2 years ago

husamia commented 2 years ago

Describe the bug There is an error running the workflow as it is.

To Reproduce Built the docker image using Docker file provided with no errors. Downloaded the tutorial data and references needed per tutorial full. Then executed the tutorial workflow

root@7aa05bbe492b:/app/mavis# snakemake --jobs 1 --configfile=tests/full-tutorial.config.json

Expected behavior I expected the results to be with no errors, however, I got error messages and the workflow didn't complete.

Input Data Tutorial data per documentation

Configuration Docker

Versions (please complete the following information):

Additional context I want to make sure I get the workflow running correctly with the tutorial data.

root@7aa05bbe492b:/app/mavis# snakemake --jobs 1 --configfile=tests/full-tutorial.config.json
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Singularity containers: ignored
Job stats:
job            count    min threads    max threads
-----------  -------  -------------  -------------
all                1              1              1
annotate         127              1              1
cluster            3              1              1
convert            5              1              1
copy_config        1              1              1
init_config        1              1              1
pairing            1              1              1
summary            1              1              1
validate         127              1              1
total            267              1              1

Select jobs to execute...

[Thu May 26 20:23:57 2022]
rule copy_config:
    output: output_dir_full/config.raw.json
    log: output_dir_full/logs/copy_config.snakemake.log.txt
    jobid: 8
    resources: tmpdir=/tmp, time_limit=57600, mem_mb=4000, cpus=1, log_dir=output_dir_full/logs

[Thu May 26 20:23:57 2022]
Finished job 8.
1 of 267 steps (0.4%) done
Select jobs to execute...

[Thu May 26 20:23:57 2022]
rule init_config:
    input: output_dir_full/config.raw.json
    output: output_dir_full/config.json
    log: output_dir_full/logs/init_config.snakemake.log.txt
    jobid: 7
    resources: tmpdir=/tmp, time_limit=57600, mem_mb=16000, cpus=1, log_dir=output_dir_full/logs

2022-05-26 20:23:59,479 [INFO] MAVIS: 3.0.0
2022-05-26 20:23:59,479 [INFO] hostname: 7aa05bbe492b
2022-05-26 20:23:59,479 [INFO] arguments
2022-05-26 20:23:59,479 [INFO]  command= 'setup'
2022-05-26 20:23:59,479 [INFO]  config= '/app/mavis/output_dir_full/config.raw.json'
2022-05-26 20:23:59,479 [INFO]  log= None
2022-05-26 20:23:59,479 [INFO]  log_level= 'INFO'
2022-05-26 20:23:59,479 [INFO]  outputfile= 'output_dir_full/config.json'
2022-05-26 20:24:00,826 [INFO] loading: ['/app/mavis/reference_inputs/ensembl69_hg19_annotations.v3.json']
2022-05-26 20:32:09,562 [WARNING] Skipped 19031 translations where the CDS length was not a multiple of 3
2022-05-26 20:32:09,562 [WARNING] Skipped 68 domains due to errors (coordinates defined outside the translated region)
2022-05-26 20:32:17,705 [INFO] writing: output_dir_full/config.json
2022-05-26 20:32:17,706 [INFO] run time (hh/mm/ss): 0:08:18
2022-05-26 20:32:17,706 [INFO] run time (s): 498
[Thu May 26 20:33:02 2022]
Finished job 7.
2 of 267 steps (1%) done
Select jobs to execute...

[Thu May 26 20:33:02 2022]
rule convert:
    input: output_dir_full/config.json
    output: output_dir_full/converted_outputs/defuse.tab
    log: output_dir_full/logs/convert.snakemake.defuse.log.txt
    jobid: 139
    wildcards: alias=defuse
    resources: tmpdir=/tmp, time_limit=57600, mem_mb=16000, cpus=1, log_dir=output_dir_full/logs

[Thu May 26 20:33:04 2022]
Finished job 139.
3 of 267 steps (1%) done
Select jobs to execute...

[Thu May 26 20:33:04 2022]
rule convert:
    input: output_dir_full/config.json
    output: output_dir_full/converted_outputs/breakdancer.tab
    log: output_dir_full/logs/convert.snakemake.breakdancer.log.txt
    jobid: 6
    wildcards: alias=breakdancer
    resources: tmpdir=/tmp, time_limit=57600, mem_mb=16000, cpus=1, log_dir=output_dir_full/logs

[Thu May 26 20:33:15 2022]
Finished job 6.
4 of 267 steps (1%) done
Select jobs to execute...

[Thu May 26 20:33:15 2022]
rule convert:
    input: output_dir_full/config.json
    output: output_dir_full/converted_outputs/breakseq.tab
    log: output_dir_full/logs/convert.snakemake.breakseq.log.txt
    jobid: 9
    wildcards: alias=breakseq
    resources: tmpdir=/tmp, time_limit=57600, mem_mb=16000, cpus=1, log_dir=output_dir_full/logs

[Thu May 26 20:33:18 2022]
Finished job 9.
5 of 267 steps (2%) done
Select jobs to execute...

[Thu May 26 20:33:18 2022]
rule convert:
    input: output_dir_full/config.json
    output: output_dir_full/converted_outputs/manta.tab
    log: output_dir_full/logs/convert.snakemake.manta.log.txt
    jobid: 10
    wildcards: alias=manta
    resources: tmpdir=/tmp, time_limit=57600, mem_mb=16000, cpus=1, log_dir=output_dir_full/logs

[Thu May 26 20:33:27 2022]
Finished job 10.
6 of 267 steps (2%) done
Select jobs to execute...

[Thu May 26 20:33:27 2022]
rule cluster:
    input: output_dir_full/converted_outputs/breakdancer.tab, output_dir_full/converted_outputs/breakseq.tab, output_dir_full/converted_outputs/manta.tab, output_dir_full/config.json
    output: output_dir_full/L1522785992-normal/cluster
    log: output_dir_full/logs/snakemake.cluster.L1522785992-normal.log.txt
    jobid: 5
    wildcards: library=L1522785992-normal
    resources: tmpdir=/tmp, time_limit=57600, mem_mb=16000, cpus=1, log_dir=output_dir_full/logs

[Thu May 26 20:42:24 2022]
Finished job 5.
7 of 267 steps (3%) done
Select jobs to execute...

[Thu May 26 20:42:24 2022]
rule cluster:
    input: output_dir_full/converted_outputs/breakdancer.tab, output_dir_full/converted_outputs/breakseq.tab, output_dir_full/converted_outputs/manta.tab, output_dir_full/config.json
    output: output_dir_full/L1522785992-tumour/cluster
    log: output_dir_full/logs/snakemake.cluster.L1522785992-tumour.log.txt
    jobid: 142
    wildcards: library=L1522785992-tumour
    resources: tmpdir=/tmp, time_limit=57600, mem_mb=16000, cpus=1, log_dir=output_dir_full/logs

[Thu May 26 20:51:24 2022]
Finished job 142.
8 of 267 steps (3%) done
Select jobs to execute...

[Thu May 26 20:51:24 2022]
rule convert:
    input: output_dir_full/config.json
    output: output_dir_full/converted_outputs/chimerascan.tab
    log: output_dir_full/logs/convert.snakemake.chimerascan.log.txt
    jobid: 138
    wildcards: alias=chimerascan
    resources: tmpdir=/tmp, time_limit=57600, mem_mb=16000, cpus=1, log_dir=output_dir_full/logs

[Thu May 26 20:51:26 2022]
Finished job 138.
9 of 267 steps (3%) done
Select jobs to execute...

[Thu May 26 20:51:26 2022]
rule cluster:
    input: output_dir_full/converted_outputs/chimerascan.tab, output_dir_full/converted_outputs/defuse.tab, output_dir_full/config.json
    output: output_dir_full/L1522785992-trans/cluster
    log: output_dir_full/logs/snakemake.cluster.L1522785992-trans.log.txt
    jobid: 137
    wildcards: library=L1522785992-trans
    resources: tmpdir=/tmp, time_limit=57600, mem_mb=16000, cpus=1, log_dir=output_dir_full/logs

[Thu May 26 21:00:06 2022]
Finished job 137.
10 of 267 steps (4%) done
Select jobs to execute...

[Thu May 26 21:00:06 2022]
rule validate:
    input: output_dir_full/L1522785992-normal/cluster
    output: output_dir_full/L1522785992-normal/validate/batch-39/validation-passed.tab
    log: output_dir_full/logs/L1522785992-normal.validate.snakemake.batch-39.log.txt
    jobid: 86
    wildcards: library=L1522785992-normal, job_id=39
    resources: tmpdir=/tmp, time_limit=57600, mem_mb=18000, cpus=2, log_dir=output_dir_full/logs

[Thu May 26 21:00:43 2022]
Error in rule validate:
    jobid: 86
    output: output_dir_full/L1522785992-normal/validate/batch-39/validation-passed.tab
    log: output_dir_full/logs/L1522785992-normal.validate.snakemake.batch-39.log.txt (check log file(s) for error message)
    shell:
        mavis validate --config output_dir_full/config.json --library L1522785992-normal --inputs output_dir_full/L1522785992-normal/cluster/batch-39.tab --output output_dir_full/L1522785992-normal/validate/batch-39 &> output_dir_full/logs/L1522785992-normal.validate.snakemake.batch-39.log.txt
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /app/mavis/.snakemake/log/2022-05-26T202356.690216.snakemake.log
creisle commented 2 years ago

@husamia Could you post the contents of the log file for the job that failed? log: output_dir_full/logs/L1522785992-normal.validate.snakemake.batch-39.log.txt (check log file(s) for error message)

Built the docker image using Docker file provided with no errors.

You should not need to build anything. To use the mavis docker image with snakemake you will need to use singularity flag --use-singularity

husamia commented 2 years ago

cat output_dir_full/logs/L1522785992-normal.validate.snakemake.batch-39.log.txt

`2022-05-26 21:00:07,691 [INFO] MAVIS: 3.0.0
2022-05-26 21:00:07,692 [INFO] hostname: 7aa05bbe492b
2022-05-26 21:00:07,692 [INFO] arguments
2022-05-26 21:00:07,692 [INFO]  command= 'validate'
2022-05-26 21:00:07,692 [INFO]  config= '/app/mavis/output_dir_full/config.json'
2022-05-26 21:00:07,692 [INFO]  inputs = ['output_dir_full/L1522785992-normal/cluster/batch-39.tab']
2022-05-26 21:00:07,692 [INFO]  library= 'L1522785992-normal'
2022-05-26 21:00:07,692 [INFO]  log= None
2022-05-26 21:00:07,692 [INFO]  log_level= 'INFO'
2022-05-26 21:00:07,692 [INFO]  output= 'output_dir_full/L1522785992-normal/validate/batch-39'
2022-05-26 21:00:07,722 [INFO] creating output directory: 'output_dir_full/L1522785992-normal/validate/batch-39'
2022-05-26 21:00:07,723 [INFO] loading: ['/app/mavis/reference_inputs/hg19.fa']
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/mavis/annotate/file_io.py", line 422, in load
    self.content = self.loader(*self.name, **self.opt)
  File "/usr/local/lib/python3.7/site-packages/mavis/annotate/file_io.py", line 256, in load_reference_genome
    for chrom, seq in SeqIO.to_dict(SeqIO.parse(fh, 'fasta')).items():
  File "/usr/local/lib/python3.7/site-packages/Bio/SeqIO/__init__.py", line 789, in to_dict
    raise ValueError("Duplicate key '%s'" % key)
ValueError: Duplicate key 'chr1'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/mavis", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/mavis/main.py", line 295, in main
    raise err
  File "/usr/local/lib/python3.7/site-packages/mavis/main.py", line 237, in main
    library=args.library,
  File "/usr/local/lib/python3.7/site-packages/mavis/validate/main.py", line 54, in main
    reference_genome = ReferenceFile.load_from_config(config, 'reference_genome', eager_load=True)
  File "/usr/local/lib/python3.7/site-packages/mavis/annotate/file_io.py", line 431, in load_from_config
    return ReferenceFile(file_type, *config.get(f'reference.{file_type}', []), **kwargs)
  File "/usr/local/lib/python3.7/site-packages/mavis/annotate/file_io.py", line 384, in __init__
    self.load()
  File "/usr/local/lib/python3.7/site-packages/mavis/annotate/file_io.py", line 426, in load
    raise err.__class__(message)
ValueError: Error in loading files: /app/mavis/reference_inputs/hg19.fa. Duplicate key 'chr1'`
mattdoug604 commented 2 years ago

Thanks for providing the logs @husamia. We'll try to look into this soon.