chapmanb / cloudbiolinux

CloudBioLinux: configure virtual (or real) machines with tools for biological analyses
http://cloudbiolinux.org
MIT License

rmsk: switch gzip to bgzip and add tabix #381

Closed (leechuck closed 3 years ago)

leechuck commented 3 years ago

My bcbio run fails because rmsk.gtf.gz has no .tbi index. This PR compresses rmsk.gtf with bgzip instead of gzip and adds the tabix index (.tbi).
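
For anyone who wants to check an existing install for the same problem, here is a minimal sketch of a pre-flight check. It is not part of this PR or of bcbio. The helper names `is_bgzf` and `check_tabix_ready` are hypothetical, and the check only looks at the first block header and at whether a `.tbi` file sits next to the data file.

```python
import os
import struct


def is_bgzf(path):
    """Return True if the file starts with a BGZF block header.

    BGZF (the format bgzip writes) is gzip with the FEXTRA flag set and an
    extra subfield tagged 'B', 'C' of length 2. Plain gzip output lacks it.
    """
    with open(path, "rb") as handle:
        header = handle.read(12)
        if len(header) < 12:
            return False
        # Bytes 0-2: gzip magic and deflate method; byte 3: flags (bit 2 = FEXTRA).
        if header[0:3] != b"\x1f\x8b\x08" or not header[3] & 0x04:
            return False
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = handle.read(xlen)
    offset = 0
    # Walk the extra subfields looking for the 'BC' one.
    while offset + 4 <= len(extra):
        si1, si2, slen = struct.unpack("<BBH", extra[offset:offset + 4])
        if (si1, si2, slen) == (ord("B"), ord("C"), 2):
            return True
        offset += 4 + slen
    return False


def check_tabix_ready(path):
    """Return a list of problems that would stop a tabix-based reader."""
    problems = []
    if not is_bgzf(path):
        problems.append("not bgzip-compressed")
    if not os.path.exists(path + ".tbi"):
        problems.append("missing .tbi index")
    return problems
```

Calling `check_tabix_ready("rmsk.gtf.gz")` on a file in the state described above would be expected to report both problems; an empty list means the file looks usable.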

roryk commented 3 years ago

Hi Robert!

Thank you so much for this fix. What was failing? I am curious.

leechuck commented 3 years ago

Thank you for merging!

The failure was in the compare_to_rm step, where vcfanno could not open the tabix index for rmsk.gtf.gz:

2021-02-10T05:46:14.801371489Z Traceback (most recent call last):
2021-02-10T05:46:14.801371489Z   File "/usr/local/bin/bcbio_nextgen.py", line 230, in <module>
2021-02-10T05:46:14.801371489Z     runfn.process(kwargs["args"])
2021-02-10T05:46:14.801371489Z   File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/distributed/runfn.py", line 58, in process
2021-02-10T05:46:14.801371489Z     out = fn(*fnargs)
2021-02-10T05:46:14.801371489Z   File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/utils.py", line 59, in wrapper
2021-02-10T05:46:14.801371489Z     return f(*args, **kwargs)
2021-02-10T05:46:14.801371489Z   File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/distributed/multitasks.py", line 395, in compare_to_rm
2021-02-10T05:46:14.801371489Z     return validate.compare_to_rm(*args)
2021-02-10T05:46:14.801371489Z   File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/variation/validate.py", line 177, in compare_to_rm
2021-02-10T05:46:14.801371489Z     eval_files = _annotate_validations(eval_files, toval_data)
2021-02-10T05:46:14.801371489Z   File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/variation/validate.py", line 191, in _annotate_validations
2021-02-10T05:46:14.801371489Z     eval_files[key] = annotation.add_genome_context(eval_files[key], data)
2021-02-10T05:46:14.801371489Z   File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/variation/annotation.py", line 213, in add_genome_context
2021-02-10T05:46:14.801371489Z     do.run(cmd.format(**locals()), "Annotate with problem annotations", data)
2021-02-10T05:46:14.801371489Z   File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 26, in run
2021-02-10T05:46:14.801371489Z     _do_run(cmd, checks, log_stdout, env=env)
2021-02-10T05:46:14.801371489Z   File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 106, in _do_run
2021-02-10T05:46:14.801371489Z     raise subprocess.CalledProcessError(exitcode, error_msg)
2021-02-10T05:46:14.801371489Z subprocess.CalledProcessError: Command 'set -o pipefail; vcfanno /var/spool/cwl/bcbiotx/tmpvas3hil6/tp-baseline-context.toml /var/spool/cwl/validate/NA12878/gatk-haplotype/rtg/tp-baseline.vcf.gz | bgzip -c > /var/spool/cwl/bcbiotx/tmpvas3hil6/tp-baseline-context.vcf.gz
2021-02-10T05:46:14.801371489Z =============================================
2021-02-10T05:46:14.801371489Z vcfanno version 0.3.2 [built with go1.12.1]
2021-02-10T05:46:14.801371489Z see: https://github.com/brentp/vcfanno
2021-02-10T05:46:14.801371489Z =============================================
2021-02-10T05:46:14.801371489Z vcfanno.go:115: found 2 sources from 2 files
2021-02-10T05:46:14.801371489Z vcfanno.go:156: falling back to non-bgzip
2021-02-10T05:46:14.801371489Z api.go:796: bix: error on opening /keep/ec826309f8e9fcd979caf77aba530366+84929/hg38/coverage/problem_regions/repeats/rmsk.gtf.gz.tbi: open /keep/ec826309f8e9fcd979caf77aba530366+84929/hg38/coverage/problem_regions/repeats/rmsk.gtf.gz.tbi: no such file or directory
2021-02-10T05:46:14.801371489Z ' returned non-zero exit status 1.
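
When a log like this is long, the missing index can be pulled out mechanically. The following is a rough sketch, assuming the error line keeps the `bix: error on opening <path>.tbi:` wording shown above (other vcfanno versions may word it differently). The function name `missing_tabix_indexes` is hypothetical.

```python
import re

# Matches the path in lines such as:
#   api.go:796: bix: error on opening /some/path/file.gz.tbi: open ...
MISSING_INDEX = re.compile(r"bix: error on opening (\S+\.tbi):")


def missing_tabix_indexes(log_text):
    """Return the unique .tbi paths reported as unopenable, in first-seen order."""
    seen = []
    for match in MISSING_INDEX.finditer(log_text):
        path = match.group(1)
        if path not in seen:
            seen.append(path)
    return seen
```

Run against the traceback above, this would return the single rmsk.gtf.gz.tbi path, which is the file that needs to be created with tabix.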

roryk commented 3 years ago

Thank you!