bcbio / bcbio-nextgen-vm

Run bcbio-nextgen genomic sequencing analyses using isolated containers and virtual machines
MIT License

Kraken not available inside the Docker image #179

Open cvaske opened 4 years ago

cvaske commented 4 years ago

bcbio_vm.py install fails because the kraken executable is missing from the tooldir. I ran the following installation command:

bcbio_vm.py --datadir /bcbiotest install \
    --data --tools --cores 8 \
    --genomes hg19 \
    --aligners bwa \
    --datatarget variation --datatarget battenberg --datatarget kraken --datatarget gemini

And got the stack trace:

Upgrading bcbio-nextgen data files
List of genomes to get (from the config file at '{'genomes': [{'dbkey': 'hg19', 'name': 'Human (hg19)', 'indexes': ['seq', 'twobit'], 'annotations': ['GA4GH_problem_regions', 'capture_regions', 'MIG', 'prioritize', 'dbsnp', 'hapmap', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', 'clinvar', 'cosmic', 'ancestral', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral', 'battenberg', 'esp', 'exac', 'gnomad_exome', '1000g', 'transcripts', 'RADAR', 'rmsk', 'fusion-blacklist', 'mirbase'], 'validation': ['giab-NA12878', 'platinum-genome-NA12878', 'giab-NA24385', 'giab-NA24631', 'giab-NA24143', 'giab-NA24149']}], 'genome_indexes': ['bwa', 'rtg'], 'install_liftover': False, 'install_uniref': False}'): Human (hg19)
Running GGD recipe: hg19 esp ESP6500SI-V2
Running GGD recipe: hg19 exac 0.3
Running GGD recipe: hg19 gnomad_exome 2.1.1
Running GGD recipe: hg19 1000g phase3_shapeit2_mvncall_integrated_v5a.20130502
Traceback (most recent call last):
  File "/usr/local/bin/bcbio_nextgen.py", line 228, in <module>
    install.upgrade_bcbio(kwargs["args"])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 106, in upgrade_bcbio
    upgrade_bcbio_data(args, REMOTES)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 354, in upgrade_bcbio_data
    _install_kraken_db(_get_data_dir(), args)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 616, in _install_kraken_db
    os.path.join(tooldir, "bin", "kraken"))
argparse.ArgumentTypeError: kraken not installed in tooldir /usr/local/bin/kraken.
' returned non-zero exit status 1.
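
Judging from the last frame of the traceback, the installer seems to look for the executable at tooldir/bin/kraken and raise when it is absent. A minimal sketch of that kind of check, assuming the path layout shown in the error message (`require_tool` is a hypothetical name, not the actual bcbio code):

```python
import argparse
import os


def require_tool(tooldir, name):
    """Return the path to `name` under tooldir/bin, or raise if it is missing.

    Hypothetical helper that mimics the failing check seen in the traceback;
    it is not taken from the bcbio source.
    """
    path = os.path.join(tooldir, "bin", name)
    # Treat the tool as installed only if the file exists and is executable.
    if not (os.path.isfile(path) and os.access(path, os.X_OK)):
        raise argparse.ArgumentTypeError(
            "%s not installed in tooldir %s." % (name, path))
    return path
```

Calling `require_tool("/usr/local", "kraken")` inside the container would raise with the same kind of message as above if /usr/local/bin/kraken does not exist.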

It appears that there is indeed no kraken executable in the docker image:

charlie@box:~$ docker run -it quay.io/bcbio/bcbio-vc find / -name kraken
charlie@box:~$

Is there a way to use kraken via the Docker container?

chapmanb commented 4 years ago

Charles; Apologies about the issue. Some of the programs you're looking to use (Kraken, Battenberg and Gemini) are pretty data intensive and not great fits for Docker, so we haven't integrated those to work with it. Those would best be used with a standard bcbio install not using Docker. Apologies for not supporting this and thanks for trying it out.

cvaske commented 4 years ago

Got it, that seems reasonable. Is this due to slower file I/O in Docker, or just to keep the container image size from getting too large?

Would you be interested in a pull request to update the documentation?