bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Installation/upgrade error #2770

Closed BoPeng closed 5 years ago

BoPeng commented 5 years ago

I am running the installation script on top of a 1.1.4a installation and see:

List of genomes to get (from the config file at '{'genomes': [{'dbkey': 'hg19', 'name': 'Human (hg19)', 'indexes': ['seq', 'twobit'], 'annotations': ['GA4GH_problem_regions', 'capture_regions', 'MIG', 'prioritize', 'dbsnp', 'hapmap', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', 'clinvar', 'cosmic', 'ancestral', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral', 'transcripts', 'RADAR', 'rmsk', 'mirbase'], 'validation': ['giab-NA12878', 'platinum-genome-NA12878', 'giab-NA24385', 'giab-NA24631', 'giab-NA24143', 'giab-NA24149']}, {'dbkey': 'hg38', 'name': 'Human (hg38) full', 'indexes': ['seq', 'twobit', 'bwa', 'hisat2'], 'annotations': ['ccds', 'capture_regions', 'coverage', 'prioritize', 'dbsnp', 'hapmap_snps', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', '1000g_indels', 'clinvar', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral', 'transcripts', 'RADAR', 'rmsk', 'mirbase'], 'validation': ['giab-NA12878', 'giab-NA24385', 'giab-NA24631', 'platinum-genome-NA12878', 'giab-NA12878-remap', 'giab-NA12878-crossmap', 'dream-syn4-crossmap', 'dream-syn3-crossmap', 'giab-NA12878-NA24385-somatic', 'giab-NA24143', 'giab-NA24149']}], 'genome_indexes': ['bwa', 'star', 'bowtie2', 'rtg'], 'install_liftover': False, 'install_uniref': False}'): Human (hg19), Human (hg38) full
Traceback (most recent call last):
  File "/rsrch3/scratch/bcb/bpeng1/bcbio/anaconda/bin/bcbio_nextgen.py", line 221, in <module>
    install.upgrade_bcbio(kwargs["args"])
  File "/rsrch3/scratch/bcb/bpeng1/bcbio/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 105, in upgrade_bcbio
    upgrade_bcbio_data(args, REMOTES)
  File "/rsrch3/scratch/bcb/bpeng1/bcbio/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 349, in upgrade_bcbio_data
    _upgrade_snpeff_data(galaxy_home, args, remotes)
  File "/rsrch3/scratch/bcb/bpeng1/bcbio/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 418, in _upgrade_snpeff_data
    if os.path.exists(snpeff_db_dir) and _is_old_database(snpeff_db_dir, args):
  File "/rsrch3/scratch/bcb/bpeng1/bcbio/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 445, in _is_old_database
    version_info = in_handle.readline().strip().split("\t")
TypeError: a bytes-like object is required, not 'str'
Traceback (most recent call last):
  File "bcbio_nextgen_install.py", line 287, in <module>
    main(parser.parse_args(), sys.argv[1:])
  File "bcbio_nextgen_install.py", line 44, in main
    subprocess.check_call([bcbio, "upgrade"] + _clean_args(sys_argv, args))
  File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/rsrch3/scratch/bcb/bpeng1/bcbio/anaconda/bin/bcbio_nextgen.py', 'upgrade', '--tooldir=/rsrch3/scratch/bcb/bpeng1/bcbio/share', '--genomes', 'hg19', '--aligners', 'bwa', '--genomes', 'hg38', '--aligners', 'star', '--aligners', 'bowtie2', '--data']' returned non-zero exit status 1

It looks like a typical python2/3 error because 1.1.4 uses python3 and 1.1.4a uses python2, but I do not know if this is caused by running a python2 script from 1.1.4a with the new python3.
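
The TypeError at the end of the first traceback is consistent with a Python 2 versus Python 3 difference: under Python 3, gzip.open defaults to binary mode, so readline() returns bytes, and splitting bytes on a str separator fails. A minimal sketch that reproduces the same message outside bcbio follows. The temporary file, its name, and its one-line content are assumptions made for illustration, not a real snpEff database.

```python
import gzip
import os
import tempfile


def read_first_fields(path):
    """Mimic the failing call: gzip.open defaults to binary mode ("rb")."""
    with gzip.open(path) as in_handle:
        # On Python 3, readline() returns bytes here, so split("\t")
        # with a str separator raises TypeError.
        return in_handle.readline().strip().split("\t")


def reproduce():
    """Write a tiny gzipped file and return the error message, if any."""
    path = os.path.join(tempfile.mkdtemp(), "example.bin")  # hypothetical name
    with gzip.open(path, "wb") as out_handle:
        out_handle.write(b"SnpEff\t4.3\n")  # assumed example content
    try:
        read_first_fields(path)
    except TypeError as exc:
        return str(exc)
    return None


error_message = reproduce()
print(error_message)
```

Under Python 2 the same read returns a str, so the split succeeds, which would explain why the code worked before the switch to Python 3.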

roryk commented 5 years ago

Hi @BoPeng,

You don't need to run the installation script on top of the old installation. You can upgrade the existing installation with bcbio_nextgen.py upgrade -u development --tools. You only need to run the installation script when you are installing bcbio for the first time.

BoPeng commented 5 years ago

Thanks. I am trying again with the upgrade command.

BTW, my current installation is under a directory named 1.1.4a, and it would make sense to rename it to 1.1.4 after the upgrade. Will upgrade fix all the paths if I rename the directory? I know that there are a bunch of symbolic links pointing to 1.1.4a/anaconda/bin, and a few configuration files contain absolute paths to the reference genomes.

roryk commented 5 years ago

Hi @BoPeng,

It won't. If you need to keep track of the current version, you'd be better off installing a new version of bcbio into a different root directory and leaving the old one as is; that is how some folks using bcbio keep separate versions. For our work we just have one installation that we update.

BoPeng commented 5 years ago

I am seeing the same problem with upgrade, so it looks like a problem with upgrading from python2 (1.1.4a) to python3 (1.1.4). It is a huge undertaking to reinstall everything (with all the reference genomes and index building stuff) so I am trying to reuse whatever I have from 1.1.4a.

roryk commented 5 years ago

Hi @BoPeng,

From the error above, you have a systemwide python in your path:

 File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
    raise CalledProcessError(retcode, cmd)

which is causing the problem. If you can figure out how that is getting injected and remove it, you'll be all set.

roryk commented 5 years ago

For a clean install, you can point to an existing installation and just link the genome directory.

For example:

wget https://raw.github.com/chapmanb/bcbio-nextgen/master/scripts/bcbio_nextgen_install.py
python bcbio_nextgen_install.py ${HOME}/local/share/bcbio --tooldir=${HOME}/local --nodata
ln -s /n/app/bcbio/biodata/genomes/ ${HOME}/local/share/genomes
ln -s /n/app/bcbio/biodata/galaxy/tool-data ${HOME}/local/share/galaxy/tool-data

Our genome installation is in /n/app/bcbio/biodata/genomes, so I link to it. The tool-data directory also points to the genomes so that needs to be linked to. This makes a bcbio install in my ${HOME}/local/share/bcbio that is using the already-installed genomes.

BoPeng commented 5 years ago

I am still getting the same problem after hiding the system python:

[bpeng1@localhost Downloads]$ which python
/usr/bin/which: no python in (/rsrch3/scratch/bcb/bpeng1/bcbio/share/bin:/rsrch3/scratch/bcb/bpeng1/bcbio/share/bin:/rsrch3/scratch/bcb/bpeng1/bcbio/share/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:/home/bpeng1/.local/bin:/home/bpeng1/bin)

List of genomes to get (from the config file at '{'genomes': [{'dbkey': 'GRCh37', 'name': 'Human (GRCh37)', 'indexes': ['seq', 'twobit'], 'annotations': ['GA4GH_problem_regions', 'capture_regions', 'MIG', 'prioritize', 'dbsnp', 'hapmap', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', 'clinvar', 'cosmic', 'ancestral', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral', 'transcripts', 'RADAR', 'mirbase'], 'validation': ['giab-NA12878', 'giab-NA24385', 'giab-NA24631', 'dream-syn3', 'dream-syn4', 'giab-NA12878-NA24385-somatic', 'giab-NA24143', 'giab-NA24149']}, {'dbkey': 'mm10', 'name': 'Mouse (mm10)', 'indexes': ['twobit'], 'annotations': ['problem_regions', 'dbsnp', 'vcfanno', 'transcripts', 'rmsk', 'mirbase']}], 'genome_indexes': ['bwa', 'star', 'bowtie2', 'rtg'], 'install_liftover': False, 'install_uniref': False}'): Human (GRCh37), Mouse (mm10)
Traceback (most recent call last):
  File "/rsrch3/scratch/bcb/bpeng1/bcbio/share/bin/bcbio_nextgen.py", line 221, in <module>
    install.upgrade_bcbio(kwargs["args"])
  File "/rsrch3/scratch/bcb/bpeng1/bcbio/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 105, in upgrade_bcbio
    upgrade_bcbio_data(args, REMOTES)
  File "/rsrch3/scratch/bcb/bpeng1/bcbio/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 349, in upgrade_bcbio_data
    _upgrade_snpeff_data(galaxy_home, args, remotes)
  File "/rsrch3/scratch/bcb/bpeng1/bcbio/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 418, in _upgrade_snpeff_data
    if os.path.exists(snpeff_db_dir) and _is_old_database(snpeff_db_dir, args):
  File "/rsrch3/scratch/bcb/bpeng1/bcbio/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 445, in _is_old_database
    version_info = in_handle.readline().strip().split("\t")
TypeError: a bytes-like object is required, not 'str'
[bpeng1@localhost Downloads]$

roryk commented 5 years ago

Thanks. Are you running the upgrade with --data? If you leave that off, and then run the upgrade again with --data once the first upgrade has finished, you should be able to skip past this problem.

BoPeng commented 5 years ago

Well, the problem persists. The upgrade without --data was successful, but rerunning with --data leads to the same problem. I think I will perform a clean installation with the linked genome directory.

BoPeng commented 5 years ago

Anyway, the function that caused the problem

def _is_old_database(db_dir, args):
    """Check for old database versions, supported in snpEff 4.1.
    """
    snpeff_version = effects.snpeff_version(args)
    if LooseVersion(snpeff_version) >= LooseVersion("4.1"):
        pred_file = os.path.join(db_dir, "snpEffectPredictor.bin")
        if not utils.file_exists(pred_file):
            return True
        with gzip.open(pred_file) as in_handle:
            version_info = in_handle.readline().strip().split("\t")
        program, version = version_info[:2]
        if not program.lower() == "snpeff" or LooseVersion(snpeff_version) > LooseVersion(version):
            return True
    return False

looks identical to the one in master, so it is likely to be a valid bug.
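
One way the header read in a function like this could be made to work under Python 3 is to decode the first line before splitting it. The sketch below is an illustration under stated assumptions, not necessarily the change made in bcbio: read_version_fields is a hypothetical helper name, and the demo file content is an assumed example rather than a real snpEffectPredictor.bin header.

```python
import gzip
import os
import tempfile


def read_version_fields(pred_file):
    """Return the tab-separated fields from the first line as str values."""
    with gzip.open(pred_file, "rb") as in_handle:
        first_line = in_handle.readline()
    # Decode only the header line; the rest of the file is never read,
    # so no assumption is made about its encoding.
    return first_line.decode("utf-8", errors="replace").strip().split("\t")


# Demo on a throwaway file; the header content is an assumed example,
# not copied from a real snpEff database.
demo_path = os.path.join(tempfile.mkdtemp(), "snpEffectPredictor.bin")
with gzip.open(demo_path, "wb") as out_handle:
    out_handle.write(b"SnpEff\t4.3\n")
program, version = read_version_fields(demo_path)[:2]
print(program, version)
```

Reading in binary mode and decoding just the first line behaves the same on Python 2 and Python 3, whereas opening the whole file in text mode would also decode data past the header.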

roryk commented 5 years ago

Thanks, agreed. Let me see if I can reproduce and fix it. Thanks for all the debugging.

roryk commented 5 years ago

OK, I understand the issue and will push a fix in a minute. We missed this one.

roryk commented 5 years ago

I see that Brad already fixed this. Did you upgrade with bcbio_nextgen.py upgrade -u development? That will get you the fix. The line with the fix is here: https://github.com/bcbio/bcbio-nextgen/blob/master/bcbio/install.py#L445

BoPeng commented 5 years ago

No, I did not use -u development because I was trying to install the stable 1.1.4 version.

roryk commented 5 years ago

Gotcha. 1.1.4 is not really stable; it is more like a broken release. We've found a bunch of these python2 -> python3 issues after releasing it, so you're better off going with the development version and then upgrading to 1.1.5 once we've squashed them all.

BoPeng commented 5 years ago

OK, I can confirm that

bcbio_nextgen.py upgrade -u development --tools --data

failed with the same error, but

bcbio_nextgen.py upgrade -u development --tools
bcbio_nextgen.py upgrade -u development --tools --data

worked.
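
The two-step order reported above can be scripted so that the --data step only starts once the code and tools upgrade has succeeded. This is a sketch, assuming bcbio_nextgen.py is on the PATH; the function names and the injectable runner parameter are hypothetical conveniences, and only the command-line flags come from the commands quoted above.

```python
import subprocess


def upgrade_commands(bcbio="bcbio_nextgen.py"):
    """Return the two upgrade invocations in the order that worked."""
    base = [bcbio, "upgrade", "-u", "development", "--tools"]
    return [base, base + ["--data"]]


def run_two_step_upgrade(runner=subprocess.check_call):
    """Run each step in turn.

    With the default runner, check_call raises CalledProcessError when a
    step exits non-zero, so the --data step is skipped if the first fails.
    """
    for command in upgrade_commands():
        runner(command)


# To run the upgrade for real: run_two_step_upgrade()
# To preview the commands without running them:
for command in upgrade_commands():
    print(" ".join(command))
```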

roryk commented 5 years ago

Thanks! Sorry for the scenic route getting to the source of the problem. Hopefully we'll squash all of these issues for the next release.