bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

bcbio_setup_genome.py and IndexError: list index out of range #1354

Closed: mattingsdal closed this issue 8 years ago

mattingsdal commented 8 years ago

Hi and thanks for developing this exciting pipeline.

I have a FASTA file from a de novo sequenced genome containing ~55 000 contigs, and running

bcbio_setup_genome.py -f corkwing.fasta -n corkwing -b v1

produces the error below.

It seems that samtools and Picard build their indices successfully, but Python then raises an error while updating the Galaxy .loc files. I suspect the number of contigs (chromosomes) is causing this issue. Are there any workarounds for this error?

Best, Morten Mattingsdal

2016-04-21 10:30:33 (2.08 MB/s) - written to stdout [5150544/5150544]

Creating directories using /usr/local/share/bcbio/genomes as the base.
Genomes will be installed into /usr/local/share/bcbio/genomes/corkwing/v1.
Installed genome as /usr/local/share/bcbio/genomes/corkwing/v1/seq/v1.fa.
Creating the seq index.
[localhost] local: samtools faidx v1.fa
[localhost] local: picard -Xms500m -Xmx1g CreateSequenceDictionary REFERENCE=/usr/local/share/bcbio/genomes/corkwing/v1/seq/v1.fa OUTPUT=/usr/local/share/bcbio/genomes/corkwing/v1/seq/v1.dict
[Thu Apr 21 10:30:47 CEST 2016] picard.sam.CreateSequenceDictionary REFERENCE=/usr/local/share/bcbio/genomes/corkwing/v1/seq/v1.fa OUTPUT=/usr/local/share/bcbio/genomes/corkwing/v1/seq/v1.dict TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Thu Apr 21 10:30:47 CEST 2016] Executing as root@morten-VirtualBox on Linux 3.13.0-24-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_71-b15; Picard version: 1.141(8ece590411350163e7689e9e77aab8efcb622170_1447695087) IntelDeflater
[Thu Apr 21 10:30:50 CEST 2016] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.06 minutes.
Runtime.totalMemory()=685768704
Dumping genome resources to /usr/local/share/bcbio/genomes/corkwing/v1/seq/v1-resources.yaml.
Updating Galaxy .loc files.
Traceback (most recent call last):
  File "/usr/local/bin/bcbio_setup_genome.py", line 4, in <module>
    __import__('pkg_resources').run_script('bcbio-nextgen==0.9.7', 'bcbio_setup_genome.py')
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/setuptools-20.7.0-py2.7.egg/pkg_resources/__init__.py", line 719, in run_script
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/setuptools-20.7.0-py2.7.egg/pkg_resources/__init__.py", line 1504, in run_script
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio_nextgen-0.9.7-py2.7.egg-info/scripts/bcbio_setup_genome.py", line 320, in <module>
    loc.update_loc_file(galaxy_base, index, args.build, index_file)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/galaxy/loc.py", line 79, in update_loc_file
    build = parts[1]
IndexError: list index out of range

chapmanb commented 8 years ago

Morten; Sorry about the issue. This looks like a problem with parsing the Galaxy location file we use to specify where the genome indexes are located. One line appears to have fewer fields than we expect. I'm not totally sure what caused this, but I pushed a fix based on the error which will hopefully work around it. If you update to the latest development version:

bcbio_nextgen.py upgrade -u development

and re-run, it will hopefully work correctly for you. Please feel free to re-open and we can help more if this doesn't fix the issue.
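
For anyone hitting the same traceback: `build = parts[1]` fails when a line splits into fewer than two fields. The following is a minimal sketch of a guard against that, not the fix that was actually pushed to bcbio. It assumes a tab-separated .loc-style file with `#` comment lines; the function name `read_loc_builds`, the column layout, and the sample path are hypothetical and only for illustration. It is written for Python 3.

```python
import io


def read_loc_builds(handle, min_fields=2):
    """Collect the second field of each well-formed line in a .loc-style file.

    Returns (builds, skipped), where skipped lists (line_number, raw_line)
    for lines that have fewer than min_fields tab-separated fields.
    """
    builds, skipped = [], []
    for lineno, line in enumerate(handle, start=1):
        stripped = line.rstrip("\r\n")
        # Ignore blank lines and comment lines.
        if not stripped.strip() or stripped.startswith("#"):
            continue
        parts = stripped.split("\t")
        # Guard: indexing parts[1] on a short line raises IndexError.
        if len(parts) < min_fields:
            skipped.append((lineno, stripped))
            continue
        builds.append(parts[1])
    return builds, skipped


# Hypothetical file contents: one valid entry and one truncated line.
example = io.StringIO(
    "# comment\n"
    "corkwing\tv1\t/path/to/v1.fa\n"
    "\n"
    "index\n"
)
builds, skipped = read_loc_builds(example)
print(builds)
print(skipped)
```

Reporting the skipped lines, rather than silently dropping them, makes it easier to see which entry in the location file was malformed.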

mattingsdal commented 8 years ago

Updating to development fixed this issue. Thanks for the swift response!

Creating directories using /usr/local/share/bcbio/genomes as the base.
Genomes will be installed into /usr/local/share/bcbio/genomes/corkwing/v1.
Installed genome as /usr/local/share/bcbio/genomes/corkwing/v1/seq/v1.fa.
Creating the seq index.
Dumping genome resources to /usr/local/share/bcbio/genomes/corkwing/v1/seq/v1-resources.yaml.
Updating Galaxy .loc files.