bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Did not find reference indices for aligner bismark in genome #3031

Closed: HyunjunNam closed this issue 4 years ago

HyunjunNam commented 4 years ago

Hi,

I was testing bcbio with bismark for BS-seq analysis, but I got the following error:

ValueError: Did not find reference indices for aligner bismark in genome: {'fasta': {'base': '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa'}, 'fastagz': {'base': '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa.gz', 'indexes': ['/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa.gz.fai', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa.gz.gzi', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.dict']}, 'bismark': {}, 'rtg': {'base': '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/rtg/GRCh37.sdf/mainIndex', 'indexes': ['/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/rtg/GRCh37.sdf/sequenceIndex0', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/rtg/GRCh37.sdf/seqdata1', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/rtg/GRCh37.sdf/seqdata0', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/rtg/GRCh37.sdf/seqpointer2', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/rtg/GRCh37.sdf/seqpointer0', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/rtg/GRCh37.sdf/progress', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/rtg/GRCh37.sdf/seqpointer3', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/rtg/GRCh37.sdf/summary.txt', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/rtg/GRCh37.sdf/suffixpointer0', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/rtg/GRCh37.sdf/namepointer0', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/rtg/GRCh37.sdf/suffixdata0', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/rtg/GRCh37.sdf/namedata0', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/rtg/GRCh37.sdf/seqdata2', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/rtg/GRCh37.sdf/reference.txt', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/rtg/GRCh37.sdf/nameIndex0', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/rtg/GRCh37.sdf/seqpointer1', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/rtg/GRCh37.sdf/seqdata3', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/rtg/GRCh37.sdf/done', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/rtg/GRCh37.sdf/suffixIndex0', 
'/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/rtg/GRCh37.sdf/format.log']}, 'genome_context': ['/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/coverage/problem_regions/ENCODE/wgEncodeDacMapabilityConsensusExcludable.bed.gz', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/coverage/problem_regions/GA4GH/bad_promoter.bed.gz', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/coverage/problem_regions/GA4GH/gc15.bed.gz', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/coverage/problem_regions/GA4GH/gc15to20.bed.gz', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/coverage/problem_regions/GA4GH/gc20to25.bed.gz', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/coverage/problem_regions/GA4GH/gc25to30.bed.gz', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/coverage/problem_regions/GA4GH/gc65to70.bed.gz', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/coverage/problem_regions/GA4GH/gc70to75.bed.gz', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/coverage/problem_regions/GA4GH/gc75to80.bed.gz', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/coverage/problem_regions/GA4GH/gc80to85.bed.gz', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/coverage/problem_regions/GA4GH/gc85.bed.gz', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/coverage/problem_regions/GA4GH/heng_um75-hs37d5.bed.gz', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/coverage/problem_regions/GA4GH/low_complexity_51to200bp.bed.gz', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/coverage/problem_regions/GA4GH/low_complexity_gt200bp.bed.gz', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/coverage/problem_regions/GA4GH/self_chain.bed.gz', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/coverage/problem_regions/repeats/LCR.bed.gz', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/coverage/problem_regions/repeats/polyx.bed.gz'], 'viral': ['/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/viral/gdc-viral.dict', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/viral/gdc-viral.fa', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/viral/gdc-viral.fa.amb', 
'/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/viral/gdc-viral.fa.ann', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/viral/gdc-viral.fa.bwt', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/viral/gdc-viral.fa.fai', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/viral/gdc-viral.fa.pac', '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/viral/gdc-viral.fa.sa'], 'versions': '/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/versions.csv'}

As you can see, the 'bismark' entry is empty ({}), so there are no indices for it. Is there a way to download or generate them?

Also, it looks like bcbio can currently run bismark only on paired-end reads, not single-end reads. Is that correct?
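The error is raised because the genome's resource dictionary contains a 'bismark' key whose value is empty, so there is no index path to hand to the aligner. The sketch below is a hypothetical reconstruction of that kind of lookup, not bcbio's actual code: the function name get_aligner_index and the assumption that a usable entry carries a 'base' path (as the 'fasta' and 'rtg' entries in the error do) are both illustrative.

```python
# Hypothetical sketch, not bcbio's implementation: it mimics the kind of check
# that produces "Did not find reference indices for aligner bismark in genome".


def get_aligner_index(genome_resources: dict, aligner: str) -> str:
    """Return the base index path for an aligner, or raise ValueError.

    Assumption: a usable entry looks like {'base': ..., 'indexes': [...]},
    like the 'rtg' entry in the error message above.
    """
    # A missing key and an empty dict ({}) are treated the same way.
    entry = genome_resources.get(aligner) or {}
    base = entry.get("base")
    if not base:
        raise ValueError(
            "Did not find reference indices for aligner %s in genome: %s"
            % (aligner, genome_resources)
        )
    return base


# Trimmed-down copy of the resources shown in the error message.
resources = {
    "fasta": {"base": "/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa"},
    "bismark": {},  # empty: the bismark indices were never built
    "rtg": {"base": "/isilon/prod2/bcbio/genomes/Hsapiens/GRCh37/rtg/GRCh37.sdf/mainIndex"},
}

print(get_aligner_index(resources, "rtg"))
try:
    get_aligner_index(resources, "bismark")
except ValueError as err:
    print(err)
```

Under this assumption, building the bismark indices (so the entry gains a 'base' path) is what makes the lookup succeed; nothing in the sample configuration needs to change.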

roryk commented 4 years ago

Hi @HyunjunNam,

You'll have to build the bismark genome:

bcbio_nextgen.py upgrade --data --genomes GRCh37 --aligners bismark

That should install the bismark genome for GRCh37. For other genomes, pass their names to --genomes instead. And yes, that's right: bismark only works with paired-end data for now.

HyunjunNam commented 4 years ago

Hi Rory,

I tried that, but bismark does not seem to be in the list of aligners that upgrade accepts yet. Here is the message I got:

[/ bin]$ ./bcbio_nextgen.py upgrade --data --genomes GRCh37 --aligners bismark
usage: bcbio_nextgen.py upgrade [-h] [--cores CORES] [--tooldir TOOLDIR]
                                [--tools]
                                [-u {stable,development,system,deps,skip}]
                                [--toolconf TOOLCONF] [--revision REVISION]
                                [--toolplus TOOLPLUS]
                                [--datatarget {variation,rnaseq,smallrna,gemini,vep,dbnsfp,dbscsnv,battenberg,kraken,ericscript,gnomad}]
                                [--genomes {GRCh37,hg19,hg38,hg38-noalt,mm10,mm9,rn6,rn5,canFam3,dm3,galGal4,phix,pseudomonas_aeruginosa_ucbpp_pa14,sacCer3,TAIR10,WBcel235,xenTro3,GRCz10,GRCz11,Sscrofa11.1,BDGP6}]
                                [--aligners {bwa,rtg,hisat2,bbmap,bowtie,bowtie2,minimap2,novoalign,twobit,snap,star,seq}]
                                [--data] [--cwl] [--isolate]
                                [--distribution {ubuntu,debian,centos,scientificlinux,macosx}]
bcbio_nextgen.py upgrade: error: argument --aligners: invalid choice: 'bismark' (choose from 'bwa', 'rtg', 'hisat2', 'bbmap', 'bowtie', 'bowtie2', 'minimap2', 'novoalign', 'twobit', 'snap', 'star', 'seq')

Currently, I am using version 1.1.8 of bcbio.
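The "invalid choice" message has the format of Python's argparse, which rejects any value not in an option's fixed choices list before any installation code runs. That would explain why the command fails regardless of the genome requested: bismark presumably has to be added to the installer's list of aligners. The sketch below reproduces only that validation step; the parser is a hypothetical stand-in, not bcbio's real upgrade parser, and the choices are copied from the usage output above.

```python
import argparse

# Aligner choices copied from the usage output reported for bcbio 1.1.8;
# 'bismark' is absent.
ALIGNERS = ["bwa", "rtg", "hisat2", "bbmap", "bowtie", "bowtie2", "minimap2",
            "novoalign", "twobit", "snap", "star", "seq"]


def build_parser(aligners):
    """Hypothetical stand-in for the upgrade subcommand's --aligners option.

    Models only the `choices` validation, nothing else bcbio does.
    """
    parser = argparse.ArgumentParser(prog="bcbio_nextgen.py upgrade")
    parser.add_argument("--aligners", choices=aligners)
    return parser


def accepts(aligners, value):
    """Return True if argparse accepts `--aligners value`."""
    try:
        build_parser(aligners).parse_args(["--aligners", value])
        return True
    except SystemExit:
        # On an invalid choice, argparse prints the error to stderr and exits.
        return False


print(accepts(ALIGNERS, "bwa"))
print(accepts(ALIGNERS, "bismark"))
print(accepts(ALIGNERS + ["bismark"], "bismark"))
```

Running it prints True, False, True, and the second call also writes an "invalid choice" error to stderr, mirroring the failure in this thread.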

Thanks!

roryk commented 4 years ago

Thanks, and sorry about that. I think this is a bug; I'll work on fixing it today.