Closed NeillGibson closed 7 years ago
Hi Neil,
Sorry for the problem, could you give bcbio_setup_genome.py a shot running it with the --gff3 flag? For a GTF we're expecting there to be transcript_id and gene_id attributes to figure out which genes go with which transcripts, and the --gff3 flag will try to reconstruct those from the ID/Parent attributes in the GFF3 file.
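To illustrate the idea of recovering transcript-to-gene relationships from ID/Parent attributes, here is a minimal Python sketch. It is not bcbio's implementation; the function names, the set of feature types treated as transcripts, and the example records are all assumptions made for illustration.

```python
# Assumption: feature types that count as transcripts in this sketch.
TRANSCRIPT_TYPES = {"mRNA", "transcript"}


def parse_attributes(column9):
    """Split a GFF3 attribute column such as 'ID=mRNA1;Parent=gene1' into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def transcript_to_gene(gff3_lines):
    """Map each transcript ID to its parent gene ID using ID/Parent attributes."""
    mapping = {}
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in TRANSCRIPT_TYPES:
            continue
        attrs = parse_attributes(cols[8])
        if "ID" in attrs and "Parent" in attrs:
            mapping[attrs["ID"]] = attrs["Parent"]
    return mapping


# Hypothetical records, only to show the shape of the input.
example = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t1\t1000\t.\t+\t.\tID=gene1;Name=abc",
    "chr1\tsrc\tmRNA\t1\t1000\t.\t+\t.\tID=mRNA1;Parent=gene1",
    "chr1\tsrc\texon\t1\t200\t.\t+\t.\tID=exon1;Parent=mRNA1",
]
print(transcript_to_gene(example))
```

In a GTF, the same relationship is written directly on every record as gene_id and transcript_id, which is why a plain GFF3 needs this extra step.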
Hi Rory,
Thank you for the tip.
Adding the --gff3 flag made the reference genome installation finish without errors.
I did not see the --gff3 flag on the documentation page below, though I could have found it by running bcbio_setup_genome.py -h.
http://bcbio-nextgen.readthedocs.io/en/latest/contents/configuration.html#adding-custom-genomes
The files that are now installed for the reference genome are below. The extra files that I see are bowtie indexes, tophat indexes and some other RNA-seq related files.
I don't see a specific folder or file for snpEff. I expected to see a file called something like snpEffectPredictor.bin as the snpEff database.
/Tools/bcbio-0.9.9/genomes/test_SnpEff/
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/seq
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/seq/test_SnpEff.fa
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/seq/test_SnpEff.fa.fai
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/seq/test_SnpEff.dict
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/seq/tx
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/seq/test_SnpEff-resources.yaml
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/bwa
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/bwa/test_SnpEff.fa.pac
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/bwa/test_SnpEff.fa.ann
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/bwa/test_SnpEff.fa.amb
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/bwa/test_SnpEff.fa.bwt
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/bwa/test_SnpEff.fa.sa
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/bowtie2
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/bowtie2/test_SnpEff.3.bt2
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/bowtie2/test_SnpEff.4.bt2
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/bowtie2/test_SnpEff.1.bt2
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/bowtie2/test_SnpEff.2.bt2
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/bowtie2/test_SnpEff.rev.1.bt2
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/bowtie2/test_SnpEff.rev.2.bt2
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/bowtie2/test_SnpEff.fa
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27/version.txt
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27/ref-transcripts.gtf
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27/ref-transcripts.gtf.db
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27/ref-transcripts.genePred
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27/ref-transcripts.refFlat
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27/ref-transcripts.bed
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27/tx2gene.csv
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27/tx
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27/tophat
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27/tophat/test_SnpEff_transcriptome.gff
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27/tophat/test_SnpEff_transcriptome.fa
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27/tophat/test_SnpEff_transcriptome.fa.tlst
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27/tophat/test_SnpEff_transcriptome.ver
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27/tophat/test_SnpEff_transcriptome.3.bt2
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27/tophat/test_SnpEff_transcriptome.4.bt2
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27/tophat/test_SnpEff_transcriptome.1.bt2
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27/tophat/test_SnpEff_transcriptome.2.bt2
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27/tophat/test_SnpEff_transcriptome.rev.1.bt2
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27/tophat/test_SnpEff_transcriptome.rev.2.bt2
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27/kallisto
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27/kallisto/test_SnpEff
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff/rnaseq-2016-10-27/ref-transcripts.fa
/Tools/bcbio-0.9.9/genomes/test_SnpEff/test_SnpEff-rnaseq-2016-10-27.tar.xz
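A quick way to confirm that no snpEff database is present is to search the genome directory for snpEffectPredictor.bin. The sketch below is only an illustration; the function name is hypothetical, and it assumes the database file has that exact name.

```python
import os


def find_snpeff_databases(genome_dir):
    """Return sorted paths of every snpEffectPredictor.bin found under genome_dir."""
    found = []
    for root, _dirs, files in os.walk(genome_dir):
        if "snpEffectPredictor.bin" in files:
            found.append(os.path.join(root, "snpEffectPredictor.bin"))
    return sorted(found)


# An empty list means no snpEff database was found (or the directory does not exist).
print(find_snpeff_databases("/Tools/bcbio-0.9.9/genomes/test_SnpEff"))
```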
I tried to align, call variants and predict effects for a few samples against this reference genome with the following YAML, but the run did not produce a VCF with effect predictions.
# Template for whole genome Illumina variant calling with FreeBayes
# This is a GATK-free pipeline without post-alignment BAM pre-processing
# (recalibration and realignment)
---
details:
  - analysis: variant2
    genome_build: test_SnpEff
    description:
    # to do multi-sample variant calling, assign samples the same metadata / batch
    metadata:
      batch: project_123
    algorithm:
      aligner: bwa
      mark_duplicates: true
      recalibrate: false
      realign: false
      variantcaller: freebayes
      nomap_split_targets: 3000
      effects: snpeff
      tools_off:
        - gemini
      # for targetted projects, set the region
      # variant_regions: /path/to/your.bed
resources:
  freebayes:
    options: [--genotype-qualities, --min-mapping-quality 20]
The log file does not really mention effect prediction, just "Annotate VCF file", and no time is spent there.
[2016-10-28T08:37Z] gridmaster: ipython: concat_variant_files
[2016-10-28T08:37Z] gridmaster: Timing: variant post-processing
[2016-10-28T08:37Z] gridmaster: ipython: postprocess_variants
[2016-10-28T08:37Z] node17: Finalizing variant calls: project_123_02, freebayes
[2016-10-28T08:37Z] node17: Calculating variation effects for project_123_02, freebayes
[2016-10-28T08:37Z] node17: Annotate VCF file: project_123_02, freebayes
[2016-10-28T08:37Z] node17: Filtering for project_123_02, freebayes
[2016-10-28T08:54Z] node17: Prioritization for project_123_02, freebayes
[2016-10-28T08:54Z] node17: Germline extraction for project_123_02, freebayes
[2016-10-28T08:54Z] gridmaster: ipython: split_variants_by_sample
[2016-10-28T09:50Z] gridmaster: Timing: prepped BAM merging
[2016-10-28T09:50Z] gridmaster: Timing: validation
[2016-10-28T09:50Z] gridmaster: ipython: compare_to_rm
[2016-10-28T09:50Z] gridmaster: Timing: ensemble calling
[2016-10-28T09:50Z] gridmaster: Timing: validation summary
[2016-10-28T09:50Z] gridmaster: Timing: structural variation precall
[2016-10-28T09:50Z] gridmaster: ipython: detect_sv
[2016-10-28T09:50Z] gridmaster: Timing: structural variation
[2016-10-28T09:50Z] gridmaster: ipython: detect_sv
[2016-10-28T09:50Z] gridmaster: Timing: structural variation ensemble
[2016-10-28T09:50Z] gridmaster: ipython: detect_sv
[2016-10-28T09:50Z] gridmaster: Timing: structural variation validation
[2016-10-28T09:50Z] gridmaster: ipython: validate_sv
[2016-10-28T09:50Z] gridmaster: Timing: heterogeneity
[2016-10-28T09:50Z] gridmaster: ipython: heterogeneity_estimate
[2016-10-28T09:50Z] gridmaster: Timing: population database
[2016-10-28T09:50Z] gridmaster: ipython: prep_gemini_db
[2016-10-28T10:03Z] gridmaster: Timing: quality control
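To make the "no time is spent there" observation concrete, the gap between consecutive log lines can be computed from the timestamps. This is a rough sketch, not a bcbio utility; the function name is hypothetical, the timestamps only have minute resolution, and lines from different nodes may interleave, so the numbers are approximate.

```python
import re
from datetime import datetime

# Matches lines such as "[2016-10-28T08:37Z] node17: Filtering for ..."
LINE = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2})Z\] (\S+): (.*)$")


def step_minutes(log_lines):
    """Return (message, minutes until the next parsed line) for every line but the last."""
    parsed = []
    for line in log_lines:
        match = LINE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M")
            parsed.append((stamp, match.group(3)))
    return [
        (msg, int((nxt - stamp).total_seconds() // 60))
        for (stamp, msg), (nxt, _next_msg) in zip(parsed, parsed[1:])
    ]


log = [
    "[2016-10-28T08:37Z] node17: Calculating variation effects for project_123_02, freebayes",
    "[2016-10-28T08:37Z] node17: Annotate VCF file: project_123_02, freebayes",
    "[2016-10-28T08:37Z] node17: Filtering for project_123_02, freebayes",
    "[2016-10-28T08:54Z] node17: Prioritization for project_123_02, freebayes",
]
for message, minutes in step_minutes(log):
    print(minutes, message)
```

For these four lines, the effects and annotation steps both show 0 minutes, while filtering takes 17.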
Is there something else that I need to do to install the snpEff database and / or to run snpEff?
Thank you.
Hi @roryk. Have you had a chance to look at what is going wrong with building the custom snpEff database?
Or could you confirm that I am missing a piece of information about how snpEff with custom genomes is supposed to work in bcbio?
Thank you very much!
Hi Neil,
I'm so sorry for not getting back to you, I suck. I saw it looked resolved but missed the rest of the problem. For custom genomes we don't pull down extra annotations like snpEff, because we don't know what they should be; we just use the provided GTF file and the genome, and that is it.
I think you can add a snpEff database yourself though. bcbio finds the snpEff database by looking in the snpeff directory for the name given by the snpeff alias in the seq/build-resources.yaml file in the genome directory for your build. For example, for human:
version: 26
aliases:
  human: true
  snpeff: GRCh37.75
  ensembl: homo_sapiens_vep_83_GRCh37
Let's say you named your build something evocative like Lyco2.5. If you put snpEff files that match the build in the Lyco2.5/snpeff/Lyco2.5 directory and add the snpeff alias to the seq/Lyco2.5-resources.yaml file, it should pick up the annotations.
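As a rough illustration of that lookup, the sketch below reads the snpeff alias from the text of a resources file and builds the directory where the database would be expected. It is not bcbio's code. The function names are hypothetical, the line-based parsing stands in for a real YAML parser, the version value in the example is made up, and the build/snpeff/alias layout is taken from the description above.

```python
import os
import re


def read_snpeff_alias(resources_yaml_text):
    """Return the 'snpeff' value from the top-level 'aliases:' block, or None.

    Minimal line-based parsing for illustration; a YAML parser is more robust.
    """
    in_aliases = False
    for line in resources_yaml_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        if not line[0].isspace():
            # A new top-level key starts here; track whether it is 'aliases:'.
            in_aliases = line.strip() == "aliases:"
            continue
        if in_aliases:
            match = re.match(r"\s+snpeff:\s*(\S+)", line)
            if match:
                return match.group(1)
    return None


def expected_snpeff_dir(build_dir, alias):
    """Directory where the snpEff files for the alias are expected (assumed layout)."""
    return os.path.join(build_dir, "snpeff", alias)


# Hypothetical contents of seq/Lyco2.5-resources.yaml.
resources = """version: 1
aliases:
  snpeff: Lyco2.5
"""
alias = read_snpeff_alias(resources)
print(alias, expected_snpeff_dir("Lyco2.5", alias))
```

If the alias is missing, or the directory it points to has no snpEff files, that would explain effect prediction being skipped.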
Hi Rory,
No worries, thank you for the response.
For now I have solved my issue by manually building the SnpEff database from the GFF3 file outside of bcbio and also running SnpEff outside of bcbio. This works fine, since it is just a single command to run on the final VCF file.
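The exact commands used are not shown in this thread. As a sketch of what a manual build and annotation with the snpEff command line might look like, the Python helpers below assemble the two commands. The jar path, config path, genome name and file names are placeholders. The sketch also assumes the genome has an entry in snpEff.config, with genes.gff and sequences.fa under the snpEff data directory for that genome.

```python
import subprocess


def snpeff_build_cmd(snpeff_jar, config_path, genome_name):
    """Command to build a snpEff database from a GFF3 annotation."""
    return ["java", "-jar", snpeff_jar, "build", "-gff3", "-v",
            "-c", config_path, genome_name]


def snpeff_annotate_cmd(snpeff_jar, config_path, genome_name, vcf_in):
    """Command to annotate a VCF; snpEff writes the annotated VCF to stdout."""
    return ["java", "-jar", snpeff_jar, "ann", "-c", config_path,
            genome_name, vcf_in]


def annotate_to_file(snpeff_jar, config_path, genome_name, vcf_in, vcf_out):
    """Run the annotation and redirect stdout to vcf_out."""
    with open(vcf_out, "w") as handle:
        subprocess.run(
            snpeff_annotate_cmd(snpeff_jar, config_path, genome_name, vcf_in),
            stdout=handle, check=True)


# Placeholder paths and names; nothing is executed here.
print(" ".join(snpeff_build_cmd("snpEff.jar", "snpEff.config", "test_SnpEff")))
print(" ".join(snpeff_annotate_cmd("snpEff.jar", "snpEff.config",
                                   "test_SnpEff", "final.vcf")))
```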
With the new information I can try to get SnpEff running under bcbio.
Thank you.
Hi Neil,
I will close this now. Let us know if you find more issues.
Thanks!
Hi,
I am trying to install a genome with a gene annotation so that I can run snpEff together with variant calling.
The installation of the reference genome crashes when trying to do something with the gff3 file. "ValueError: No lines parsed -- was an empty file provided?"
Full error message:
The error mentions two GTF files. The first is really empty; its path includes tmpcbl.
The second contains data and looks like the complete GFF3 file converted to GTF. Its path includes rnaseq, though I did not specify anything about rnaseq.
Steps needed to reproduce this error
Is there something wrong with the GFF3 file or the command I use to install the reference genome? Or did I maybe run into a bug?
Thank you for looking at this.