compbiocore / VariantVisualization.jl

Julia package powering VIVA, our tool for visualization of genomic variation data. Manual:
https://compbiocore.github.io/VariantVisualization.jl/stable/

ERROR: LoadError: GeneticVariation.VCF.Reader #95

Open moneterg opened 3 years ago

moneterg commented 3 years ago

Hi,

I'm writing about an error I got the first time I ran VIVA.

I ran the command below:

viva --vcf_file 450exomes-cohort_GENOTYPEGVCF.vcf --output_directory . --save_format html --x_axis_labels --heatmap_title 450exomes-cohort_GENOTYPEGVCF --avg_dp sample,variant

It produced this error:

Welcome to VIVA.

Loading dependency packages:

┌ Warning: ORCA.jl has been deprecated and all savefig functionality
│ has been implemented directly in PlotlyBase itself.
│
│ By implementing in PlotlyBase.jl, the savefig routines are automatically
│ available to PlotlyJS.jl also.
└ @ ORCA ~/.julia/packages/ORCA/U5XaN/src/ORCA.jl:8
...

Finished loading packages!

Reading 450exomes-cohort_notrimmed_COMBINED_GENOTYPEGVCF.vcf ...

ERROR: LoadError: GeneticVariation.VCF.Reader file format error on line 200
Stacktrace:
 [1] error(::String, ::Int64) at ./error.jl:42
 [2] _readheader!(::GeneticVariation.VCF.Reader, ::BioCore.Ragel.State{BufferedStreams.BufferedInputStream{IOStream}}) at /scratch/7411317/.julia/packages/BioCore/YBJvb/src/ReaderHelper.jl:106
 [3] readheader!(::GeneticVariation.VCF.Reader) at /scratch/7411317/.julia/packages/BioCore/YBJvb/src/ReaderHelper.jl:80
 [4] Reader at /scratch/7411317/.julia/packages/GeneticVariation/r8DAL/src/vcf/reader.jl:15 [inlined]
 [5] GeneticVariation.VCF.Reader(::IOStream) at /scratch/7411317/.julia/packages/GeneticVariation/r8DAL/src/vcf/reader.jl:28
 [6] top-level scope at /scratch/7411317/VariantVisualization.jl/viva:131
 [7] include(::Function, ::Module, ::String) at ./Base.jl:380
 [8] include(::Module, ::String) at ./Base.jl:368
 [9] exec_options(::Base.JLOptions) at ./client.jl:296
 [10] _start() at ./client.jl:506
in expression starting at /scratch/7411317/VariantVisualization.jl/viva:131

This VCF was produced with the GATK v3.8 best practices pipeline: after HaplotypeCaller, I ran CombineGVCFs and then GenotypeGVCFs.

I would really appreciate it if someone could help me here.

Thank you very much for your time.

gtollefson commented 3 years ago

Hi @Monete ,

Let's get this sorted! That error is coming from a dependency package which we use to read the VCF file. It doesn't like something that appears on line 200. Can you paste lines 195-205 of the VCF file as a reply below? We'll look for formatting or special symbols which the reader function may not be expecting. I'll ping the GeneticVariation.jl package developer to assist further.
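
If the file is too large to open comfortably in an editor, a short Python sketch like the one below can pull out a line range. The function name `read_line_range` and the file name in the usage comment are placeholders, not part of VIVA or GeneticVariation.jl.

```python
from itertools import islice


def read_line_range(path, first, last):
    """Return lines first..last (1-indexed, inclusive) without reading the whole file."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        # islice skips the first (first - 1) lines and stops after line `last`.
        return [line.rstrip("\n") for line in islice(handle, first - 1, last)]


# Example usage (hypothetical file name):
# for number, line in enumerate(read_line_range("cohort.vcf", 195, 205), start=195):
#     print(number, repr(line))  # repr() makes trailing spaces and tabs visible
```

Printing with `repr()` is a deliberate choice here: whitespace at the end of a header line is invisible in a normal paste but can matter to a strict parser.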

@BenJWard Do you know whether VCF files produced by GATK v3.8 are supported by the GeneticVariation.jl package's readheader!() function? Can you help us troubleshoot once @Monete has sent us the offending VCF line?

moneterg commented 3 years ago

Hi @gtollefson

Here are the lines between 195 and 201 (after line 201 comes the variant information).

##contig=<ID=chrUn_gl000246,length=38154,assembly=hg19>
##contig=<ID=chrUn_gl000247,length=36422,assembly=hg19>
##contig=<ID=chrUn_gl000248,length=39786,assembly=hg19>
##contig=<ID=chrUn_gl000249,length=38502,assembly=hg19>
##dbSNP_BUILD_ID=138
##fileDate=20130806
##phasing=partial
##reference=file:///scratch/5644370/hg19/ucsc.hg19.fasta
##source=dbSNP
##variationPropertyDocumentationUrl=ftp://ftp.ncbi.nlm.nih.gov/snp/specs/dbSNP_BitField_latest.pdf  
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  AAX754  ABG583  ACA082  ...

I've abbreviated the sample names here, since there are 450 of them. Hope this helps!

gtollefson commented 3 years ago

@Monete Aha! OK, I'm guessing the readheader! function that we depend on is not expecting the header key "variationPropertyDocumentationUrl", or isn't expecting one of the special characters in the URL. I would delete that line in a text editor, save the result as a new VCF file, then rerun and let us know what you get.
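
If a text editor struggles with a file this size, the same edit can be scripted. This is a minimal sketch, not a VIVA feature: the function name `drop_lines_with_prefix` and the file names in the usage comment are placeholders. It copies the file line by line in binary mode, so every other byte is left untouched.

```python
def drop_lines_with_prefix(src_path, dst_path, prefix):
    """Copy src_path to dst_path, skipping lines that start with prefix.

    Returns the number of lines that were dropped.
    """
    prefix_bytes = prefix.encode("utf-8")
    dropped = 0
    # Binary mode keeps all remaining lines byte-for-byte identical.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            if line.startswith(prefix_bytes):
                dropped += 1
            else:
                dst.write(line)
    return dropped


# Example usage (hypothetical file names):
# n = drop_lines_with_prefix("input.vcf", "input_edited.vcf",
#                            "##variationPropertyDocumentationUrl")
# print(n, "header line(s) removed")
```

Writing to a new file rather than editing in place keeps the original VCF available in case the removed line turns out not to be the cause.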

moneterg commented 3 years ago

Hi @gtollefson

I'm trying to run this program on a Slurm HPC system. On the command line on the login node (not in a job), the first 5 minutes of the test seemed to be OK!

But when I submitted the job (with a command line identical to the test), this error appeared:

┌ Info: waiting for lock on pidfile
└   path = "/scratch/7411317/.jlassetregistry.lock"
┌ Warning: ORCA.jl has been deprecated and all savefig functionality
│ has been implemented directly in PlotlyBase itself.
│
│ By implementing in PlotlyBase.jl, the savefig routines are automatically
│ available to PlotlyJS.jl also.
└ @ ORCA ~/.julia/packages/ORCA/U5XaN/src/ORCA.jl:8
ERROR: LoadError: KeyError: key "3/7" not found
Stacktrace:
 [1] getindex at ./dict.jl:467 [inlined]
 [2] (::VariantVisualization.var"#translate#27"{Dict{Any,Any}})(::String) at /scratch/7411317/.julia/packages/VariantVisualization/1yoNl/src/vcf_utils_complete.jl:738
 [3] iterate at ./generator.jl:47 [inlined]
 [4] collect_to!(::Array{Int64,2}, ::Base.Generator{Array{Any,2},VariantVisualization.var"#translate#27"{Dict{Any,Any}}}, ::Int64, ::Int64) at ./array.jl:732
 [5] collect_to_with_first!(::Array{Int64,2}, ::Int64, ::Base.Generator{Array{Any,2},VariantVisualization.var"#translate#27"{Dict{Any,Any}}}, ::Int64) at ./array.jl:710
 [6] _collect(::Array{Any,2}, ::Base.Generator{Array{Any,2},VariantVisualization.var"#translate#27"{Dict{Any,Any}}}, ::Base.EltypeUnknown, ::Base.HasShape{2}) at ./array.jl:704
 [7] collect_similar at ./array.jl:628 [inlined]
 [8] map at ./abstractarray.jl:2162 [inlined]
 [9] translate_genotype_to_num_array(::Array{Any,2}, ::Dict{Any,Any}) at /scratch/7411317/.julia/packages/VariantVisualization/1yoNl/src/vcf_utils_complete.jl:741
 [10] combined_all_genotype_array_functions(::Array{Any,1}) at /scratch/7411317/.julia/packages/VariantVisualization/1yoNl/src/vcf_utils_complete.jl:622
 [11] top-level scope at /scratch/7411317/VariantVisualization.jl/viva:410
 [12] include(::Function, ::Module, ::String) at ./Base.jl:380
 [13] include(::Module, ::String) at ./Base.jl:368
 [14] exec_options(::Base.JLOptions) at ./client.jl:296
 [15] _start() at ./client.jl:506
in expression starting at /scratch/7411317/VariantVisualization.jl/viva:408

Do you have any tips for me about this?

Thank you for your time.

gtollefson commented 3 years ago

@Monete no problem, I’m happy you’re using our tool! We’ll solve it.

Can you run the command to completion on the command line? It would help to know if it works there before debugging on the shared computing network. If you ran it on the login node but it didn’t complete, it’s possible the run hadn’t yet reached the point that triggers the error, since there are fewer resources available on the login node.

moneterg commented 3 years ago

Hi @gtollefson, after 1 h 30 min of running on the command line on the login node, the error appeared, as you predicted.

Command line:

viva --vcf_file 450exomes-cohort_COMBINED_GENOTYPEGVCF_edited.vcf --output_directory . --save_format html --x_axis_labels --heatmap_title 450exomes-cohort_COMBINED_GENOTYPEGVCF_edited --avg_dp sample,variant

Output:

Welcome to VIVA.

Loading dependency packages:

┌ Warning: ORCA.jl has been deprecated and all savefig functionality
│ has been implemented directly in PlotlyBase itself.
│ 
│ By implementing in PlotlyBase.jl, the savefig routines are automatically
│ available to PlotlyJS.jl also.
└ @ ORCA ~/.julia/packages/ORCA/U5XaN/src/ORCA.jl:8
...

Finished loading packages!

Reading 450exomes-cohort_COMBINED_GENOTYPEGVCF_edited.vcf ...

No filters applied. Large vcf files will take a long time to process and heatmap visualizations will lose resolution at this scale unless viewed in interactive html for zooming.

Loading VCF file into memory for visualization
Selected 902958 variants with no filters applied
ERROR: LoadError: KeyError: key "3/7" not found
Stacktrace:
 [1] getindex at ./dict.jl:467 [inlined]
 [2] (::VariantVisualization.var"#translate#27"{Dict{Any,Any}})(::String) at /scratch/7411317/.julia/packages/VariantVisualization/1yoNl/src/vcf_utils_complete.jl:738
 [3] iterate at ./generator.jl:47 [inlined]
 [4] collect_to!(::Array{Int64,2}, ::Base.Generator{Array{Any,2},VariantVisualization.var"#translate#27"{Dict{Any,Any}}}, ::Int64, ::Int64) at ./array.jl:732
 [5] collect_to_with_first!(::Array{Int64,2}, ::Int64, ::Base.Generator{Array{Any,2},VariantVisualization.var"#translate#27"{Dict{Any,Any}}}, ::Int64) at ./array.jl:710
 [6] _collect(::Array{Any,2}, ::Base.Generator{Array{Any,2},VariantVisualization.var"#translate#27"{Dict{Any,Any}}}, ::Base.EltypeUnknown, ::Base.HasShape{2}) at ./array.jl:704
 [7] collect_similar at ./array.jl:628 [inlined]
 [8] map at ./abstractarray.jl:2162 [inlined]
 [9] translate_genotype_to_num_array(::Array{Any,2}, ::Dict{Any,Any}) at /scratch/7411317/.julia/packages/VariantVisualization/1yoNl/src/vcf_utils_complete.jl:741
 [10] combined_all_genotype_array_functions(::Array{Any,1}) at /scratch/7411317/.julia/packages/VariantVisualization/1yoNl/src/vcf_utils_complete.jl:622
 [11] top-level scope at /scratch/7411317/VariantVisualization.jl/viva:410
 [12] include(::Function, ::Module, ::String) at ./Base.jl:380
 [13] include(::Module, ::String) at ./Base.jl:368
 [14] exec_options(::Base.JLOptions) at ./client.jl:296
 [15] _start() at ./client.jl:506
in expression starting at /scratch/7411317/VariantVisualization.jl/viva:408

Thanks again :)

gtollefson commented 3 years ago

Hi @Monete,

VIVA was developed to expect VCF files in which alternate allele numbers are capped at 6, so it doesn't recognize 3/7. I will modify the code to allow allele values over 6 so that it can interpret 3/7 as a heterozygous variant. We're in the middle of preparing several manuscripts/results for other projects, so it may take me some time to push the changes. In the meantime, if you are able to change the 3/7 value to 3/6 using find and replace in a text editor, it should run. Let me know how it goes. Otherwise, I'll ping you here once I push this fix.
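
To illustrate the idea of handling any allele number, here is a rough Python sketch of classifying a genotype by parsing it instead of looking it up in a fixed table. It is not VIVA's code (VIVA is written in Julia and uses a dictionary lookup, per the stack trace above); the function name and the category labels are hypothetical, and VIVA's actual numeric encoding is not shown in this thread.

```python
import re

# GT alleles are separated by "/" (unphased) or "|" (phased).
_SEPARATOR = re.compile(r"[/|]")


def classify_genotype(gt):
    """Classify a VCF GT string by zygosity, for any allele index.

    Returns one of the hypothetical labels:
    "no_call", "hom_ref", "hom_alt", "het".
    """
    alleles = _SEPARATOR.split(gt)
    if "." in alleles:
        # Any missing allele (e.g. "./." or "./1") is treated as no call.
        return "no_call"
    indices = [int(allele) for allele in alleles]
    if all(index == 0 for index in indices):
        return "hom_ref"
    if len(set(indices)) == 1:
        return "hom_alt"
    return "het"
```

With this approach "3/7" is classified as heterozygous simply because its two allele indices differ, so no upper limit on the allele number is needed.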