Illumina / Nirvana

The nimble & robust variant annotator
https://illumina.github.io/NirvanaDocumentation/
GNU General Public License v3.0

Input string was not in a correct format #6

Closed: viktorlj closed this issue 7 years ago

viktorlj commented 7 years ago

Hi,

I'm trying to run Nirvana on my Mac (10.12.4). The build completes with 0 warnings and 0 errors. However, when I test it on the HiSeq.10000.vcf file suggested in the setup script, the analysis fails with the message "Input string was not in a correct format."

The command I run is:

dotnet bin/Release/netcoreapp1.1/Nirvana.dll \
     -c Data/Cache/24/GRCh37/Ensembl84 \
     --sd Data/SupplementaryDatabase/36/GRCh37 \
     -r Data/References/5/Homo_sapiens.GRCh37.Nirvana.dat \
     -i HiSeq.10000.vcf \
     -o HiSeq.10000.annotated

The error message in full looks like this:

---------------------------------------------------------------------------
Nirvana                                             (c) 2017 Illumina, Inc.
Stromberg, Roy, Lajugie, Jiang, and Li                                1.5.4
---------------------------------------------------------------------------

Running Nirvana on HiSeq.10000.vcf:

ERROR: Input string was not in a correct format.

Stack trace:
   at System.Number.ParseDouble(String value, NumberStyles options, NumberFormatInfo numfmt)
   at System.Convert.ToDouble(String value)
   at VariantAnnotation.DataStructures.VariantFeature.ParseInfoField(String infoField) in /Volumes/Cambridge/Reference/Nirvana/VariantAnnotation/DataStructures/VariantFeature.cs:line 426
   at VariantAnnotation.DataStructures.VariantFeature.ParseVcfLine(String[] vcfColumns) in /Volumes/Cambridge/Reference/Nirvana/VariantAnnotation/DataStructures/VariantFeature.cs:line 228
   at VariantAnnotation.AnnotationSources.NirvanaAnnotationSource.Annotate(IVariant variant) in /Volumes/Cambridge/Reference/Nirvana/VariantAnnotation/AnnotationSources/NirvanaAnnotationSource.cs:line 287
   at Nirvana.NirvanaAnnotator.ProgramExecution() in /Volumes/Cambridge/Reference/Nirvana/Nirvana/NirvanaAnnotator.cs:line 78
   at VariantAnnotation.CommandLine.AbstractCommandLineHandler.Execute(String[] args) in /Volumes/Cambridge/Reference/Nirvana/VariantAnnotation/CommandLine/AbstractCommandLineHandler.cs:line 292

VCF line:
chr1    109 .   A   T   0   FDRtranche2.00to10.00+  AC=1;AF=0.50;AN=2;DP=1019;Dels=0.00;HRun=0;HaplotypeScore=686.65;MQ=19.20;MQ0=288;OQ=2175.54;QD=2.13;SB=-1042.18    GT:AD:DP:GL:GQ  0/1:610,327:308:-316.30,-95.47,-803.03:99

Time: 00:00:02.1

Is this something you've encountered before or have I just done something stupid?

EDIT: It seems to work fine on Mutect2 output and also on Manta calls, except for deletions. I will look around and see if I can figure out whether something in the VCF format is causing this.

yujiang02 commented 7 years ago

Hi, I ran Nirvana on the VCF line above and it worked fine. The error seems to be triggered while parsing the SB value in the INFO field, very likely because the tab between "SB=-1042.18" and "GT:AD:DP:GL:GQ" is missing or malformed. Could you send me your input file so that I can try to reproduce the error?
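The tab hypothesis above can be checked mechanically. What follows is a minimal Python sketch under stated assumptions, not Nirvana's own parser: the function name `check_vcf_line` is hypothetical, and treating whitespace inside the INFO column as suspicious is a heuristic (newer VCF versions permit spaces there). It splits a data line on tabs, counts the columns, and flags an INFO column that has absorbed the following column because a tab turned into spaces.

```python
def check_vcf_line(line):
    """Return a list of problems found in one VCF data line (hypothetical helper)."""
    problems = []
    columns = line.rstrip("\r\n").split("\t")

    # A VCF data line has 8 fixed tab-separated columns:
    # CHROM POS ID REF ALT QUAL FILTER INFO, optionally followed by FORMAT and samples.
    if len(columns) < 8:
        problems.append(
            f"expected at least 8 tab-separated columns, found {len(columns)}"
        )
        return problems

    info = columns[7]
    # Heuristic: whitespace inside INFO suggests that the tab before the
    # FORMAT column was replaced by spaces, fusing two columns together.
    if any(ch.isspace() for ch in info):
        problems.append(
            "INFO column contains whitespace; a tab may have been replaced by spaces"
        )
    return problems


# The failing line from the report, rebuilt with explicit tab separators.
fields = [
    "chr1", "109", ".", "A", "T", "0", "FDRtranche2.00to10.00+",
    "AC=1;AF=0.50;AN=2;DP=1019;Dels=0.00;HRun=0;HaplotypeScore=686.65;"
    "MQ=19.20;MQ0=288;OQ=2175.54;QD=2.13;SB=-1042.18",
    "GT:AD:DP:GL:GQ",
    "0/1:610,327:308:-316.30,-95.47,-803.03:99",
]
good_line = "\t".join(fields)
# Simulate the suspected damage: the tab after the SB value becomes spaces.
bad_line = good_line.replace("SB=-1042.18\t", "SB=-1042.18    ")

print(check_vcf_line(good_line))
print(check_vcf_line(bad_line))
```

With the damaged line, the last INFO value would read "-1042.18    GT:AD:DP:GL:GQ", which no numeric parser would accept. Running the check over the downloaded file line by line would show whether the tabs survived the download.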

viktorlj commented 7 years ago

Hi,

That sounds like a potential lead. The VCF file was downloaded using the suggested command in the setup script.

wget https://github.com/samtools/htsjdk/raw/master/src/test/resources/htsjdk/variant/HiSeq.10000.vcf

Attaching the file here as well. HiSeq.10000.vcf.zip

MichaelStromberg commented 7 years ago

Hi Viktor,

We also manually downloaded the file from https://github.com/samtools/htsjdk/raw/master/src/test/resources/htsjdk/variant/HiSeq.10000.vcf, and Nirvana 1.5.4 worked as expected on it on our Windows and Linux machines.

E:\Data\Nirvana>dotnet d:\Benchmarking\Nirvana1.5.4\Nirvana.dll -c Cache\24\GRCh37\Ensembl84 -r References\5\Homo_sapiens.GRCh37.Nirvana.dat --sd SupplementaryDatabase\36\GRCh37 -i Data\Test\HiSeq.10000.vcf -o hiseq
---------------------------------------------------------------------------
Nirvana                                             (c) 2017 Illumina, Inc.
Stromberg, Roy, Lajugie, Jiang, and Li                                1.5.4
---------------------------------------------------------------------------

Running Nirvana on HiSeq.10000.vcf:
---------------------------------------------------------------------------
reference:  00:00:00.0                                                 chr1
cache & sa: 00:00:00.5
annotation: 00:00:00.9 (10,120 variants/s)

Peak memory usage: 517.7 MB
Time: 00:00:03.6

Tonight I'll try the same on my MacBook Pro.

MichaelStromberg commented 7 years ago

Hi Viktor,

I downloaded the TestNirvana.sh file on my MacBook Pro (running Mac OS X 10.11.6 - Apple doesn't support 10.12 on my ancient laptop) and ran it at home. After waiting a while for the files to download, everything worked as advertised:

Michael-Strombergs-MacBook-Pro:~ snownebula$ ./TestNirvana.sh 
Cloning into '/Users/snownebula/Nirvana'...
remote: Counting objects: 3829, done.
remote: Total 3829 (delta 0), reused 0 (delta 0), pack-reused 3829
Receiving objects: 100% (3829/3829), 81.87 MiB | 763.00 KiB/s, done.
Resolving deltas: 100% (1490/1490), done.
Checking connectivity... done.
Checking out files: 100% (2402/2402), done.
- starting cache download (v24 - 3.9 GB)
- starting supplementary annotation download (v36 - 11 GB)
- starting reference download (v5 - 1.3 GB)
- downloads completed.
- unpacking cache files... finished.
- unpacking reference files... finished.
- unpacking supplementary annotation files... finished.
~/Nirvana/Nirvana ~/Nirvana
  Restoring packages for /Users/snownebula/Nirvana/ErrorHandling/ErrorHandling.csproj...
  Restoring packages for /Users/snownebula/Nirvana/NDesk.Options/NDesk.Options.csproj...
  Generating MSBuild file /Users/snownebula/Nirvana/NDesk.Options/obj/NDesk.Options.csproj.nuget.g.props.
  Generating MSBuild file /Users/snownebula/Nirvana/ErrorHandling/obj/ErrorHandling.csproj.nuget.g.props.
  Generating MSBuild file /Users/snownebula/Nirvana/NDesk.Options/obj/NDesk.Options.csproj.nuget.g.targets.
  Generating MSBuild file /Users/snownebula/Nirvana/ErrorHandling/obj/ErrorHandling.csproj.nuget.g.targets.
  Writing lock file to disk. Path: /Users/snownebula/Nirvana/NDesk.Options/obj/project.assets.json
  Writing lock file to disk. Path: /Users/snownebula/Nirvana/ErrorHandling/obj/project.assets.json
  Restore completed in 5.87 sec for /Users/snownebula/Nirvana/ErrorHandling/ErrorHandling.csproj.
  Restoring packages for /Users/snownebula/Nirvana/Nirvana/Nirvana.csproj...
  Restore completed in 4.87 sec for /Users/snownebula/Nirvana/NDesk.Options/NDesk.Options.csproj.
  Restoring packages for /Users/snownebula/Nirvana/VariantAnnotation.Interface/VariantAnnotation.Interface.csproj...
  Generating MSBuild file /Users/snownebula/Nirvana/VariantAnnotation.Interface/obj/VariantAnnotation.Interface.csproj.nuget.g.props.
  Generating MSBuild file /Users/snownebula/Nirvana/VariantAnnotation.Interface/obj/VariantAnnotation.Interface.csproj.nuget.g.targets.
  Writing lock file to disk. Path: /Users/snownebula/Nirvana/VariantAnnotation.Interface/obj/project.assets.json
  Restore completed in 617.55 ms for /Users/snownebula/Nirvana/VariantAnnotation.Interface/VariantAnnotation.Interface.csproj.
  Restoring packages for /Users/snownebula/Nirvana/VariantAnnotation/VariantAnnotation.csproj...
  Generating MSBuild file /Users/snownebula/Nirvana/Nirvana/obj/Nirvana.csproj.nuget.g.props.
  Generating MSBuild file /Users/snownebula/Nirvana/Nirvana/obj/Nirvana.csproj.nuget.g.targets.
  Writing lock file to disk. Path: /Users/snownebula/Nirvana/Nirvana/obj/project.assets.json
  Restore completed in 678.76 ms for /Users/snownebula/Nirvana/Nirvana/Nirvana.csproj.
  Generating MSBuild file /Users/snownebula/Nirvana/VariantAnnotation/obj/VariantAnnotation.csproj.nuget.g.props.
  Generating MSBuild file /Users/snownebula/Nirvana/VariantAnnotation/obj/VariantAnnotation.csproj.nuget.g.targets.
  Writing lock file to disk. Path: /Users/snownebula/Nirvana/VariantAnnotation/obj/project.assets.json
  Restore completed in 608.13 ms for /Users/snownebula/Nirvana/VariantAnnotation/VariantAnnotation.csproj.

  NuGet Config files used:
      /Users/snownebula/.nuget/NuGet/NuGet.Config

  Feeds used:
      https://api.nuget.org/v3/index.json
Microsoft (R) Build Engine version 15.1.523.56541
Copyright (C) Microsoft Corporation. All rights reserved.

  ErrorHandling -> /Users/snownebula/Nirvana/bin/Release/netcoreapp1.1/ErrorHandling.dll
  NDesk.Options -> /Users/snownebula/Nirvana/bin/Release/netcoreapp1.1/NDesk.Options.dll
  VariantAnnotation.Interface -> /Users/snownebula/Nirvana/bin/Release/netcoreapp1.1/VariantAnnotation.Interface.dll
  VariantAnnotation -> /Users/snownebula/Nirvana/bin/Release/netcoreapp1.1/VariantAnnotation.dll
  Nirvana -> /Users/snownebula/Nirvana/bin/Release/netcoreapp1.1/Nirvana.dll
~/Nirvana
--2017-04-20 02:15:22--  https://github.com/samtools/htsjdk/raw/master/src/test/resources/htsjdk/variant/HiSeq.10000.vcf
Resolving github.com... 192.30.253.113, 192.30.253.112
Connecting to github.com|192.30.253.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/samtools/htsjdk/master/src/test/resources/htsjdk/variant/HiSeq.10000.vcf [following]
--2017-04-20 02:15:23--  https://raw.githubusercontent.com/samtools/htsjdk/master/src/test/resources/htsjdk/variant/HiSeq.10000.vcf
Resolving raw.githubusercontent.com... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2001116 (1.9M) [text/plain]
Saving to: ‘HiSeq.10000.vcf’

HiSeq.10000.vcf                 100%[=====================================================>]   1.91M  2.88MB/s    in 0.7s    

2017-04-20 02:15:25 (2.88 MB/s) - ‘HiSeq.10000.vcf’ saved [2001116/2001116]

---------------------------------------------------------------------------
Nirvana                                             (c) 2017 Illumina, Inc.
Stromberg, Roy, Lajugie, Jiang, and Li                                1.5.4
---------------------------------------------------------------------------

Running Nirvana on HiSeq.10000.vcf:
---------------------------------------------------------------------------
reference:  00:00:00.7                                                 chr1
cache & sa: 00:00:01.1
annotation: 00:00:01.5 (6,245 variants/s)

Time: 00:00:08.1
Michael-Strombergs-MacBook-Pro:~ snownebula$ 

Here is the MD5 checksum of the HiSeq.10000.vcf file that was downloaded:

Michael-Strombergs-MacBook-Pro:Nirvana snownebula$ md5 HiSeq.10000.vcf 
MD5 (HiSeq.10000.vcf) = 52447729145ec436480d407a1b4810d5

Is there anything else I can try in order to reproduce your situation (besides the feedback that Yu provided above)?
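The checksum comparison above can also be done in a platform-independent way, which helps when one machine has `md5` and another has `md5sum`. This is a minimal Python sketch using only the standard library; the function names `md5_of_file` and `matches_reported_checksum` are hypothetical, and the expected value is the one reported in this thread.

```python
import hashlib

# MD5 of HiSeq.10000.vcf as reported earlier in this thread.
EXPECTED_MD5 = "52447729145ec436480d407a1b4810d5"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reported_checksum(path):
    """True if the file at `path` has the MD5 reported in the thread."""
    return md5_of_file(path) == EXPECTED_MD5
```

Calling `matches_reported_checksum("HiSeq.10000.vcf")` in the download directory tells you whether the local copy is byte-identical to the one tested here. A match rules out a corrupted or altered input file, but says nothing about the environment the tool runs in.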

viktorlj commented 7 years ago

Hi Michael,

That's very strange, but if you can't reproduce it even on your Mac, then it's most likely something that is off on my end. The MD5 checksum matches my VCF file. I tried reinstalling but I get the same result. As I don't get any error messages during the install, I can't really think of any environmental problems or other settings that could help you reproduce the issue right now.

Since it is irreproducible and Nirvana works fine on my actual output, it's fine by me to close the issue. I'll let you know if I figure out what caused the problem.

MichaelStromberg commented 7 years ago

On our end, I will get the other developers to try this on their MacBook Pros as well. Perhaps we can observe the phenomenon in a different hardware/OS configuration and then isolate the underlying cause.
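When the same input file behaves differently on two machines, a side-by-side snapshot of each environment can help narrow the search. The sketch below is a hypothetical helper, not part of Nirvana: the function name `environment_report` and the choice of settings to record (OS description, machine type, and the locale-related environment variables) are assumptions. Nothing in this thread establishes which of them, if any, is involved.

```python
import os
import platform


def environment_report():
    """Collect a few settings that commonly differ between machines.

    The selection is an assumption made for illustration; extend it as needed.
    """
    return {
        "os": platform.platform(),        # OS name and version string
        "machine": platform.machine(),    # hardware architecture, e.g. x86_64
        "LANG": os.environ.get("LANG"),   # locale variables; None if unset
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_NUMERIC": os.environ.get("LC_NUMERIC"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running this on a machine where the error occurs and on one where it does not, then comparing the two outputs, gives a concrete list of differences to investigate.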