Illumina / canvas

Canvas - Copy number variant (CNV) calling from DNA sequencing data

NullReferenceException when running Germline-WGS #69

Open fbattke opened 7 years ago

fbattke commented 7 years ago

I am running Canvas in a Docker container using the hg19 reference files from the S3 link.

I am trying to run Canvas with variants called by Strelka/Starling on a shallow (7-fold) WGS dataset:

root@711ba034ea7e:/reference# dotnet /opt/Canvas/Canvas.dll Germline-WGS -r /reference/kmer.fa -g /reference/ -f /reference/filter13.bed -b /data/test_cov7.bam -n testsample --sample-b-allele-vcf=/data/test_cov7.variants.vcf.gz -o /data/canvas.result

However, I get a NullReferenceException relating to the reference:

2017-11-20T17:26:39,Running checkpoint 01: Validate input
2017-11-20T17:26:40,Saved checkpoint results to /localcanvas.result/Checkpoints/progress.json
2017-11-20T17:26:40,Running Canvas Germline-WGS 1.30.0.725+master
2017-11-20T17:26:40,ERROR: Canvas workflow error: System.NullReferenceException: Object reference not set to an instance of an object.
   at Isas.SequencingFiles.ReferenceGenome.get_Build()
   at Isas.SequencingFiles.GenomeMetadata.Deserialize(TextReader reader, IDirectoryLocation genomeFastaFolder, IReferenceGenome referenceGenome)
   at Isas.SequencingFiles.GenomeMetadata.Deserialize(IFileLocation genomeSizeXml)
   at Canvas.GermlineWgsRunner.GetCallset()
   at Canvas.GermlineWgsRunner.Run(ILogger logger, ICheckpointRunner checkpointRunner, IWorkManager workManager, IFileLocation runtimeExecutable)
   at Canvas.ModeLauncher.Launch()
2017-11-20T17:26:40,Command-line arguments: Germline-WGS -r /reference/kmer.fa -g /reference/ -f /reference/filter13.bed -b /data/test_cov7.bam -n testsample --sample-b-allele-vcf=/data/test_cov7.variants.vcf.gz -o /data/canvas.result
2017-11-20T17:26:40,Saved checkpoint results to /local/canvas.result/Checkpoints/01-Validateinput.json
2017-11-20T17:26:40,Elapsed time (step/time(sec)/name)
01 00:00:00.6 Validate input
2017-11-20T17:26:40,Total execution time: 00:00:00.6

The reference folder is as follows:

root@711ba034ea7e:/reference# ls -lh
total 5.9G
-rw-r--r-- 1 root root 4.6K Aug 28 16:03 GenomeSize.xml
-rw-r--r-- 1 root root  11K Aug 28 15:48 filter13.bed
-rw-r--r-- 1 root root 3.0G Aug 28 16:03 genome.fa
-rw-r--r-- 1 root root  783 Aug 28 16:03 genome.fa.fai
-rw-r--r-- 1 root root 3.0G Aug 28 15:48 kmer.fa
-rw-r--r-- 1 root root  783 Aug 28 16:03 kmer.fa.fai

Here are the Dockerfile contents, in case you want to provide a Docker image (I saw the request in another issue):

FROM ubuntu:16.04

# dependency: mono
RUN apt-get update
RUN apt-get -y install mono-runtime mono-complete wget curl apt-transport-https pigz

# dependency: dotnet
RUN sh -c 'echo "deb [arch=amd64] https://apt-mo.trafficmanager.net/repos/dotnet-release/ xenial main" > /etc/apt/sources.list.d/dotnetdev.list'
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 417A0893
RUN apt-get update
RUN apt-get install -y dotnet-dev-1.0.4

# download canvas
RUN cd /opt && wget https://github.com/Illumina/canvas/releases/download/1.30.0.725%2Bmaster/Canvas-1.30.0.725.master_x64.tar.gz
RUN cd /opt && tar xzvf Canvas-1.30.0.725.master_x64.tar.gz
RUN ln -s /opt/Canvas-1.30.0.725+master_x64 /opt/Canvas

# download hg19 ref files
RUN mkdir /reference
RUN cd /reference && wget http://canvas-cnv-public.s3.amazonaws.com/hg19/WholeGenomeFasta/GenomeSize.xml
RUN cd /reference && wget http://canvas-cnv-public.s3.amazonaws.com/hg19/filter13.bed
RUN cd /reference && wget http://canvas-cnv-public.s3.amazonaws.com/hg19/kmer.fa
RUN cd /reference && wget http://canvas-cnv-public.s3.amazonaws.com/hg19/kmer.fa.fai
RUN cd /reference && wget http://canvas-cnv-public.s3.amazonaws.com/hg19/WholeGenomeFasta/genome.fa
RUN cd /reference && wget http://canvas-cnv-public.s3.amazonaws.com/hg19/WholeGenomeFasta/genome.fa.fai
RUN apt-get clean
# "RUN export" only lasts for that one build step; ENV persists into later steps and the running container
ENV DOTNET_CLI_TELEMETRY_OPTOUT=1

ENV PATH="/opt/Canvas/:$PATH"
ENTRYPOINT ["dotnet","/opt/Canvas/Canvas.dll","Germline-WGS","-r","/reference/kmer.fa","-g","/reference/","-f","/reference/filter13.bed"]

eroller commented 7 years ago

Sorry you are running into this issue. Try staging the WholeGenomeFasta directory with the following directory hierarchy:

/reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa
/reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa.fai

We have hard-coded assumptions about the structure of the reference genome directory, which really should be relaxed.
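To make the expected layout concrete, here is a minimal Python sketch that links the reference files from a flat folder (such as /reference in the Dockerfile above) into the Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta hierarchy suggested here. The function `stage_reference` and its parameters are hypothetical, not part of Canvas; the file names come from the reporter's directory listing. kmer.fa and filter13.bed are left where they are, since they are passed separately with -r and -f. Later in this thread -g only worked when it pointed at the WholeGenomeFasta directory itself, so the sketch returns that path.

```python
import shutil
from pathlib import Path

# File names taken from the reference listings in this thread.
REFERENCE_FILES = ("genome.fa", "genome.fa.fai", "GenomeSize.xml")


def stage_reference(src_dir, dest_root, species="Homo_sapiens",
                    provider="UCSC", build="hg19", use_symlinks=True):
    """Link (or copy) the reference files into
    <dest_root>/<species>/<provider>/<build>/Sequence/WholeGenomeFasta
    and return that directory (the candidate value for -g).

    Hypothetical helper; the layout mirrors the one suggested in this thread."""
    src = Path(src_dir)
    target = Path(dest_root) / species / provider / build / "Sequence" / "WholeGenomeFasta"
    target.mkdir(parents=True, exist_ok=True)
    for name in REFERENCE_FILES:
        source_file = src / name
        if not source_file.is_file():
            raise FileNotFoundError(f"missing reference file: {source_file}")
        dest_file = target / name
        if dest_file.exists() or dest_file.is_symlink():
            continue  # already staged on a previous run; leave it alone
        if use_symlinks:
            # Symlinks avoid duplicating the ~3 GB genome.fa
            dest_file.symlink_to(source_file.resolve())
        else:
            shutil.copy2(source_file, dest_file)
    return target
```

For the Dockerfile above, something like `stage_reference("/reference", "/reference")` would create /reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta, and that directory (rather than /reference/) would then be the -g value. Calling it twice is harmless because existing links are skipped.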

fbattke commented 7 years ago

Thank you, Eric,

that solved it. It would be helpful to mention this in the README.md file.

Florian



rjaksik commented 6 years ago

Hi, I am having the same problem with 1.38.0.1554 on Fedora; however, this solution doesn't work for me.

2018-06-25T10:27:24+02:00,Running checkpoint 01: Validate input
2018-06-25T10:27:25+02:00,Running Canvas Germline-WGS 1.38.0.1554+master
2018-06-25T10:27:25+02:00,Command-line arguments: Germline-WGS --reference=/library/GENOMES/GRCh37_canvas/kmer.fa -g /library/GENOMES/GRCh37_canvas -f /library/GENOMES/GRCh37_canvas/filter13.bed --custom-parameters=CanvasBin,-m=TruncatedDynamicRange -b Tumor.dedup.recal.bam --sample-b-allele-vcf=Tumor_HaplotypeCallerPASS.vcf -n Tumor -o Tumor_CNV
2018-06-25T10:27:25+02:00,Checkpoint 01 Validate input complete. Elapsed time (hh/mm/ss): 00:00:00.2
2018-06-25T10:27:25+02:00,ERROR: Canvas workflow error: System.NullReferenceException: Object reference not set to an instance of an object.
   at Isas.SequencingFiles.ReferenceGenome.get_Species()
   at Isas.SequencingFiles.GenomeMetadata.Deserialize(TextReader reader, IDirectoryLocation genomeFastaFolder, IReferenceGenome referenceGenome)
   at Isas.SequencingFiles.GenomeMetadata.Deserialize(IFileLocation genomeSizeXml)
   at Canvas.GermlineWgsRunner.GetCallset()
   at Canvas.GermlineWgsRunner.Run(CanvasRunnerFactory runnerFactory)
   at Canvas.ModeLauncher.Launch()

This is the structure of my genome folder:

/library/GENOMES/GRCh37_canvas:
dbsnp.vcf filter13.bed genome.fa genome.fa.fai GenomeSize.xml kmer.fa kmer.fa.fai

/library/GENOMES/GRCh37_canvas/Homo_sapiens/NCBI/hg19/Sequence/WholeGenomeFasta:
genome.fa genome.fa.fai GenomeSize.xml

/library/GENOMES/GRCh37_canvas/Sequence/WholeGenomeFasta:
genome.fa genome.fa.fai GenomeSize.xml

/library/GENOMES/GRCh37_canvas/WholeGenomeFasta:
genome.fa genome.fa.fai GenomeSize.xml

I use symbolic links in the additional subdirectories. The parameters should be ok since I can run the same analysis in version 1.11.0 without any problems. Do you have any suggestions? I will be very grateful for your help.

eroller commented 6 years ago

please use the full hierarchy of directories including species/provider/build:

Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta

rjaksik commented 6 years ago

This is what I am using. My full path is /library/GENOMES/GRCh37_canvas/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta, which contains genome.fa, genome.fa.fai and GenomeSize.xml, and I pass it as -g /library/GENOMES/GRCh37_canvas

In my previous post I pasted the wrong listing, with NCBI instead of UCSC.

eroller commented 6 years ago

try -g /library/GENOMES/GRCh37_canvas/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta

eroller commented 6 years ago

By the way, GRCh37 has different chromosome names than UCSC hg19 so that path is a little confusing
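One quick way to see which naming convention a reference uses is to read the contig names from its .fai index: UCSC hg19 names chromosomes chr1 to chr22, chrX, chrY and chrM, while Ensembl-style GRCh37 uses 1 to 22, X, Y and MT. The Python sketch below is a hypothetical helper, not part of Canvas, and the labels it returns are illustrative. It relies only on the documented .fai layout, where the first tab-separated column is the sequence name. That the BAM (whose @SQ header lines `samtools view -H` prints), the VCF and filter13.bed should use the same names as the reference is an assumption, not something stated in this thread.

```python
from typing import Iterable, List


def contig_names(fai_path) -> List[str]:
    """Read contig names from a samtools faidx index (.fai).

    Each line is tab-separated; the first column is the sequence name."""
    names = []
    with open(fai_path) as handle:
        for line in handle:
            if line.strip():
                names.append(line.split("\t", 1)[0])
    return names


def naming_style(names: Iterable[str]) -> str:
    """Label contig names as 'UCSC' (chr1, chrX, chrM, ...),
    'Ensembl' (1, X, MT, ...), 'mixed', or 'unknown' (no names)."""
    names = list(names)
    if not names:
        return "unknown"
    prefixed = sum(1 for name in names if name.startswith("chr"))
    if prefixed == len(names):
        return "UCSC"
    if prefixed == 0:
        return "Ensembl"
    return "mixed"
```

For example, `naming_style(contig_names("/reference/genome.fa.fai"))` would report the convention of the hg19 reference from the Dockerfile, which could then be compared with the names in the BAM header.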

rjaksik commented 6 years ago

Thank you, that did it. To summarize: -g /library/GENOMES/GRCh37_canvas/ didn't work, whereas -g /library/GENOMES/GRCh37_canvas/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta did, even though both contain the same files (the longer one contains symbolic links to the files in the shorter one).
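Both stack traces fail inside ReferenceGenome.get_Build() or get_Species(), and in both cases the fix was to point -g at a directory ending in <species>/<provider>/<build>/Sequence/WholeGenomeFasta. That suggests (an inference from this thread, not from the Canvas source) that Canvas derives species and build from the -g path itself rather than from the folder contents, which would explain why the top-level folder failed despite holding the same files. The Python pre-flight check below is a hypothetical sketch of that rule; `parse_genome_folder`, `check_genome_folder` and `GenomeLayout` are illustrative names, and the required file list comes from the listings in this thread.

```python
from pathlib import Path
from typing import List, NamedTuple, Optional

# Files present in every working WholeGenomeFasta folder in this thread.
REQUIRED_FILES = ("genome.fa", "genome.fa.fai", "GenomeSize.xml")


class GenomeLayout(NamedTuple):
    species: str   # e.g. Homo_sapiens
    provider: str  # e.g. UCSC
    build: str     # e.g. hg19


def parse_genome_folder(genome_dir) -> Optional[GenomeLayout]:
    """Return species/provider/build if genome_dir ends in
    <species>/<provider>/<build>/Sequence/WholeGenomeFasta, else None.

    Hypothetical rule inferred from this thread, not Canvas's own parser."""
    path = Path(genome_dir)
    # Drop the filesystem root ("/") so it is never mistaken for a species.
    parts = [p for p in path.parts if p != path.anchor]
    if len(parts) < 5 or parts[-2:] != ["Sequence", "WholeGenomeFasta"]:
        return None
    species, provider, build = parts[-5:-2]
    return GenomeLayout(species, provider, build)


def check_genome_folder(genome_dir) -> List[str]:
    """Collect human-readable problems with a candidate -g directory."""
    problems = []
    if parse_genome_folder(genome_dir) is None:
        problems.append(
            "path does not end in <species>/<provider>/<build>/Sequence/WholeGenomeFasta")
    for name in REQUIRED_FILES:
        # is_file() follows symlinks, so linked files count as present
        if not (Path(genome_dir) / name).is_file():
            problems.append(f"missing {name}")
    return problems
```

Applied to the two -g values in this comment, the check would flag /library/GENOMES/GRCh37_canvas/ for its path and accept the full WholeGenomeFasta path, provided the three files (or links to them) are present.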

You are right, the path is very confusing. It would be great if you could address this issue in the next release.

rjaksik commented 6 years ago

It actually crashed at CanvasPartition with "Unhandled Exception: Illumina.Common.OptionException: Missing required value for option '-p'", which appears to be the optional ploidy parameter that I did not specify.

eroller commented 6 years ago

please try running Canvas on a fresh output directory. I think the ploidy option from your previous run is being cached.

rjaksik commented 6 years ago

Unfortunately, that's not it. I completely removed the output folder (Tumor_CNV), and I never used the ploidy parameter. My entire command is:

canvas Germline-WGS --reference=/library/GENOMES/GRCh37_canvas/kmer.fa -g /library/GENOMES/GRCh37_canvas/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta -f /library/GENOMES/GRCh37_canvas/filter13.bed --custom-parameters=CanvasBin,-m=TruncatedDynamicRange -b Tumor.dedup.recal.bam --sample-b-allele-vcf=Tumor_HaplotypeCallerPASS.vcf -n Tumor -o Tumor_CNV

eroller commented 6 years ago

please try the SmallPedigree-WGS mode even for a single sample. Germline-WGS has been deprecated and will be removed in future versions. You will also need to specify the ploidy argument on the command line. See https://github.com/Illumina/canvas/issues/89 for details.