Illumina / canvas

Canvas - Copy number variant (CNV) calling from DNA sequencing data

Canvas CNV reference problems? #120

Closed: sbilobram closed this issue 5 years ago

sbilobram commented 5 years ago

Trying to run Canvas CNV, but I am unsure what a Canvas-ready reference file is:

-r, --reference=VALUE    Canvas-ready reference fasta file (required)

Without a reference (no -r supplied) I get:

Error: reference is a required option

But when I supply a reference FASTA, as in:

... -g ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta -f ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/filter13.bed -r /projects/sbilobram_prj/reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/hg19.fa

I get the following error:

2019-04-25T12:40:25-07:00,ERROR: Canvas workflow error: System.NullReferenceException: Object reference not set to an instance of an object.
   at Isas.SequencingFiles.ReferenceGenome.get_Species()
   at Isas.SequencingFiles.GenomeMetadata.Deserialize(TextReader reader, IDirectoryLocation genomeFastaFolder, IReferenceGenome referenceGenome)
   at Isas.SequencingFiles.GenomeMetadata.Deserialize(IFileLocation genomeSizeXml)
   at Canvas.TumorNormalWgsRunner.GetCallset()
   at Canvas.TumorNormalWgsRunner.Run(CanvasRunnerFactory runnerFactory)
   at Canvas.ModeLauncher.Launch()

eroller commented 5 years ago

It is kmer.fa. You can download them from S3:

http://canvas-cnv-public.s3.amazonaws.com/

For example here is the one for hg19:

http://canvas-cnv-public.s3.amazonaws.com/hg19/kmer.fa

sbilobram commented 5 years ago

No, I tried that. I have kmer.fa. They are all in my ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta folder. When I point -r to the kmer file I get the same error:

-g ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta -f ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/filter13.bed -r /projects/sbilobram_prj/reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/kmer.fa

results in:

2019-04-25T14:08:38-07:00,ERROR: Canvas workflow error: System.NullReferenceException: Object reference not set to an instance of an object.
   at Isas.SequencingFiles.ReferenceGenome.get_Species()
   at Isas.SequencingFiles.GenomeMetadata.Deserialize(TextReader reader, IDirectoryLocation genomeFastaFolder, IReferenceGenome referenceGenome)
   at Isas.SequencingFiles.GenomeMetadata.Deserialize(IFileLocation genomeSizeXml)
   at Canvas.TumorNormalWgsRunner.GetCallset()
   at Canvas.TumorNormalWgsRunner.Run(CanvasRunnerFactory runnerFactory)
   at Canvas.ModeLauncher.Launch()

eroller commented 5 years ago

OK, that does look correct. Can you try using absolute instead of relative paths?

sbilobram commented 5 years ago

Same error. Here is the exact command:

/gsc/software/linux-x86_64-centos7/dotnet-2.2.2/dotnet /projects/sbilobram_prj/bin/Canvas-1.40.0.1613+master_x64/Canvas.dll Somatic-WGS -b /projects/analysis/analysis26/P01829/merge_bwa-mem-0.7.6a/150nt/hg19a/P01829_2_lanes_dupsFlagged.bam --sample-b-allele-vcf /projects/rcorbettprj2/CNV_2016/Canvas/A36973_5_lanes_dupsFlagged.varFilter.PASS.vcf -n POG664_P01829 -o CANVAS -g /projects/sbilobram_prj/reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta -f /projects/sbilobram_prj/reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/filter13.bed -r /projects/sbilobram_prj/reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/hg19.fa

I only get an error when I pass a value to the -r switch. This is the error from the command above:

ERROR: Canvas workflow error: System.NullReferenceException: Object reference not set to an instance of an object.
   at Isas.SequencingFiles.ReferenceGenome.get_Species()
   at Isas.SequencingFiles.GenomeMetadata.Deserialize(TextReader reader, IDirectoryLocation genomeFastaFolder, IReferenceGenome referenceGenome)
   at Isas.SequencingFiles.GenomeMetadata.Deserialize(IFileLocation genomeSizeXml)
   at Canvas.TumorNormalWgsRunner.GetCallset()
   at Canvas.TumorNormalWgsRunner.Run(CanvasRunnerFactory runnerFactory)
   at Canvas.ModeLauncher.Launch()

When I run it with .NET 1, I get a slightly different error that may shed light on the problem:

Unhandled Exception: System.IO.FileLoadException: Could not load file or assembly 'System.Runtime, Version=4.2.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'. The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040)
Aborted (core dumped)

eroller commented 5 years ago

I wasn't able to reproduce this. Can you try a clean output directory? The error may be getting cached from a previous failed run.
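A minimal Python sketch of the "clean output directory" advice: delete the folder passed to -o and recreate it empty before each run. The helper name fresh_output_dir is hypothetical, and shutil.rmtree deletes everything under the path, so point it only at a disposable Canvas output folder such as CANVAS_OUT.

```python
import shutil
import tempfile
from pathlib import Path


def fresh_output_dir(path: str) -> Path:
    """Delete `path` if it exists, then recreate it as an empty directory.

    Intermediate files left behind by an earlier failed run in the same
    output directory can be picked up again, so wiping it avoids stale state.
    """
    out = Path(path)
    if out.exists():
        shutil.rmtree(out)  # removes the directory and everything below it
    out.mkdir(parents=True)
    return out


if __name__ == "__main__":
    # Demo in a throwaway temp location, not a real Canvas output folder.
    base = Path(tempfile.mkdtemp())
    stale = base / "CANVAS_OUT" / "TempCNV_sample" / "old.txt"
    stale.parent.mkdir(parents=True)
    stale.write_text("left over from a failed run")
    out = fresh_output_dir(str(base / "CANVAS_OUT"))
    print(list(out.iterdir()))  # -> []
    shutil.rmtree(base)
```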

sbilobram commented 5 years ago

Okay, some progress. Clearing out the output directory does help. The next issue was that the chromosome names in my BAM had no 'chr' prefix, while the /Sequence/WholeGenomeFasta/ files did. I think I corrected this, but now I get the following confusing error:

ERROR: Canvas workflow error: System.Collections.Generic.KeyNotFoundException: The given key '/projects/analysis/analysis26/P01829/merge_bwa-mem-0.7.6a/150nt/hg19a/P01829_2_lanes_dupsFlagged.bam' was not present in the dictionary.

And then a little later:

bash: /projects/sbilobram_prj/bin/Canvas-1.40.0.1613+master_x64/tabix: Permission denied

which is odd, since I own that Canvas-1.40.0.1613+master_x64 directory.
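The chr-prefix mismatch described in this comment can be spotted before launching Canvas by comparing the @SQ SN: names in the BAM header (for example the output of samtools view -H) with the sequence names in the reference FASTA. This is a rough Python sketch; the helper names are hypothetical, and it only compares names, not contig lengths or sequence content.

```python
def sam_header_contigs(header_text: str) -> list[str]:
    """Return the SN: values from @SQ lines of a SAM header (e.g. `samtools view -H`)."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def fasta_contigs(lines) -> list[str]:
    """Return sequence names (first token after '>') from an iterable of FASTA lines."""
    names = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:  # skip a bare '>' with no name
                names.append(tokens[0])
    return names


def prefix_hint(bam_names, ref_names) -> str:
    """Classify how the two name sets relate, focusing on a 'chr' prefix mismatch."""
    bam, ref = set(bam_names), set(ref_names)
    if bam & ref:
        return "names overlap"
    if {"chr" + n for n in bam} & ref:
        return "BAM lacks 'chr' prefix used by reference"
    if {n[3:] for n in bam if n.startswith("chr")} & ref:
        return "reference lacks 'chr' prefix used by BAM"
    return "no overlap"


if __name__ == "__main__":
    # Toy inputs: a BAM header without 'chr' and reference FASTA headers with it.
    header = "@HD\tVN:1.4\n@SQ\tSN:1\tLN:249250621\n@SQ\tSN:2\tLN:243199373\n"
    fasta = [">chr1\n", "NNNN\n", ">chr2\n"]
    print(prefix_hint(sam_header_contigs(header), fasta_contigs(fasta)))
    # -> BAM lacks 'chr' prefix used by reference
```

For a real reference, passing an open file handle to fasta_contigs streams through the whole FASTA; reading the first column of a samtools faidx .fai index is a faster way to get the same names.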

eroller commented 5 years ago

Make sure you start with a fresh directory each time.

You can run chmod +x /projects/sbilobram_prj/bin/Canvas-1.40.0.1613+master_x64/tabix
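If a shell is not handy, the same permission fix can be sketched in Python. ensure_executable is a hypothetical helper that adds the execute bits for user, group and other, which corresponds to chmod a+x rather than plain chmod +x (the latter also respects the umask).

```python
import os
import stat
import tempfile


def ensure_executable(path: str) -> bool:
    """Add execute permission for user, group and other (similar to `chmod a+x`).

    Returns True if the mode had to be changed, False if it was already set.
    """
    mode = os.stat(path).st_mode
    wanted = mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if wanted != mode:
        os.chmod(path, stat.S_IMODE(wanted))  # keep only the permission bits
        return True
    return False


if __name__ == "__main__":
    # Demo on a temporary file standing in for the bundled tabix binary.
    fd, demo = tempfile.mkstemp()
    os.close(fd)
    os.chmod(demo, 0o644)
    print(ensure_executable(demo))  # -> True
    print(oct(stat.S_IMODE(os.stat(demo).st_mode)))  # -> 0o755
    os.remove(demo)
```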

sbilobram commented 5 years ago

Yes, thanks. I am clearing the OUT dir each time. Making tabix executable worked, but now Canvas finishes in 0.5 seconds and I don't think I am getting what I am supposed to. There is an error at the top of the run:

ERROR: Canvas workflow error: System.Collections.Generic.KeyNotFoundException: The given key '/projects/analysis/analysis26/P01829/merge_bwa-mem-0.7.6a/150nt/hg19a/P01829_2_lanes_dupsFlagged.bam' was not present in the dictionary.

eroller commented 5 years ago

Weird. Is there a line break between /projects/analysis/analysis26/P01829/merge_bwa-mem- and 0.7.6a/150nt/hg19a/P01829_2_lanes_dupsFlagged.bam?

Maybe the command line parsing is getting confused? Can you try a path without any hyphens?

eroller commented 5 years ago

Also check whether the -n option matches the SM tag in the BAM header. Is it POG664_P01829?
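A small Python sketch for checking the suggestion above: collect the SM: values from the @RG lines of a SAM header (for example from samtools view -H) and test whether the -n value is among them. The function names are hypothetical, and whether Canvas strictly requires this match is taken from the maintainer's suggestion, not verified here.

```python
def read_group_samples(header_text: str) -> set[str]:
    """Collect the SM: (sample) values from @RG lines of a SAM header."""
    samples = set()
    for line in header_text.splitlines():
        if line.startswith("@RG"):
            for field in line.split("\t")[1:]:
                if field.startswith("SM:"):
                    samples.add(field[3:])
    return samples


def check_sample_name(header_text: str, name: str) -> bool:
    """True if `name` (the value passed to -n) appears as an SM tag."""
    return name in read_group_samples(header_text)


if __name__ == "__main__":
    # Toy header: two read groups from the same sample.
    header = "@RG\tID:lane1\tSM:SAMPLE_A\n@RG\tID:lane2\tSM:SAMPLE_A\n"
    print(read_group_samples(header))  # -> {'SAMPLE_A'}
    print(check_sample_name(header, "SAMPLE_B"))  # -> False
```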

sbilobram commented 5 years ago

Thanks for the suggestions, but I am still getting the dictionary key error. I did not know that -n needs to match the SM tag, but changing it here does not help. I tried a different BAM without hyphens and matched the -n parameter to the SM tag, but I still get the same error. Here is the command (I tried absolute paths with no difference, and I clear the CANVAS_OUT dir each time):

/gsc/software/linux-x86_64-centos7/dotnet-2.2.2/dotnet /projects/sbilobram_prj/bin/Canvas-1.40.0.1613+master_x64/Canvas.dll Somatic-WGS -b ./HybridHuman/DATA/RNAseq_HSJ_031_Tumour.bam --sample-b-allele-vcf ./CANVAS/dbsnp.vcf -n HSJ031_Tumor -o CANVAS_OUT -g ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta -f ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/filter13.1.bed -r ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/kmer1.fa

If it helps, here is the beginning of the log up to the first error:

2019-04-26T14:48:30-07:00,
2019-04-26T14:48:30-07:00,Running checkpoint 01: Validate input
2019-04-26T14:48:31-07:00,Running Canvas Somatic-WGS 1.40.0.1613+master
2019-04-26T14:48:31-07:00,Command-line arguments: Somatic-WGS -b ./HybridHuman/DATA/RNAseq_HSJ_031_Tumour.bam --sample-b-allele-vcf ./CANVAS/dbsnp.vcf -n HSJ031_Tumor -o CANVAS_OUT -g ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta -f ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/filter13.1.bed -r ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/kmer1.fa
2019-04-26T14:48:31-07:00,Checkpoint 01 Validate input complete. Elapsed time (hh/mm/ss): 00:00:00.3
2019-04-26T14:48:31-07:00,Normal Vcf path: /projects/sbilobram_prj/CANVAS/dbsnp.vcf
2019-04-26T14:48:31-07:00,
2019-04-26T14:48:31-07:00,Running checkpoint 02: CanvasSNV
2019-04-26T14:48:31-07:00,
2019-04-26T14:48:31-07:00,Running checkpoint 03: CanvasBin
Invoking 0 processor jobs...for sample HSJ031_Tumor
CanvasSNV start for sample HSJ031_Tumor
CanvasSNV complete for sample HSJ031_Tumor
2019-04-26T14:48:31-07:00,Begin converting '/projects/sbilobram_prj/CANVAS_OUT/TempCNV_HSJ031_Tumor/VFResultsHSJ031_Tumor.txt.gz.baf' to '/projects/sbilobram_prj/CANVAS_OUT/TempCNV_HSJ031_Tumor/ballele.bedgraph.gz'
2019-04-26T14:48:31-07:00,ERROR: Canvas workflow error: System.Collections.Generic.KeyNotFoundException: The given key '/projects/sbilobram_prj/HybridHuman/DATA/RNAseq_HSJ_031_Tumour.bam' was not present in the dictionary.
   at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
   at Canvas.CanvasRunner.<>c__DisplayClass23_0.b__0(Int32 bamIdx, WorkResources resources, IJobLauncher jobLauncher)

eroller commented 5 years ago

Does the CANVAS/dbsnp.vcf file contain the sample name or is that just a population VCF?

For a population VCF you can use the option --population-b-allele-vcf, although the results will not be as good as an analysis that uses a SNV VCF from a matched normal sample.
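Whether a VCF carries per-sample genotypes can be checked from its #CHROM header line: per the VCF specification the first eight columns are fixed (CHROM through INFO), and sample columns only appear after a ninth FORMAT column. A sites-only file such as a dbSNP release has no sample columns, which would point to --population-b-allele-vcf. This Python sketch uses hypothetical helper names and inspects only the header, not anything else Canvas might require of the file.

```python
import gzip


def vcf_samples(lines) -> list[str]:
    """Return sample names from the '#CHROM' header line of a VCF.

    Columns 1-8 are fixed (CHROM..INFO); column 9, if present, is FORMAT and
    columns 10+ are samples. A sites-only VCF therefore yields [].
    """
    for line in lines:
        if line.startswith("#CHROM"):
            cols = line.rstrip("\n").split("\t")
            return cols[9:]
        if not line.startswith("#"):
            break  # reached data lines without finding a #CHROM header
    return []


def open_vcf(path: str):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, "rt")


if __name__ == "__main__":
    sites_only = ["##fileformat=VCFv4.1\n", "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"]
    with_sample = ["#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\n"]
    print(vcf_samples(sites_only))   # -> []
    print(vcf_samples(with_sample))  # -> ['NORMAL']
```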

sbilobram commented 5 years ago

I just downloaded dbSNP, so I switched to the --population-b-allele-vcf option and got the same error. When I soft-link to any BAM I get the same workflow error. I think the workflow does not get as far as reading the -b BAM. Here I passed an almost empty file called testEmpty.bam and still get the same error:

$ cat testEmpty.bam
empty

$ /gsc/software/linux-x86_64-centos7/dotnet-2.2.2/dotnet /projects/sbilobram_prj/bin/Canvas-1.40.0.1613+master_x64/Canvas.dll Somatic-WGS -b testEmpty.bam --population-b-allele-vcf ./CANVAS/dbsnp.vcf -n N/A -o CANVAS_OUT -g ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta -f ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/filter13.1.bed -r ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/kmer1.fa

2019-04-26T15:34:58-07:00,
2019-04-26T15:34:58-07:00,Running checkpoint 01: Validate input
2019-04-26T15:34:59-07:00,Running Canvas Somatic-WGS 1.40.0.1613+master
2019-04-26T15:34:59-07:00,Command-line arguments: Somatic-WGS -b testEmpty.bam --population-b-allele-vcf ./CANVAS/dbsnp.vcf -n N/A -o CANVAS_OUT -g ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta -f ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/filter13.1.bed -r ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/kmer1.fa
2019-04-26T15:34:59-07:00,Checkpoint 01 Validate input complete. Elapsed time (hh/mm/ss): 00:00:00.3
2019-04-26T15:34:59-07:00,Normal Vcf path: /projects/sbilobram_prj/CANVAS/dbsnp.vcf
2019-04-26T15:34:59-07:00,
2019-04-26T15:34:59-07:00,Running checkpoint 02: CanvasSNV
2019-04-26T15:34:59-07:00,
2019-04-26T15:34:59-07:00,Running checkpoint 03: CanvasBin
Invoking 0 processor jobs...for sample N/A
CanvasSNV start for sample N/A
CanvasSNV complete for sample N/A
2019-04-26T15:34:59-07:00,ERROR: Canvas workflow error: System.Collections.Generic.KeyNotFoundException: The given key '/projects/sbilobram_prj/testEmpty.bam' was not present in the dictionary.
   at System.Collections.Generic.Dictionary`2.get_Item(TKey key)

eroller commented 5 years ago

Can you check the chromosome names are matching in all the files? What is concerning is this log line:

Invoking 0 processor jobs...for sample N/A

There should be 24 jobs (one per chromosome). Somehow it is not seeing the correct chromosomes.

eroller commented 5 years ago

Does your GenomeSize.xml have the correct type="Autosome" or type="Allosome" for the chromosomes?

http://canvas-cnv-public.s3.amazonaws.com/hg19/WholeGenomeFasta/GenomeSize.xml
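A quick sanity check along these lines could flag a GenomeSize.xml with no usable chromosome entries, which would be consistent with the "Invoking 0 processor jobs" log line. This Python sketch assumes the file contains one <chromosome .../> element per contig with contigName and type attributes; that shape is an assumption to verify against the copy on the S3 bucket above, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Types the maintainer mentions as the ones Canvas expects on chromosomes.
USABLE_TYPES = {"Autosome", "Allosome"}


def summarize_genome_size(xml_text: str) -> Counter:
    """Count <chromosome> elements by their type attribute ('<missing>' if absent)."""
    root = ET.fromstring(xml_text)
    return Counter(elem.get("type", "<missing>") for elem in root.iter("chromosome"))


def usable_chromosomes(xml_text: str) -> list:
    """Return contigName of every <chromosome> whose type is Autosome or Allosome."""
    root = ET.fromstring(xml_text)
    return [
        elem.get("contigName")
        for elem in root.iter("chromosome")
        if elem.get("type") in USABLE_TYPES
    ]


if __name__ == "__main__":
    ok = ('<SequenceSizes>'
          '<chromosome contigName="chr1" type="Autosome"/>'
          '<chromosome contigName="chrX" type="Allosome"/>'
          '</SequenceSizes>')
    untyped = ('<SequenceSizes>'
               '<chromosome contigName="chr1"/>'
               '<chromosome contigName="chrX"/>'
               '</SequenceSizes>')
    print(usable_chromosomes(ok))          # -> ['chr1', 'chrX']
    print(summarize_genome_size(untyped))  # -> Counter({'<missing>': 2})
```

An empty result from usable_chromosomes is the condition worth treating as a hard error before starting a run.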

sbilobram commented 5 years ago

Thanks for staying supportive. I think we got it!! Just so you know what has been happening: in my hurry to change all the reference files from 'chr##' to just '##', I ripped out every 'chr' in all the WholeGenomeFasta files. The XML file did have chr## changed to ##, but the word 'chromosome' in the file was also changed to 'omosome'. I fixed this file and it looks like the jobs are running now. In the future, the better approach would be to change the BAM column 3 entries from '##' to 'chr##' instead. Anyway, my problem is resolved and my Canvas jobs are now running.
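The failure mode described here is easy to reproduce: a blanket replace of 'chr' also rewrites the XML element name <chromosome>. A safer route is to leave the reference alone and rename only the contig names in the BAM header. In BAM, column 3 (RNAME) is stored as an index into the header's @SQ list, so rewriting the @SQ SN: names and applying the new header with samtools reheader should be sufficient. The Python sketch below only builds the new header text; the helper names are hypothetical. A pure prefix rule maps a GRCh37-style MT contig to chrMT, whereas UCSC hg19 names it chrM, so mitochondrial and unplaced contigs need separate handling.

```python
def naive_strip(text: str) -> str:
    """What a blanket find-and-replace does: removes 'chr' everywhere."""
    return text.replace("chr", "")


def add_chr_to_header(header_text: str) -> str:
    """Prefix SN: names on @SQ lines with 'chr'; every other line is untouched.

    The order of @SQ lines is preserved, which matters because BAM records
    refer to contigs by their position in this list.
    """
    out = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            fields = line.split("\t")
            for i, field in enumerate(fields):
                if field.startswith("SN:") and not field.startswith("SN:chr"):
                    fields[i] = "SN:chr" + field[3:]
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(naive_strip('<chromosome contigName="chr1" type="Autosome"/>'))
    # -> <omosome contigName="1" type="Autosome"/>
    header = "@HD\tVN:1.4\n@SQ\tSN:1\tLN:249250621\n@PG\tID:bwa\tPN:bwa\n"
    print(add_chr_to_header(header), end="")
```

One possible flow: samtools view -H in.bam > old_header.sam, write add_chr_to_header(...) of that text to new_header.sam, then samtools reheader new_header.sam in.bam > out.bam and re-index the result with samtools index.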

eroller commented 5 years ago

Good to hear. Amazing that there was no error while parsing the mangled GenomeSize.xml file. That should be fixed.