sbilobram closed this issue 5 years ago
It is kmer.fa. You can download the reference files from S3:
http://canvas-cnv-public.s3.amazonaws.com/
For example here is the one for hg19:
No, I tried that. I have kmer.fa. The files are all in my ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta folder. When I point -r to the kmer file I get the same error:
-g ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta -f ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/filter13.bed -r /projects/sbilobram_prj/reference/Homosapiens/UCSC/hg19/Sequence/WholeGenomeFasta/kmer.fa
results in
2019-04-25T14:08:38-07:00,ERROR: Canvas workflow error: System.NullReferenceException: Object reference not set to an instance of an object.
at Isas.SequencingFiles.ReferenceGenome.get_Species()
at Isas.SequencingFiles.GenomeMetadata.Deserialize(TextReader reader, IDirectoryLocation genomeFastaFolder, IReferenceGenome referenceGenome)
at Isas.SequencingFiles.GenomeMetadata.Deserialize(IFileLocation genomeSizeXml)
at Canvas.TumorNormalWgsRunner.GetCallset()
at Canvas.TumorNormalWgsRunner.Run(CanvasRunnerFactory runnerFactory)
at Canvas.ModeLauncher.Launch()
OK, that does look correct. Can you try using absolute instead of relative paths?
Same error. Here is the exact command:
/gsc/software/linux-x86_64-centos7/dotnet-2.2.2/dotnet /projects/sbilobram_prj/bin/Canvas-1.40.0.1613+master_x64/Canvas.dll Somatic-WGS -b /projects/analysis/analysis26/P01829/merge_bwa-mem-0.7.6a/150nt/hg19a/P01829_2_lanes_dupsFlagged.bam --sample-b-allele-vcf /projects/rcorbettprj2/CNV_2016/Canvas/A36973_5_lanes_dupsFlagged.varFilter.PASS.vcf -n POG664_P01829 -o CANVAS -g /projects/sbilobram_prj/reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta -f /projects/sbilobram_prj/reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/filter13.bed -r /projects/sbilobram_prj/reference/Homosapiens/UCSC/hg19/Sequence/WholeGenomeFasta/hg19.fa
I only get an error when I supply a value for the -r switch. This is the error from the above command:
ERROR: Canvas workflow error: System.NullReferenceException: Object reference not set to an instance of an object.
at Isas.SequencingFiles.ReferenceGenome.get_Species()
at Isas.SequencingFiles.GenomeMetadata.Deserialize(TextReader reader, IDirectoryLocation genomeFastaFolder, IReferenceGenome referenceGenome)
at Isas.SequencingFiles.GenomeMetadata.Deserialize(IFileLocation genomeSizeXml)
at Canvas.TumorNormalWgsRunner.GetCallset()
at Canvas.TumorNormalWgsRunner.Run(CanvasRunnerFactory runnerFactory)
at Canvas.ModeLauncher.Launch()
When I run with .NET1 I get a slightly different error that may shed light on the problem:
Unhandled Exception: System.IO.FileLoadException: Could not load file or assembly 'System.Runtime, Version=4.2.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'. The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040)
Aborted (core dumped)
I wasn't able to reproduce this. Can you try a clean output directory? The error may be getting cached from a previous failed run.
Okay, some progress. Clearing out the output directory does help. The next issue was that the chromosome names in my BAM are WITHOUT 'chr', unlike those in the /Sequence/WholeGenomeFasta/ files. I think I corrected this but now get the following confusing error:
ERROR: Canvas workflow error: System.Collections.Generic.KeyNotFoundException: The given key '/projects/analysis/analysis26/P01829/merge_bwa-mem-0.7.6a/150nt/hg19a/P01829_2_lanes_dupsFlagged.bam' was not present in the dictionary.
And then a little later:
bash: /projects/sbilobram_prj/bin/Canvas-1.40.0.1613+master_x64/tabix: Permission denied
which is odd since I own that Canvas-1.40.0.1613+master_x64 directory
Make sure you start with a fresh output directory each time.
You can make tabix executable with: chmod +x /projects/sbilobram_prj/bin/Canvas-1.40.0.1613+master_x64/tabix
Yes, thanks. I am clearing the OUT dir each time. Making tabix executable worked, but now Canvas is done in 0.5 seconds and I don't think I am getting what I am supposed to. There is an error at the top of the run:
ERROR: Canvas workflow error: System.Collections.Generic.KeyNotFoundException: The given key '/projects/analysis/analysis26/P01829/merge_bwa-mem-0.7.6a/150nt/hg19a/P01829_2_lanes_dupsFlagged.bam' was not present in the dictionary.
Weird. Is that a line break between /projects/analysis/analysis26/P01829/merge_bwa-mem- and 0.7.6a/150nt/hg19a/P01829_2_lanes_dupsFlagged.bam?
Maybe the command line parsing is getting confused? Can you try a path without any hyphens?
Also check whether the -n option matches the SM tag in the BAM header. Is it POG664_P01829?
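To make the SM-tag check above concrete, here is a minimal sketch that pulls the SM values out of SAM header text (for example, the output of `samtools view -H`). The function name and the example header are hypothetical and are not part of Canvas; the only thing assumed about the input is the standard tab-separated `@RG` line layout.

```python
def sample_names(header_text):
    """Return the set of SM values found on @RG lines of a SAM header."""
    names = set()
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Fields after the record type are TAG:VALUE pairs separated by tabs.
        for field in line.split("\t")[1:]:
            if field.startswith("SM:"):
                names.add(field[3:])
    return names


# Hypothetical header text, standing in for `samtools view -H sample.bam`.
example_header = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:249250621\n"
    "@RG\tID:lane1\tSM:HSJ031_Tumor\tPL:ILLUMINA\n"
    "@RG\tID:lane2\tSM:HSJ031_Tumor\tPL:ILLUMINA\n"
)

if __name__ == "__main__":
    wanted = "HSJ031_Tumor"  # the value passed to -n
    found = sample_names(example_header)
    print("SM tags:", sorted(found))
    print("-n matches an SM tag:", wanted in found)
```

If the set contains more than one name, or none, that is worth sorting out before comparing against -n.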
Thanks for the suggestions, but I am still getting the dictionary key error. I did not know that -n needs to be the same as the SM tag, but changing it here does not help. I tried a different BAM with no hyphens in its path and matched the -n parameter to the SM tag, but still get the same error. Here is the command (using absolute paths made no difference, and I clear the CANVAS_OUT dir each time):
/gsc/software/linux-x86_64-centos7/dotnet-2.2.2/dotnet /projects/sbilobram_prj/bin/Canvas-1.40.0.1613+master_x64/Canvas.dll Somatic-WGS -b ./HybridHuman/DATA/RNAseq_HSJ_031_Tumour.bam --sample-b-allele-vcf ./CANVAS/dbsnp.vcf -n HSJ031_Tumor -o CANVAS_OUT -g ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta -f ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/filter13.1.bed -r ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/kmer1.fa
If it helps, here is the beginning of the log up to the first error:
2019-04-26T14:48:30-07:00,
2019-04-26T14:48:30-07:00,Running checkpoint 01: Validate input
2019-04-26T14:48:31-07:00,Running Canvas Somatic-WGS 1.40.0.1613+master
2019-04-26T14:48:31-07:00,Command-line arguments: Somatic-WGS -b ./HybridHuman/DATA/RNAseq_HSJ_031_Tumour.bam --sample-b-allele-vcf ./CANVAS/dbsnp.vcf -n HSJ031_Tumor -o CANVAS_OUT -g ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta -f ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/filter13.1.bed -r ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/kmer1.fa
2019-04-26T14:48:31-07:00,Checkpoint 01 Validate input complete. Elapsed time (hh/mm/ss): 00:00:00.3
2019-04-26T14:48:31-07:00,Normal Vcf path: /projects/sbilobram_prj/CANVAS/dbsnp.vcf
2019-04-26T14:48:31-07:00,
2019-04-26T14:48:31-07:00,Running checkpoint 02: CanvasSNV
2019-04-26T14:48:31-07:00,
2019-04-26T14:48:31-07:00,Running checkpoint 03: CanvasBin
Invoking 0 processor jobs...for sample HSJ031_Tumor
CanvasSNV start for sample HSJ031_Tumor
CanvasSNV complete for sample HSJ031_Tumor
2019-04-26T14:48:31-07:00,Begin converting '/projects/sbilobram_prj/CANVAS_OUT/TempCNV_HSJ031_Tumor/VFResultsHSJ031_Tumor.txt.gz.baf' to '/projects/sbilobram_prj/CANVAS_OUT/TempCNV_HSJ031_Tumor/ballele.bedgraph.gz'
2019-04-26T14:48:31-07:00,ERROR: Canvas workflow error: System.Collections.Generic.KeyNotFoundException: The given key '/projects/sbilobram_prj/HybridHuman/DATA/RNAseq_HSJ_031_Tumour.bam' was not present in the dictionary.
at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
at Canvas.CanvasRunner.<>c__DisplayClass23_0.
Does the CANVAS/dbsnp.vcf file contain the sample name, or is that just a population VCF?
For a population VCF you can use the --population-b-allele-vcf option, although the results will not be as good as an analysis using a matched normal sample SNV VCF.
I just downloaded dbSNP, so I switched to the --population-b-allele-vcf option. Got the same error. When I soft link to any BAM I get the same workflow error. I think the workflow does not get as far as reading the -b BAM. Here I put in an almost empty file called testEmpty.bam and still get the same error:
$ cat testEmpty.bam
empty
$ /gsc/software/linux-x86_64-centos7/dotnet-2.2.2/dotnet /projects/sbilobram_prj/bin/Canvas-1.40.0.1613+master_x64/Canvas.dll Somatic-WGS -b testEmpty.bam --population-b-allele-vcf ./CANVAS/dbsnp.vcf -n N/A -o CANVAS_OUT -g ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta -f ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/filter13.1.bed -r ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/kmer1.fa
2019-04-26T15:34:58-07:00,
2019-04-26T15:34:58-07:00,Running checkpoint 01: Validate input
2019-04-26T15:34:59-07:00,Running Canvas Somatic-WGS 1.40.0.1613+master
2019-04-26T15:34:59-07:00,Command-line arguments: Somatic-WGS -b testEmpty.bam --population-b-allele-vcf ./CANVAS/dbsnp.vcf -n N/A -o CANVAS_OUT -g ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta -f ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/filter13.1.bed -r ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/kmer1.fa
2019-04-26T15:34:59-07:00,Checkpoint 01 Validate input complete. Elapsed time (hh/mm/ss): 00:00:00.3
2019-04-26T15:34:59-07:00,Normal Vcf path: /projects/sbilobram_prj/CANVAS/dbsnp.vcf
2019-04-26T15:34:59-07:00,
2019-04-26T15:34:59-07:00,Running checkpoint 02: CanvasSNV
2019-04-26T15:34:59-07:00,
2019-04-26T15:34:59-07:00,Running checkpoint 03: CanvasBin
Invoking 0 processor jobs...for sample N/A
CanvasSNV start for sample N/A
CanvasSNV complete for sample N/A
2019-04-26T15:34:59-07:00,ERROR: Canvas workflow error: System.Collections.Generic.KeyNotFoundException: The given key '/projects/sbilobram_prj/testEmpty.bam' was not present in the dictionary.
at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
Can you check the chromosome names are matching in all the files? What is concerning is this log line:
Invoking 0 processor jobs...for sample N/A
There should be 24 jobs (one per chromosome). Somehow it is not seeing the correct chromosomes.
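One way to check that chromosome names agree across files is to collect the contig names from each file and diff the sets. The sketch below works on text already read from the files (SAM header text, FASTA text, BED text); all function names and the tiny example inputs are hypothetical, and nothing here calls Canvas itself.

```python
def sam_contigs(header_text):
    """Contig names from the SN field of @SQ lines in a SAM header."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def fasta_contigs(fasta_text):
    """Contig names from FASTA header lines (first word after '>')."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.append(parts[0])
    return names


def bed_contigs(bed_text):
    """Distinct values of the first column of a BED file."""
    names = set()
    for line in bed_text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split()[0])
    return names


def missing_from(reference_names, other_names):
    """Names used in other_names that the reference does not define."""
    return sorted(set(other_names) - set(reference_names))


if __name__ == "__main__":
    # Hypothetical inputs: a BAM without 'chr' against a UCSC-style FASTA.
    bam_header = "@SQ\tSN:1\tLN:100\n@SQ\tSN:2\tLN:90\n"
    fasta = ">chr1 first\nACGT\n>chr2\nACGT\n"
    bed = "chr1\t0\t10\nchr2\t5\t20\n"
    ref = fasta_contigs(fasta)
    print("BAM contigs not in FASTA:", missing_from(ref, sam_contigs(bam_header)))
    print("BED contigs not in FASTA:", missing_from(ref, bed_contigs(bed)))
```

An empty list from every comparison means the names are consistent. It does not by itself prove the sequences are the same build.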
Does your GenomeSize.xml have the correct type="Autosome" or type="Allosome" for the chromosomes?
http://canvas-cnv-public.s3.amazonaws.com/hg19/WholeGenomeFasta/GenomeSize.xml
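A quick sanity check of GenomeSize.xml can be sketched as follows. This assumes, based only on the discussion here, that each contig is described by an element named `chromosome` carrying a `type` attribute; the root element name and the `contigName` attribute in the sample text are hypothetical placeholders, so compare against the file at the URL above before relying on it.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def summarize_genome_size_xml(xml_text, element_name="chromosome"):
    """Count entries per 'type' value and list elements with other tag names.

    element_name is an assumption about the file layout, not a documented
    Canvas schema.
    """
    root = ET.fromstring(xml_text)
    type_counts = Counter()
    unexpected = []
    for elem in root.iter():
        if elem is root:
            continue
        if elem.tag == element_name:
            type_counts[elem.get("type", "<missing>")] += 1
        else:
            unexpected.append(elem.tag)
    return type_counts, unexpected


# Hypothetical, heavily shortened sample; real files carry more attributes.
good = (
    '<sequenceSizes>'
    '<chromosome contigName="1" type="Autosome"/>'
    '<chromosome contigName="X" type="Allosome"/>'
    '</sequenceSizes>'
)

if __name__ == "__main__":
    for label, text in (("intact", good), ("mangled", good.replace("chr", ""))):
        counts, unexpected = summarize_genome_size_xml(text)
        print(label, dict(counts), "unexpected elements:", unexpected)
```

For a human reference, the Autosome and Allosome counts together should add up to the 24 chromosomes mentioned above; a non-empty list of unexpected elements suggests the file has been altered.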
Thanks for staying supportive. I think we got it!! Just so you know what has been happening: in my hurry to change all the reference files from 'chr##' to just '##', I ripped out every 'chr' in all the WholeGenomeFasta files. The XML file did have 'chr##' changed to '##', but the word 'chromosome' in the file was also changed to 'omosome'. I fixed this file and it looks like the jobs are running now. In the future, the way to go would be to instead change the BAM column 3 entries from '##' to 'chr##'. Anyway, my problem is resolved and my Canvas jobs are now running.
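The safer direction described above (adding 'chr' on the BAM side rather than deleting 'chr' everywhere in the reference files) can be sketched as a header rewrite that touches only the SN field of @SQ lines. The function name is hypothetical. Applying the edited header to a real BAM would be a separate step, for instance with `samtools reheader`, and mitochondrial naming (MT versus chrM) is deliberately not handled here.

```python
def add_chr_prefix_to_header(header_text, prefix="chr"):
    """Prefix contig names in @SQ SN fields; leave every other line untouched."""
    out = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            fields = line.split("\t")
            fields = [
                "SN:" + prefix + f[3:]
                if f.startswith("SN:") and not f[3:].startswith(prefix)
                else f
                for f in fields
            ]
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical header; the @CO line shows that free text is not modified.
    header = (
        "@HD\tVN:1.4\n"
        "@SQ\tSN:1\tLN:249250621\n"
        "@SQ\tSN:chrX\tLN:155270560\n"
        "@CO\tchromosome notes\n"
    )
    print(add_chr_prefix_to_header(header), end="")
```

Because the edit is scoped to one field, a word like 'chromosome' elsewhere in the file cannot be damaged the way a global search-and-replace damaged GenomeSize.xml.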
Good to hear. Amazing that there was no error while parsing the mangled GenomeSize.xml file. That should be fixed.
Trying to run Canvas CNV but unsure what a Canvas-ready reference file is:
-r, --reference=VALUE Canvas-ready reference fasta file (required)
Without a reference (no -r supplied) I get:
Error: reference is a required option
But when I supply a reference FASTA, as in ....
-g ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta -f ./reference/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/filter13.bed -r /projects/sbilobram_prj/reference/Homosapiens/UCSC/hg19/Sequence/WholeGenomeFasta/hg19.fa
I get the following error:
2019-04-25T12:40:25-07:00,ERROR: Canvas workflow error: System.NullReferenceException: Object reference not set to an instance of an object.
at Isas.SequencingFiles.ReferenceGenome.get_Species()
at Isas.SequencingFiles.GenomeMetadata.Deserialize(TextReader reader, IDirectoryLocation genomeFastaFolder, IReferenceGenome referenceGenome)
at Isas.SequencingFiles.GenomeMetadata.Deserialize(IFileLocation genomeSizeXml)
at Canvas.TumorNormalWgsRunner.GetCallset()
at Canvas.TumorNormalWgsRunner.Run(CanvasRunnerFactory runnerFactory)
at Canvas.ModeLauncher.Launch()