Illumina / canvas

Canvas - Copy number variant (CNV) calling from DNA sequencing data

Is the SmallPedigree-WGS README.md example out of date #107

Open cvlvxi opened 5 years ago

cvlvxi commented 5 years ago

The example in the README.md

dotnet /CanvasDIR/Canvas.dll SmallPedigree-WGS --bam=/basespace/Projects/canvas/AppResults/bams/Files/father.bam --bam=/basespace/Projects/canvas/AppResults/bams/Files/mother.bam --bam=/basespace/Projects/canvas/AppResults/bams/Files/child1.bam --mother=mother --father=father --proband=child1 -r /basespace/Projects/canvas/AppResults/canvasdata/Files/kmer.fa -g /basespace/Projects/canvas/AppResults/canvasdata/Files/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta --sample-b-allele-vcf /basespace/Projects/canvas/AppResults/snvvcf/Files/Pedigree.vcf.gz -f /basespace/Projects/canvas/AppResults/canvasdata/Files/filter13.bed -o /tmp/gHapMixDemo --ploidy-vcf="/basespace/Projects/canvas/AppResults/snvvcf/Files/MultiSamplePloidy.vcf"

Seems out of date from the --help:

Mode-specific options:

  -b, --bam=VALUE1 VALUE2 VALUE3
                           bam [pedigree-member] [sample-name] Option can be specified multiple times. (required)

                           bam: sample .bam file (required)

                           pedigree-member: Pedigree member type (either
                           proband, mother, father or other). Default is
                           other

Also -- is proband even an option? I've been using it like so:

dotnet /misc/vcgs/exome/cpipe-2.3-research/tools/canvas/1.39.0.1598/Canvas.dll SmallPedigree-WGS --bam=/path/to/FATHERID.bam father FATHERID --bam=/path/to/MOTHERID.bam mother MOTHERID --bam=/path/to/PROBANDID.bam proband PROBANDID.......(+otherargs)

but I'm getting

2018-11-24T16:36:08+11:00,Running checkpoint 01: Validate input
2018-11-24T16:36:08+11:00,ERROR: Error: found unexpected arguments '--proband=PROBANDID'

I was checking the source code in ModeParserTests.cs and couldn't see anything related to setting bamfile.FullName, "proband", "SampleID"..

What am I doing wrong?

cvlvxi commented 5 years ago

I managed to get this to run by not specifying any of the relationships next to the bam args and passing just the bam files, i.e. -b /path/to/mother.bam -b /path/to/father.bam -b /path/to/proband.bam

Will it infer the relationship from the multi sample vcf?

cvlvxi commented 5 years ago

Changing the issue: when running SmallPedigree-WGS I'm now running into this error:

Job error message:
2018-11-27T12:23:43+11:00,ERROR: Exception caught in WorkDoerFactory. Cancelling all jobs. Exception:
    Cannot calculate median of an empty SortedList.
System.Exception: Cannot calculate median of an empty SortedList.
   at Illumina.Common.SortedListExtensions.Median[T](SortedList`1 list, Func`3 average)
   at CanvasPedigreeCaller.SampleMetrics.GetSampleInfo(IReadOnlyList`1 segments, String ploidyBedPath, Int32 numberOfTrimmedBins, SampleId id) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Src\Canvas\CanvasPedigreeCaller\SampleMetrics.cs:line 38
   at CanvasPedigreeCaller.CanvasPedigreeCaller.CallVariants(List`1 variantFrequencyFiles, List`1 segmentFiles, IFileLocation outVcfFile, String ploidyBedPath, String referenceFolder, List`1 sampleNames, String commonCnvsBedPath, List`1 sampleTypes) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Src\Canvas\CanvasPedigreeCaller\CanvasPedigreeCaller.cs:line 93

eroller commented 5 years ago

The --bam argument takes three values, for example -b=/path/to/my.bam,mother,MotherID. The default sample type is "OTHER" and the default sample ID is the ID specified in the SM tag of the bam.

cvlvxi commented 5 years ago

Hi @eroller,

I ended up running SmallPedigree-WGS both with and without specifying the bam familial relationship and ID. In both cases I'm getting the error:

Job error message:
2018-11-28T10:48:44+11:00,ERROR: Exception caught in WorkDoerFactory. Cancelling all jobs. Exception:
    Cannot calculate median of an empty SortedList.
System.Exception: Cannot calculate median of an empty SortedList.
   at Illumina.Common.SortedListExtensions.Median[T](SortedList`1 list, Func`3 average)
   at CanvasPedigreeCaller.SampleMetrics.GetSampleInfo(IReadOnlyList`1 segments, String ploidyBedPath, Int32 numberOfTrimmedBins, SampleId id) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Src\Canvas\CanvasPedigreeCaller\SampleMetrics.cs:line 38
   at CanvasPedigreeCaller.CanvasPedigreeCaller.CallVariants(List`1 variantFrequencyFiles, List`1 segmentFiles, IFileLocation outVcfFile, String ploidyBedPath, String referenceFolder, List`1 sampleNames, String commonCnvsBedPath, List`1 sampleTypes) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Src\Canvas\CanvasPedigreeCaller\CanvasPedigreeCaller.cs:line 93

Am I missing something simple?

eroller commented 5 years ago

Make sure the sample IDs in the --sample-b-allele-vcf file match the SM tags in the bam header. If not, you will need to specify the correct sample IDs for each bam file on the command line.
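
One way to check this before rerunning is to compare the two sets of names directly. The sketch below is only an illustration and not part of Canvas. It assumes each BAM header has been dumped to text (for example with `samtools view -H`) and that you have the `#CHROM` header line of the b-allele VCF. The function names are hypothetical.

```python
def sm_tags_from_sam_header(header_text):
    """Collect the SM values found on @RG lines of a SAM/BAM text header."""
    samples = set()
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # @RG fields are tab-separated TAG:VALUE pairs
        for field in line.split("\t")[1:]:
            if field.startswith("SM:"):
                samples.add(field[3:])
    return samples


def samples_from_vcf_header(chrom_line):
    """Return the sample column names from the '#CHROM ...' VCF header line."""
    columns = chrom_line.rstrip("\n").split("\t")
    # The first nine columns are fixed (CHROM ... FORMAT); samples follow.
    return columns[9:]


def missing_from_vcf(bam_headers, chrom_line):
    """List BAM sample names (SM tags) that have no matching VCF sample column."""
    vcf_samples = set(samples_from_vcf_header(chrom_line))
    bam_samples = set()
    for header_text in bam_headers:
        bam_samples |= sm_tags_from_sam_header(header_text)
    return sorted(bam_samples - vcf_samples)


if __name__ == "__main__":
    # Made-up header and VCF line, for illustration only.
    header = "@HD\tVN:1.6\n@RG\tID:rg1\tSM:father01\tPL:ILLUMINA\n"
    chrom = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tFATHER\tMOTHER"
    print(missing_from_vcf([header], chrom))
```

The comparison is case-sensitive on purpose, so a name that differs only in case is still reported. Any name reported here is a candidate for passing an explicit sample ID alongside the corresponding --bam argument, as suggested above.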

osowiecki commented 5 years ago

I have another question. According to https://github.com/Illumina/canvas/wiki, comparing family samples should produce a DQ score. In my comparison all DQ score fields are ".". Is this the correct behavior of the application? When is the DQ score calculated in the Small Pedigree workflow?

Canvas SmallPedigree-WGS --bam=./bam_fam/CF_6924.bam father CF_6924 --bam=./bam_fam/CF_6925.bam mother CF_6925 --bam=./bam_fam/CF_6916.bam proband CF_6916 --sample-b-allele-vcf=./temp2.vcf -o ./CNV/FAMILY -r ./data/kmers.fasta -g ./data/canFam3/ --filter-bed=./data/filter.bed --ploidy-vcf=./data/ploidy.vcf

Example:

9 19897945 Canvas:REF:9:19897945-31453316 N . . PASS END=31453316;CIPOS=-609,609;CIEND=-537,549 GT:RC:BC:CN:MCC:MCCQ:QS:FT:DQ ./.:105.97:9649:2:.:.:16.87:PASS:. ./.:104.99:9649:2:.:.:16.73:PASS:. ./.:102.00:9649:2:.:.:16.43:PASS:.

eroller commented 5 years ago

DQ is calculated when there is a conflicting set of copy number genotypes in the trio (Mother/Father/Child). In the example you showed, each sample has the reference copy number of 2, so there is no conflict. An example of a conflict would be a deletion in the child (e.g. CN=1) with the reference copy number in each parent (i.e. CN=2).
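
To scan an output VCF for records where a DQ value might be expected, a short script can pull CN and DQ out of each sample column. This is a rough sketch with hypothetical function names. It splits on whitespace so that it works on the space-separated record pasted above (a real VCF is tab-separated), and it assumes no field contains spaces. "Copy numbers differ between samples" is used here only as a crude proxy for a conflict; the actual rule Canvas applies is not spelled out in this thread.

```python
def per_sample_fields(record, keys=("CN", "DQ")):
    """Return one {key: value} dict per sample column of a VCF data line."""
    columns = record.split()
    format_keys = columns[8].split(":")   # e.g. GT:RC:BC:CN:MCC:MCCQ:QS:FT:DQ
    result = []
    for sample in columns[9:]:
        values = dict(zip(format_keys, sample.split(":")))
        # Missing keys are reported as "." like a missing VCF value.
        result.append({key: values.get(key, ".") for key in keys})
    return result


def copy_numbers_differ(record):
    """Crude proxy (an assumption): True if the samples do not all share one CN."""
    copy_numbers = {fields["CN"] for fields in per_sample_fields(record, keys=("CN",))}
    return len(copy_numbers) > 1


if __name__ == "__main__":
    # The record quoted earlier in this thread.
    record = (
        "9 19897945 Canvas:REF:9:19897945-31453316 N . . PASS "
        "END=31453316;CIPOS=-609,609;CIEND=-537,549 "
        "GT:RC:BC:CN:MCC:MCCQ:QS:FT:DQ "
        "./.:105.97:9649:2:.:.:16.87:PASS:. "
        "./.:104.99:9649:2:.:.:16.73:PASS:. "
        "./.:102.00:9649:2:.:.:16.43:PASS:."
    )
    print(per_sample_fields(record))
    print(copy_numbers_differ(record))
```

For the record quoted in the question, every sample has CN=2, so the check reports no difference, which is consistent with DQ being "." there.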