Illumina / canvas

Canvas - Copy number variant (CNV) calling from DNA sequencing data

Demo evaluation needs correction #118

Open logust79 opened 5 years ago

logust79 commented 5 years ago

In the demo's Evaluation section, the command:

zcat /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf.gz | grep -v ":REF:" > /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf (remove REF calls)
/CanvasDIR/Tools/EvaluateCNV/EvaluateCNV.dll /ihart/BaseSpace/Projects/CanvasSPW/AppResults/simdata/Files/child1_truth.bed /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf /CanvasDIR/Tools/EvaluateCNV/generic.cnaqc.excluded_regions.bed inheritedCNVs.txt 

would not run because the path to generic.cnaqc.excluded_regions.bed is wrong. For consistency, CanvasSPW should also be renamed to canvas, and the (remove REF calls) note is better turned into a comment. The corrected commands would then look something like this:

zcat /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf.gz | grep -v ":REF:" > /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf #(remove REF calls)
/CanvasDIR/Tools/EvaluateCNV/EvaluateCNV.dll /tmp/BaseSpace/Projects/canvas/AppResults/simdata/Files/child1_truth.bed /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf /tmp/BaseSpace/Projects/canvas/AppResults/simdata/Files/generic.cnaqc.excluded_regions.bed inheritedCNVs.txt 

But in the end it still crashes, saying that I need to provide the reference ploidy:

...
2019-04-23T09:47:57+01:00,ERROR: Exception caught in WorkDoerFactory. Cancelling all jobs. Exception:
    Error: Truth variant chr6:105256020-105271607 with no overlapping Canvas calls. Reference ploidy cannot be determined! Please provide reference ploidy via command line options
...
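
Since the first failure here came from a wrong input path, a small pre-flight check can surface that before EvaluateCNV starts. This is only a sketch: `require_files` is a hypothetical helper, not part of Canvas, and the paths are the ones quoted in the corrected command above.

```shell
#!/bin/sh
# Hypothetical helper: report every input path that is not a readable file.
# Returns 0 only if all paths passed in are readable regular files.
require_files() {
    missing=0
    for path in "$@"; do
        if [ ! -f "$path" ] || [ ! -r "$path" ]; then
            echo "missing input: $path" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Paths as quoted in the corrected command; adjust to your own layout.
SIMDATA=/tmp/BaseSpace/Projects/canvas/AppResults/simdata/Files
if require_files \
    "$SIMDATA/child1_truth.bed" \
    "$SIMDATA/generic.cnaqc.excluded_regions.bed" \
    /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf
then
    echo "all inputs found"
else
    echo "fix the paths listed above before running EvaluateCNV" >&2
fi
```

The check only confirms that the files exist and are readable; it says nothing about whether their contents are what EvaluateCNV expects.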
eroller commented 5 years ago

Yes, the demo documentation is outdated. Sorry about that. I will keep this issue open so others can see the workaround. For the reference ploidy VCF input, see this post: https://github.com/Illumina/canvas/issues/89#issuecomment-400762109

logust79 commented 5 years ago

Thank you for your reply. After some research and trial and error, it still fails. This is the command I ran:

zcat output/demo/TempCNV_child1/CNV.vcf.gz | grep -v ":REF:" > output/demo/TempCNV_child1/CNV.vcf #(remove REF calls)
dotnet /canvasdir/Tools/EvaluateCNV/EvaluateCNV.dll \
    /tmp/BaseSpace/Projects/canvas/AppResults/simdata/Files/child1_truth.bed \
    output/demo/TempCNV_child1/CNV.vcf \
    /tmp/BaseSpace/Projects/canvas/AppResults/simdata/Files/generic.cnaqc.excluded_regions.bed \
    inheritedCNVs.txt \
    --ploidy=1 1 data/Files/par.bed

The contents of par.bed:

chrX    60001   2699520
chrX    154931044   155260560
chrY    10001   2649520
chrY    59034050    59363566
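
To rule out the BED file itself as the cause, a quick format check can help. The sketch below uses a hypothetical helper, `check_bed`, and assumes only the basic BED layout (chromosome, start, end, with start smaller than end). It is not something EvaluateCNV provides.

```shell
#!/bin/sh
# Hypothetical helper: flag BED lines with fewer than three fields, or whose
# start/end are not non-negative integers with start < end.
# Exit status is 0 when no line is flagged, 1 otherwise.
check_bed() {
    awk '
        /^#/ || NF == 0 { next }   # skip comment and blank lines
        NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
            printf "line %d looks malformed: %s\n", NR, $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Example: write the four intervals listed above to a temporary file and check them.
example_bed=$(mktemp)
printf 'chrX\t60001\t2699520\nchrX\t154931044\t155260560\n' > "$example_bed"
printf 'chrY\t10001\t2649520\nchrY\t59034050\t59363566\n' >> "$example_bed"
check_bed "$example_bed" && echo "BED intervals look well-formed"
rm -f "$example_bed"
```

awk splits on any whitespace by default, so the check accepts both tab- and space-separated columns. Whether the tool itself accepts both is not stated in this thread.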

The error:

2019-04-24T12:20:45+01:00,ERROR: Exception caught in WorkDoerFactory. Cancelling all jobs. Exception:
        Value cannot be null.
Parameter name: fileName
System.ArgumentNullException: Value cannot be null.
Parameter name: fileName
   at System.IO.FileInfo..ctor(String originalPath, String fullPath, String fileName, Boolean isNormalized)
   at EvaluateCNV.CNVChecker.ComputeCallability(ILogger logger, Dictionary`2 callsByContig, EvaluateCnvOptions options, IDirectoryLocation output) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Tools\EvaluateCNV\CNVChecker.cs:line 543
   at EvaluateCNV.CNVChecker.<>c__DisplayClass24_0.<Evaluate>b__4(IWorkDoer workDoer) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Tools\EvaluateCNV\CNVChecker.cs:line 536
   at Isas.Framework.WorkManagement.JobLaunching.JobLauncherFactory.RunWithJobLauncher(ILogger logger, ISettings settings, IDirectoryLocation loggingDir, Action`1 logCommand, CancellationToken cancellationToken, Action`1 function)
   at Isas.Framework.WorkManagement.JobLaunching.JobLauncherFactory.RunWithJobLauncher(ILogger logger, ISettings settings, IDirectoryLocation analysisFolder, CancellationToken cancellationToken, Action`1 function)
   at Isas.Framework.WorkManagement.ResourceManagement.WorkResourceManagerFactory.RunWithResourceManager(ILogger logger, ISettings settings, CancellationToken cancellationToken, Action`1 function)
   at Isas.Framework.WorkManagement.WorkDoerFactory.RunWithWorkDoer(ILogger logger, ISettings settings, IDirectoryLocation analysisFolder, CancellationTokenSource cancellationTokenSource, Action`1 function)
   at EvaluateCNV.CNVChecker.Evaluate(String truthSetPath, String cnvCallsPath, String excludedBed, String outputPath, EvaluateCnvOptions options) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Tools\EvaluateCNV\CNVChecker.cs:line 538
   at EvaluateCNV.Program.MainHelper(String[] args) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Tools\EvaluateCNV\Program.cs:line 49
   at EvaluateCNV.Program.Main(String[] args) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Tools\EvaluateCNV\Program.cs:line 16

Any idea?

logust79 commented 5 years ago

I figured out that I needed to provide kmer.fa. And since the tool infers the (wrong) location of GenomeSize.xml, I also needed to soft-link some of the files, such as kmer.fa and filter13.bed.
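
The soft-linking step itself is not shown in this thread. A minimal sketch of it follows, assuming a hypothetical helper `link_reference_files` and placeholder directories. The actual source and target locations depend on where the tool looks for GenomeSize.xml, which is not stated here.

```shell
#!/bin/sh
# Hypothetical helper: symlink the named files from a source directory into a
# target directory. Usage: link_reference_files SRC_DIR DEST_DIR FILE...
link_reference_files() {
    # Resolve the source to an absolute path so the links stay valid
    # regardless of where the target directory lives.
    src_dir=$(cd "$1" && pwd) || return 1
    dest_dir=$2
    shift 2
    mkdir -p "$dest_dir" || return 1
    for name in "$@"; do
        if [ ! -e "$src_dir/$name" ]; then
            echo "not found: $src_dir/$name" >&2
            return 1
        fi
        # -s: symbolic link, -f: replace a stale link of the same name
        ln -sf "$src_dir/$name" "$dest_dir/$name" || return 1
    done
}

# Example call with placeholder directories (not taken from the thread):
# link_reference_files /path/to/canvasdata/Files /path/to/inferred/genome/dir kmer.fa filter13.bed
```

Symbolic links avoid copying the large reference files, and they can simply be deleted afterwards without touching the originals.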

zcat output/demo/TempCNV_child1/CNV.vcf.gz | grep -v ":REF:" > output/demo/TempCNV_child1/CNV.vcf #(remove REF calls)
dotnet /canvasdir/Tools/EvaluateCNV/EvaluateCNV.dll \
    /tmp/BaseSpace/Projects/canvas/AppResults/simdata/Files/child1_truth.bed \
    output/demo/TempCNV_child1/CNV.vcf \
    /tmp/BaseSpace/Projects/canvas/AppResults/simdata/Files/generic.cnaqc.excluded_regions.bed \
    inheritedCNVs.txt \
    --ploidy=1 1 data/Files/ploidy.bed \
    -k=data/canvasdata/Files/kmer.fa

This command works with no errors, and outputs the following as part of the result:

Ploidy  1.86
Results for PASSing variants
Accuracy        39.7608
DirectionAccuracy       40.1665
F-score 0.8575
Recall  77.7004
DirectionRecall 78.4933
Precision       95.6493
DirectionPrecision      96.6254
GainRecall      70.6110
GainDirectionRecall     71.4076
GainPrecision   91.2464
GainDirectionPrecision  92.2757
LossRecall      80.0021
LossDirectionRecall     80.0021
LossPrecision   96.9904
LossDirectionPrecision  97.9502
MeanEventAccuracy       68.7341
MedianEventAccuracy     94.5666
VariantEventsCalled     2133
VariantBasesCalled      219903552
...

The recall is somewhat lower than in the documentation. There are also warnings on stderr that might be related: it failed to locate PARv5.bed, and one of the chrY calls has GT as 1/1:... instead of 1:.... Any ideas?
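
As a side check on the figures above: the reported F-score is consistent with the harmonic mean of Precision and Recall once both are converted from percentages to fractions. The sketch below assumes that definition. The thread does not say how EvaluateCNV actually computes the value, and `f_score` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: F-score as the harmonic mean of precision and recall.
# Both arguments are percentages, as printed in the EvaluateCNV output above.
f_score() {
    awk -v p="$1" -v r="$2" 'BEGIN {
        p /= 100; r /= 100              # percentages -> fractions
        printf "%.4f\n", 2 * p * r / (p + r)
    }'
}

# Precision and Recall for PASSing variants, taken from the output above.
f_score 95.6493 77.7004
```

With the values from this run, the sketch prints 0.8575, matching the reported F-score. So the F-score line is a fraction, while Recall and Precision are percentages.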

eroller commented 5 years ago

There are no truth events on chrX for that sample so the PAR calls will not affect recall. The lower recall number you are seeing is probably just a limitation in the truth set for that simulated dataset. ~80% recall is typical for a germline sample.

PARv5.bed files attached: PARv5.bed.hg19.txt, PARv5.bed.grch38.txt, PARv5.bed.grch37.txt

logust79 commented 5 years ago

Thank you @eroller! I guess the demo run can be deemed a success.