Illumina / canvas

Canvas - Copy number variant (CNV) calling from DNA sequencing data

Using precomputed .binned files #96

Open sternp opened 6 years ago

sternp commented 6 years ago

Dear Eric et al,

I have been trying to produce a precomputed Canvas .binned file from a panel of normals so that I can call germline CNVs in future. Whilst I have been able to generate the .binned file, none of the three Canvas versions I have tried (1.11, 1.29 and 1.38) has been able to use it. I have managed to circumvent the issues described by other users, but I have uncovered some new ones. My procedure is as follows:

  1. Generate a precomputed .binned file using Canvas 1.11 (I get index out of range errors with 1.29 and 1.38).

    mono /Canvas-1.11/Canvas.exe Somatic-Enrichment \
    --bam bamfile.bam \
    --control-bam controlbam1.bam \
    --control-bam controlbam2.bam \
    --reference kmer.fa \
    --manifest grch37.txt \
    -g Illumina_grch37_files/WholeGenomeFASTA \
    -n output \
    -f filter13.bed \
    -o output \
    --b-allele-vcf dbsnp.vcf \
    --exclude-non-het-b-allele-sites \
    --custom-parameters=CanvasSomaticCaller,--definedpurity=1
  2. Use the generated .normal.binned file and the bin size output by step 1 to attempt germline CNV calling (using Canvas 1.11).

    mono /Canvas-1.11/Canvas.exe Somatic-Enrichment \
    --bam bamfile.bam \
    --control-binned output.normal.binned \
    --control-bin-size 713 \
    --reference kmer.fa \
    --manifest grch37.txt \
    -g Illumina_grch37_files/WholeGenomeFASTA \
    -n output2 \
    -f filter13.bed \
    -o output2 \
    --b-allele-vcf dbsnp.vcf \
    --exclude-non-het-b-allele-sites \
    --custom-parameters=CanvasSomaticCaller,--definedpurity=1 \
    --custom-parameters=CanvasSNV,--isDbSnpVcf=True

However, it seems that the --control-bin-size parameter cannot be parsed: failed to convert 713 to System.Nullable`1[System.UInt32]

  3. Because step 2 failed, I attempted it with a later version of Canvas (1.38).
    dotnet /Canvas-1.38/Canvas.dll Somatic-Enrichment \
    --bam bamfile.bam \
    --control-binned output.normal.binned \
    --control-bin-size 713 \
    --reference kmer.fa \
    --manifest grch37.txt \
    -g Illumina_grch37_files/WholeGenomeFASTA \
    -n output3 \
    -f filter13.bed \
    -o output3 \
    --ploidy-vcf male_ploidy.vcf \
    --population-b-allele-vcf dbsnp.vcf \
    --custom-parameters=CanvasSomaticCaller,--definedpurity=1 \
    --custom-parameters=CanvasSNV,--isDbSnpVcf=True

However, Canvas fails with the message "Note: Evenness metric file not found at /output3/TempCNV_output3/EvennessMetric.txt". This file does not seem to be generated by the program.

2018-09-13T16:02:58+10:00,Running checkpoint 07: Variant calling
2018-09-13T16:02:58+10:00,Checkpoint 06 Intersect bins with manifest complete. Elapsed time (hh/mm/ss): 00:00:00.1
2018-09-13T16:02:58+10:00,Note: Evenness metric file not found at '/cnv/Germline_PON/output3/TempCNV_output3/EvennessMetric.txt'
2018-09-13T16:02:58+10:00,Launching process for job SomaticCNV-output3:
/usr/bin/dotnet /cnv/Canvas-1.38.0.1554/CanvasSomaticCaller/CanvasSomaticCaller.dll  -v /cnv/Germline_PON/output3/TempCNV_output3/VFResultsoutput3.txt.gz -i /cnv/Germline_PON/output3/TempCNV_output3/output3.partitioned -o /cnv/Germline_PON/output3/CNV.vcf.gz -b /ref_seq/Illumina_grch37_files/filter13.bed -p /cnv/Germline_PON/male_ploidy.vcf -n output3 -e -d --local-sd-metric-file "/cnv/Germline_PON/output3/TempCNV_output3/LocalSdMetric.txt" -r "/ref_seq/Illumina_grch37_files/WholeGenomeFASTA" --definedpurity=1
2018-09-13T16:03:00+10:00,Job SomaticCNV-output3 duration: 00:00:01.4
2018-09-13T16:03:00+10:00,ERROR: Job SomaticCNV-output3 failed with exit code 134. Job logs: 
    /cnv/Germline_PON/output3/Logging/SomaticCNV-output3.stdout
    /cnv/Germline_PON/output3/Logging/SomaticCNV-output3.stderr

If you could help me get this working that would be very much appreciated!

eroller commented 5 years ago

Hi, I'm sorry you are having to jump through so many hoops with the precomputed bin option. I'm hesitant to recommend trying to get it to work with 1.38, since we have not evaluated how recent changes for WGS may have affected calling on enrichment data in this newer version. If you want to try, you can use the attached placeholder files to get around that error in 1.38. Adding the files should not affect CNV calling, as they are only used to add information to the header of the output VCF.

Attachments: EvennessMetric.txt, LocalSdMetric.txt

The error you are seeing in 1.11 just looks like a bug to me. The option for precomputed bins has not been extensively tested and unfortunately the somatic enrichment workflow is not under active development so this bug will likely not get fixed.

Please note that active CNV development has transitioned to DRAGEN and you can expect CNV analysis improvements in future releases of the DRAGEN BaseSpace app.
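
For readers who want to try the placeholder files mentioned above, here is a minimal Python sketch of staging them before a run. It assumes the files belong in `<output_dir>/TempCNV_<sample_name>/`, which is inferred only from the path in the log message in the original report and is not documented Canvas behaviour. The function name `stage_placeholders` and its arguments are hypothetical.

```python
import shutil
from pathlib import Path

# File names taken from the attachments mentioned in this thread.
PLACEHOLDERS = ("EvennessMetric.txt", "LocalSdMetric.txt")


def stage_placeholders(placeholder_dir, output_dir, sample_name):
    """Copy placeholder metric files into <output_dir>/TempCNV_<sample_name>/.

    ASSUMPTION: the target directory layout is inferred from the
    "Evenness metric file not found" log line, not from Canvas documentation.
    Files that already exist in the target directory are left untouched.
    Returns the list of paths that were actually copied.
    """
    temp_dir = Path(output_dir) / f"TempCNV_{sample_name}"
    temp_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in PLACEHOLDERS:
        target = temp_dir / name
        if not target.exists():
            shutil.copyfile(Path(placeholder_dir) / name, target)
            copied.append(target)
    return copied
```

A call such as `stage_placeholders("downloads", "/cnv/Germline_PON/output3", "output3")` would mirror the paths in the log. Whether Canvas keeps or overwrites files already present in that directory is not confirmed here.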

sternp commented 5 years ago

Thanks Eric. Is there a custom parameter on the command line for supplying an alternative EvennessMetric.txt? I tried putting that file in the TempDirectory and am still getting some errors.

eroller commented 5 years ago

Oh, the failure was not because of the missing file. You will need to look into the log files listed:

2018-09-13T16:03:00+10:00,ERROR: Job SomaticCNV-output3 failed with exit code 134. Job logs:
    /cnv/Germline_PON/output3/Logging/SomaticCNV-output3.stdout
    /cnv/Germline_PON/output3/Logging/SomaticCNV-output3.stderr

I suspect the error will not be one we can fix given that the Somatic-Enrichment workflow has not been tested in versions after 1.11.0. It looks like precomputed bins may not be workable.
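
As an aid to finding the per-job logs referred to above, the following Python sketch extracts failed jobs and their log paths from a Canvas workflow log. It relies only on the line format visible in this thread ("ERROR: Job NAME failed with exit code N. Job logs:" followed by `.stdout` and `.stderr` paths); the function name `failed_jobs` is hypothetical, and other Canvas versions may format these lines differently. As a general convention on Linux, an exit code of 134 corresponds to 128 + 6 (SIGABRT), so the `.stderr` file is probably the first place to look for a stack trace.

```python
import re

# Pattern based solely on the ERROR line shown in this thread.
FAILED_JOB = re.compile(
    r"ERROR: Job (?P<job>\S+) failed with exit code (?P<code>\d+)"
)
LOG_SUFFIXES = (".stdout", ".stderr")


def failed_jobs(log_text):
    """Return a list of (job_name, exit_code, log_paths) for failed jobs.

    Log paths may follow "Job logs:" on the same line or on the
    indented lines immediately after it; both layouts are handled.
    """
    results = []
    current = None
    for line in log_text.splitlines():
        match = FAILED_JOB.search(line)
        if match:
            # Collect any paths that share the line with the error message.
            rest = line[match.end():]
            paths = [tok for tok in rest.split() if tok.endswith(LOG_SUFFIXES)]
            current = (match["job"], int(match["code"]), paths)
            results.append(current)
        elif current is not None and line.strip().endswith(LOG_SUFFIXES):
            # Continuation line holding one log path.
            current[2].append(line.strip())
        else:
            current = None
    return results
```

Each returned path can then be opened directly, for example `print(open(paths[-1]).read())` to view the `.stderr` output of a failed job.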

sternp commented 5 years ago

Thanks Eric.

On a slightly related topic: are there any recommendations for how many control bams should be used in Somatic-Enrichment mode?

eroller commented 5 years ago

One to a few control bams should be sufficient. I believe the normalization technique does not take advantage of the distribution in coverage from the control bams, so additional control bams give diminishing returns in performance.
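
To illustrate why returns might diminish, here is a toy Python sketch. It is not Canvas's normalization code: it assumes, purely for illustration, that the reference for each bin is a plain average over independent control samples with equal per-sample noise `sigma`. Under that assumption the noise in the reference falls as `sigma / sqrt(n)`, so each doubling of the number of controls removes less noise than the previous one. The function names are hypothetical.

```python
import math


def reference_noise(sigma, n_controls):
    """Standard error of a per-bin mean over n independent controls.

    ASSUMPTION: a toy model in which the reference is a simple average;
    this is not taken from the Canvas source.
    """
    return sigma / math.sqrt(n_controls)


def noise_table(sigma, control_counts):
    """Return (n_controls, noise) pairs, with noise rounded to 3 decimals."""
    return [(n, round(reference_noise(sigma, n), 3)) for n in control_counts]


if __name__ == "__main__":
    for n, noise in noise_table(1.0, (1, 2, 4, 8, 16)):
        print(f"{n:>2} controls -> relative noise {noise}")
```

In this model, going from 1 to 4 controls halves the reference noise, while going from 4 to 16 is needed to halve it again.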