cortes-ciriano-lab / savana

Somatic structural variant caller for long-read data
Apache License 2.0

Request for additional parameters for a higher precision on somatic variant callsets #34

Closed andyjslee closed 1 month ago

andyjslee commented 10 months ago

There has been a need to identify case-specific variants, especially in tumor/normal pairs, so I'm glad to see a tool filling this need. Thank you!

I want to highlight something I came across while testing your tool. I ran savana on HG002 and HG001 PacBio whole-genome sequencing data, with HG002 labeled as tumor and HG001 labeled as normal for testing purposes (minimum read support = 3, minimum mapping quality = 20), and found a case that might help savana improve its somatic variant calling. Savana reports no read support in HG001 for the deletion at chr16:46390833, even though one HG001 read supports the deletion (see the savana VCF row below). That read has a mapping quality of 60. I suspect the tool's performance will depend on the coverage of the control sample.

chr16 46390833 ID_384_1 c c[chr16:46390991[ . PASS SVTYPE=BND;MATEID=ID_384_2;NORMAL_SUPPORT=0;TUMOUR_SUPPORT=29;SVLEN=158;BP_NOTATION=+-;ORIGINATING_CLUSTER=a3e06949c29c44b297520e8f76cc7a0a;END_CLUSTER=97a07de2c66d4bacbafe8f7188e15a2b;ORIGIN_STARTS_STD_DEV=8.94;ORIGIN_MAPQ_MEAN=52.9;ORIGIN_EVENT_SIZE_STD_DEV=8.95;ORIGIN_EVENT_SIZE_MEDIAN=158;ORIGIN_EVENT_SIZE_MEAN=159.66;END_STARTS_STD_DEV=0.36;END_MAPQ_MEAN=52.9;END_EVENT_SIZE_STD_DEV=8.95;END_EVENT_SIZE_MEDIAN=158;END_EVENT_SIZE_MEAN=159.66;TUMOUR_DP=71,72;NORMAL_DP=2,1 GT 0/1
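To make the coverage point concrete, here is a minimal sketch that reads the INFO column of a row like the one above and flags calls where the normal sample is thinly covered. It is a post-hoc check, not a savana feature. The `MIN_NORMAL_DP` cutoff is a hypothetical value chosen for illustration, and the two comma-separated `NORMAL_DP` entries are treated simply as depth values without assuming what each one represents.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a key -> value dict (bare flags map to True)."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


# INFO column abbreviated from the row above (only the fields used here).
info = (
    "SVTYPE=BND;MATEID=ID_384_2;NORMAL_SUPPORT=0;TUMOUR_SUPPORT=29;"
    "SVLEN=158;TUMOUR_DP=71,72;NORMAL_DP=2,1"
)

fields = parse_info(info)
normal_dp = [int(x) for x in fields["NORMAL_DP"].split(",")]
tumour_dp = [int(x) for x in fields["TUMOUR_DP"].split(",")]

# Hypothetical cutoff: below this depth, NORMAL_SUPPORT=0 is weak evidence
# that the variant is truly absent from the normal sample.
MIN_NORMAL_DP = 10
low_normal_coverage = min(normal_dp) < MIN_NORMAL_DP

print(f"normal depth {normal_dp}, tumour depth {tumour_dp}, "
      f"low normal coverage: {low_normal_coverage}")
```

For this row the normal depth values are 2 and 1 against tumour depth values of 71 and 72, so the call would be flagged as having too little normal coverage to rule out a germline variant.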

[Attached screenshot, 2023-10-26 12:07 PM]

For more stringent somatic variant identification, would it be possible for savana to take a separate set of parameters for normal samples? For example, I would like to supply a lower minimum number of supporting reads (e.g. 1) and a lower minimum mapping quality (e.g. 5) for normal samples, to increase the sensitivity of variant detection in the normal, coupled with a higher minimum number of supporting reads (e.g. 7) and a higher minimum mapping quality (e.g. 20) for tumor samples. I think this would help improve the precision of your somatic variant detection. Thanks!
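The asymmetric rule described above can be sketched roughly as follows. This is only an illustration of the requested behaviour, not how savana works internally: the `Thresholds` class and the `has_evidence` and `is_somatic` helpers are hypothetical names, and each sample is reduced to a list of mapping qualities of the reads that support the variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    min_support: int  # minimum number of supporting reads
    min_mapq: int     # minimum mapping quality for a read to count


# Example values from the request: strict for tumor, lenient for normal.
TUMOUR = Thresholds(min_support=7, min_mapq=20)
NORMAL = Thresholds(min_support=1, min_mapq=5)


def has_evidence(read_mapqs, thresholds):
    """True if enough supporting reads pass the mapping-quality cutoff."""
    passing = sum(1 for q in read_mapqs if q >= thresholds.min_mapq)
    return passing >= thresholds.min_support


def is_somatic(tumour_mapqs, normal_mapqs):
    """Somatic = well supported in the tumor and no evidence at all in the normal."""
    return has_evidence(tumour_mapqs, TUMOUR) and not has_evidence(normal_mapqs, NORMAL)


# The chr16:46390833 case: 29 tumor reads, one MAPQ-60 read in the normal.
# The tumor read qualities are placeholders, not values taken from the data.
print(is_somatic([60] * 29, [60]))  # one normal read is enough to reject the call
print(is_somatic([60] * 29, []))    # no normal evidence, so the call is kept
```

With a single shared minimum support of 3, the lone normal read would not count as evidence and the call would pass as somatic. With the lenient normal threshold it is rejected, which is the gain in precision the request is after.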

yashcrux commented 9 months ago

I am facing a similar issue. Have you looked at the "Classify by Parameter File" section of the README.md? Maybe you can create a .json file and use that.

helrick commented 1 month ago

Hi there, apologies for the delay in getting back to you.

Yes, as @yashcrux mentions, I would recommend using the Classify by Parameter File section in the README to set custom thresholds which work best for your use case.