cortes-ciriano-lab / savana

Somatic structural variant caller for long-read data
Apache License 2.0

Savana performance on the Valle-Inclan benchmark #30

Closed: waltergallegog closed this issue 4 weeks ago

waltergallegog commented 12 months ago

Hello, I was wondering whether you have evaluated or trained SAVANA on the Valle-Inclan benchmark: https://www.sciencedirect.com/science/article/pii/S2666979X22000726

I checked the deletions, and it appears that SAVANA loses 6 of the truth-set deletions at the classification step when using the default classifier (output: classified.sv_breakpoints.somatic.vcf).

However, if I filter aligned.sv_breakpoints.vcf instead, all six deletions are present. None of them has any NORMAL support, and four of them have good TUMOR support (15 to 35 reads).
Am I missing some configuration that would make the default classifier perform better?
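For reference, the manual filtering step described above might look like the minimal Python sketch below. It assumes the INFO keys visible in the aligned.sv_breakpoints.vcf records further down (NORMAL_SUPPORT, TUMOUR_SUPPORT, BP_NOTATION). The minimum tumour support of 3 and the rule "a `+-` junction whose mate lies on the same chromosome is deletion-like" are illustrative assumptions, not SAVANA defaults or SAVANA's own classification logic.

```python
import re

# Hypothetical cut-off for illustration; not a SAVANA default.
MIN_TUMOUR_SUPPORT = 3

# Mate locus inside a BND ALT allele, e.g. "C[5:178973320[" or "]5:178973287]G".
MATE_RE = re.compile(r"[\[\]]([^:\[\]]+):(\d+)[\[\]]")


def parse_info(info_field):
    """Split a VCF INFO column (KEY=VALUE;KEY=VALUE;...) into a dict of strings."""
    info = {}
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        info[key] = value
    return info


def is_somatic_deletion_candidate(line, min_tumour=MIN_TUMOUR_SUPPORT):
    """Return True if a breakpoint record looks like a tumour-only deletion.

    Assumption: a '+-' BP_NOTATION whose mate lies on the same chromosome
    is treated as deletion-like; NORMAL_SUPPORT must be 0 and
    TUMOUR_SUPPORT must be at least min_tumour.
    """
    if not line.strip() or line.startswith("#"):
        return False
    fields = line.split()  # accepts tab- or space-separated columns
    if len(fields) < 8:
        return False
    chrom, alt, info = fields[0], fields[4], parse_info(fields[7])
    mate = MATE_RE.search(alt)
    if mate is None or mate.group(1) != chrom:
        return False  # not a breakend, or an inter-chromosomal junction
    return (
        info.get("BP_NOTATION") == "+-"
        and int(info.get("NORMAL_SUPPORT", "0")) == 0
        and int(info.get("TUMOUR_SUPPORT", "0")) >= min_tumour
    )


def filter_vcf(path, min_tumour=MIN_TUMOUR_SUPPORT):
    """Yield candidate deletion records from an uncompressed VCF file."""
    with open(path) as vcf:
        for line in vcf:
            if is_somatic_deletion_candidate(line, min_tumour):
                yield line
```

Usage would be along the lines of `for record in filter_vcf("aligned.sv_breakpoints.vcf"): print(record, end="")`. With a threshold of 3, the low-support chr8 (4 reads) and chr10 (6 reads) deletions would still be kept.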

Here is a summary of the six deletions:

| CHROM | POS | ID | SVTYPE | SUPP SEQ | SUPP VAL | SVLEN | SAVANA support (TUMOR) | SAVANA support (NORMAL) |
|---|---|---|---|---|---|---|---|---|
| 5 | 178,973,288 | truthset_17_1 | DEL | ILL,ONT,PB | CAPTURE | 33 | 32 | 0 |
| 9 | 114,293,052 | truthset_37_1 | DEL | ILL,ONT | NOT_VALIDATED | 45 | 27 | 0 |
| X | 34,059,778 | truthset_68_1 | DEL | ILL,ONT,PB,10X | PCR,CAPTURE,BIONANO | 2,927 | 35 | 0 |
| 12 | 92,161,147 | truthset_47_1 | DEL | ILL,ONT | PCR,CAPTURE | 35 | 15 | 0 |
| 8 | 112,062,420 | truthset_32_1 | DEL | ILL | PCR,CAPTURE | 1,163 | 4 | 0 |
| 10 | 55,476,347 | truthset_42_1 | DEL | ILL,ONT | CAPTURE | 549 | 6 | 0 |

These are the corresponding records from aligned.sv_breakpoints.vcf:

5   178973287   ID_84916_1  C   C[5:178973320[  .   PASS    SVTYPE=BND;MATEID=ID_84916_2;NORMAL_SUPPORT=0;TUMOUR_SUPPORT=32;SVLEN=33;BP_NOTATION=+-;ORIGINATING_CLUSTER=5a77aa60f53e462491aa1b147e83b7cc;END_CLUSTER=e1e684cc1561488ea837d212d949ddfe;ORIGIN_STARTS_STD_DEV=2.7;ORIGIN_MAPQ_MEAN=58.78;ORIGIN_EVENT_SIZE_STD_DEV=1.61;ORIGIN_EVENT_SIZE_MEDIAN=33;ORIGIN_EVENT_SIZE_MEAN=33.31;END_STARTS_STD_DEV=2.31;END_MAPQ_MEAN=58.78;END_EVENT_SIZE_STD_DEV=1.61;END_EVENT_SIZE_MEDIAN=33;END_EVENT_SIZE_MEAN=33.31;TUMOUR_DP=39,39;NORMAL_DP=46,46   GT  0/1
5   178973320   ID_84916_2  G   ]5:178973287]G  .   PASS    SVTYPE=BND;MATEID=ID_84916_1;NORMAL_SUPPORT=0;TUMOUR_SUPPORT=32;SVLEN=33;BP_NOTATION=+-;ORIGINATING_CLUSTER=5a77aa60f53e462491aa1b147e83b7cc;END_CLUSTER=e1e684cc1561488ea837d212d949ddfe;ORIGIN_STARTS_STD_DEV=2.7;ORIGIN_MAPQ_MEAN=58.78;ORIGIN_EVENT_SIZE_STD_DEV=1.61;ORIGIN_EVENT_SIZE_MEDIAN=33;ORIGIN_EVENT_SIZE_MEAN=33.31;END_STARTS_STD_DEV=2.31;END_MAPQ_MEAN=58.78;END_EVENT_SIZE_STD_DEV=1.61;END_EVENT_SIZE_MEDIAN=33;END_EVENT_SIZE_MEAN=33.31;TUMOUR_DP=39,39;NORMAL_DP=46,46   GT  0/1

9   114293052   ID_47849_1  T   T[9:114293096[  .   PASS    SVTYPE=BND;MATEID=ID_47849_2;NORMAL_SUPPORT=0;TUMOUR_SUPPORT=27;SVLEN=44;BP_NOTATION=+-;ORIGINATING_CLUSTER=c4017e2cf7414f04976885db15b8dc51;END_CLUSTER=11ed1bf5268f4794b5d1baff2cf8ee9d;ORIGIN_STARTS_STD_DEV=5.36;ORIGIN_MAPQ_MEAN=60;ORIGIN_EVENT_SIZE_STD_DEV=2.47;ORIGIN_EVENT_SIZE_MEDIAN=44;ORIGIN_EVENT_SIZE_MEAN=43.96;END_STARTS_STD_DEV=4.74;END_MAPQ_MEAN=60;END_EVENT_SIZE_STD_DEV=2.47;END_EVENT_SIZE_MEDIAN=44;END_EVENT_SIZE_MEAN=43.96;TUMOUR_DP=67,69;NORMAL_DP=59,59    GT  0/1
9   114293096   ID_47849_2  A   ]9:114293052]A  .   PASS    SVTYPE=BND;MATEID=ID_47849_1;NORMAL_SUPPORT=0;TUMOUR_SUPPORT=27;SVLEN=44;BP_NOTATION=+-;ORIGINATING_CLUSTER=c4017e2cf7414f04976885db15b8dc51;END_CLUSTER=11ed1bf5268f4794b5d1baff2cf8ee9d;ORIGIN_STARTS_STD_DEV=5.36;ORIGIN_MAPQ_MEAN=60;ORIGIN_EVENT_SIZE_STD_DEV=2.47;ORIGIN_EVENT_SIZE_MEDIAN=44;ORIGIN_EVENT_SIZE_MEAN=43.96;END_STARTS_STD_DEV=4.74;END_MAPQ_MEAN=60;END_EVENT_SIZE_STD_DEV=2.47;END_EVENT_SIZE_MEDIAN=44;END_EVENT_SIZE_MEAN=43.96;TUMOUR_DP=69,67;NORMAL_DP=59,59    GT  0/1

X   34059777    ID_34054_1  G   G[X:34062694[   .   PASS    SVTYPE=BND;MATEID=ID_34054_2;NORMAL_SUPPORT=0;TUMOUR_SUPPORT=35;SVLEN=2917;BP_NOTATION=+-;ORIGINATING_CLUSTER=9e84dbab9cb543bdb539f8897b417d79;END_CLUSTER=cf55ed475d1847d3996c04cfb3b21df5;ORIGIN_STARTS_STD_DEV=10.31;ORIGIN_MAPQ_MEAN=60;ORIGIN_EVENT_SIZE_STD_DEV=13.62;ORIGIN_EVENT_SIZE_MEDIAN=2913;ORIGIN_EVENT_SIZE_MEAN=2919.91;END_STARTS_STD_DEV=7.75;END_MAPQ_MEAN=60;END_EVENT_SIZE_STD_DEV=13.62;END_EVENT_SIZE_MEDIAN=2913;END_EVENT_SIZE_MEAN=2919.91;TUMOUR_DP=34,32;NORMAL_DP=20,22   GT  0/1
X   34062694    ID_34054_2  A   ]X:34059777]A   .   PASS    SVTYPE=BND;MATEID=ID_34054_1;NORMAL_SUPPORT=0;TUMOUR_SUPPORT=35;SVLEN=2917;BP_NOTATION=+-;ORIGINATING_CLUSTER=9e84dbab9cb543bdb539f8897b417d79;END_CLUSTER=cf55ed475d1847d3996c04cfb3b21df5;ORIGIN_STARTS_STD_DEV=10.31;ORIGIN_MAPQ_MEAN=60;ORIGIN_EVENT_SIZE_STD_DEV=13.62;ORIGIN_EVENT_SIZE_MEDIAN=2913;ORIGIN_EVENT_SIZE_MEAN=2919.91;END_STARTS_STD_DEV=7.75;END_MAPQ_MEAN=60;END_EVENT_SIZE_STD_DEV=13.62;END_EVENT_SIZE_MEDIAN=2913;END_EVENT_SIZE_MEAN=2919.91;TUMOUR_DP=32,34;NORMAL_DP=22,20   GT  0/1

12  92161146    ID_42426_1  C   C[12:92161181[  .   PASS    SVTYPE=BND;MATEID=ID_42426_2;NORMAL_SUPPORT=0;TUMOUR_SUPPORT=15;SVLEN=35;BP_NOTATION=+-;ORIGINATING_CLUSTER=c04b2b7ee3ae4be8896ca72a1ab8735b;END_CLUSTER=c4b995f7fc7b4908b443696f492aaf33;ORIGIN_STARTS_STD_DEV=1.59;ORIGIN_MAPQ_MEAN=60;ORIGIN_EVENT_SIZE_STD_DEV=1.41;ORIGIN_EVENT_SIZE_MEDIAN=35;ORIGIN_EVENT_SIZE_MEAN=34.13;END_STARTS_STD_DEV=0.62;END_MAPQ_MEAN=60;END_EVENT_SIZE_STD_DEV=1.41;END_EVENT_SIZE_MEDIAN=35;END_EVENT_SIZE_MEAN=34.13;TUMOUR_DP=66,66;NORMAL_DP=51,51    GT  0/1
12  92161181    ID_42426_2  G   ]12:92161146]G  .   PASS    SVTYPE=BND;MATEID=ID_42426_1;NORMAL_SUPPORT=0;TUMOUR_SUPPORT=15;SVLEN=35;BP_NOTATION=+-;ORIGINATING_CLUSTER=c04b2b7ee3ae4be8896ca72a1ab8735b;END_CLUSTER=c4b995f7fc7b4908b443696f492aaf33;ORIGIN_STARTS_STD_DEV=1.59;ORIGIN_MAPQ_MEAN=60;ORIGIN_EVENT_SIZE_STD_DEV=1.41;ORIGIN_EVENT_SIZE_MEDIAN=35;ORIGIN_EVENT_SIZE_MEAN=34.13;END_STARTS_STD_DEV=0.62;END_MAPQ_MEAN=60;END_EVENT_SIZE_STD_DEV=1.41;END_EVENT_SIZE_MEDIAN=35;END_EVENT_SIZE_MEAN=34.13;TUMOUR_DP=66,66;NORMAL_DP=51,51    GT  0/1

8   112062419   ID_53869_1  A   A[8:112063581[  .   PASS    SVTYPE=BND;MATEID=ID_53869_2;NORMAL_SUPPORT=0;TUMOUR_SUPPORT=4;SVLEN=1162;BP_NOTATION=+-;ORIGINATING_CLUSTER=1e4622d7af424aa485a2de7572c26ba9;END_CLUSTER=9b7509c9d2404c2eb966e29293c6189c;ORIGIN_STARTS_STD_DEV=0;ORIGIN_MAPQ_MEAN=60;ORIGIN_EVENT_SIZE_STD_DEV=0;ORIGIN_EVENT_SIZE_MEDIAN=1162;ORIGIN_EVENT_SIZE_MEAN=1162;END_STARTS_STD_DEV=0;END_MAPQ_MEAN=60;END_EVENT_SIZE_STD_DEV=0;END_EVENT_SIZE_MEDIAN=1162;END_EVENT_SIZE_MEAN=1162;TUMOUR_DP=70,65;NORMAL_DP=43,45 GT  0/1
8   112063581   ID_53869_2  T   ]8:112062419]T  .   PASS    SVTYPE=BND;MATEID=ID_53869_1;NORMAL_SUPPORT=0;TUMOUR_SUPPORT=4;SVLEN=1162;BP_NOTATION=+-;ORIGINATING_CLUSTER=1e4622d7af424aa485a2de7572c26ba9;END_CLUSTER=9b7509c9d2404c2eb966e29293c6189c;ORIGIN_STARTS_STD_DEV=0;ORIGIN_MAPQ_MEAN=60;ORIGIN_EVENT_SIZE_STD_DEV=0;ORIGIN_EVENT_SIZE_MEDIAN=1162;ORIGIN_EVENT_SIZE_MEAN=1162;END_STARTS_STD_DEV=0;END_MAPQ_MEAN=60;END_EVENT_SIZE_STD_DEV=0;END_EVENT_SIZE_MEDIAN=1162;END_EVENT_SIZE_MEAN=1162;TUMOUR_DP=65,70;NORMAL_DP=45,43 GT  0/1

10  55476345    ID_77183_1  T   T[10:55476895[  .   PASS    SVTYPE=BND;MATEID=ID_77183_2;NORMAL_SUPPORT=0;TUMOUR_SUPPORT=6;SVLEN=550;BP_NOTATION=+-;ORIGINATING_CLUSTER=4a1c9360eefb420d8601dc2b2420ff25;END_CLUSTER=25f8c40247a04d5a903335e53dd8a590;ORIGIN_STARTS_STD_DEV=4.31;ORIGIN_MAPQ_MEAN=60;ORIGIN_EVENT_SIZE_STD_DEV=2.73;ORIGIN_EVENT_SIZE_MEDIAN=549.5;ORIGIN_EVENT_SIZE_MEAN=548.83;END_STARTS_STD_DEV=2.06;END_MAPQ_MEAN=60;END_EVENT_SIZE_STD_DEV=2.73;END_EVENT_SIZE_MEDIAN=549.5;END_EVENT_SIZE_MEAN=548.83;TUMOUR_DP=41,38;NORMAL_DP=47,47    GT  0/1
10  55476895    ID_77183_2  T   ]10:55476345]T  .   PASS    SVTYPE=BND;MATEID=ID_77183_1;NORMAL_SUPPORT=0;TUMOUR_SUPPORT=6;SVLEN=550;BP_NOTATION=+-;ORIGINATING_CLUSTER=4a1c9360eefb420d8601dc2b2420ff25;END_CLUSTER=25f8c40247a04d5a903335e53dd8a590;ORIGIN_STARTS_STD_DEV=4.31;ORIGIN_MAPQ_MEAN=60;ORIGIN_EVENT_SIZE_STD_DEV=2.73;ORIGIN_EVENT_SIZE_MEDIAN=549.5;ORIGIN_EVENT_SIZE_MEAN=548.83;END_STARTS_STD_DEV=2.06;END_MAPQ_MEAN=60;END_EVENT_SIZE_STD_DEV=2.73;END_EVENT_SIZE_MEDIAN=549.5;END_EVENT_SIZE_MEAN=548.83;TUMOUR_DP=38,41;NORMAL_DP=47,47    GT  0/1
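Comparing the table with the records above, exact coordinate matching would undercount: the truth-set POS values sit 0 to 2 bp from the SAVANA breakpoint POS, and the lengths differ by up to 10 bp (2,927 vs 2,917 on chrX). A minimal sketch of tolerance-based matching is below; the `Deletion` type, the 10 bp position window and the 0.9 size-ratio threshold are hypothetical choices for illustration, not the benchmark's official matching rules. The data are the six deletions copied from the table and records above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deletion:
    """Hypothetical minimal record: chromosome, start position, length (bp)."""
    chrom: str
    pos: int
    svlen: int


def matches(truth, call, max_pos_diff=10, min_size_ratio=0.9):
    """True if both deletions are on the same chromosome, start within
    max_pos_diff bp, and smaller/larger length >= min_size_ratio."""
    if truth.chrom != call.chrom:
        return False
    if abs(truth.pos - call.pos) > max_pos_diff:
        return False
    small, large = sorted((abs(truth.svlen), abs(call.svlen)))
    return large > 0 and small / large >= min_size_ratio


def unmatched(truth_set, calls, **tolerances):
    """Return the truth-set deletions that no call matches."""
    return [
        t for t in truth_set
        if not any(matches(t, c, **tolerances) for c in calls)
    ]


# Truth-set POS/SVLEN from the table; SAVANA POS/SVLEN from aligned.sv_breakpoints.vcf.
TRUTH = [
    Deletion("5", 178973288, 33), Deletion("9", 114293052, 45),
    Deletion("X", 34059778, 2927), Deletion("12", 92161147, 35),
    Deletion("8", 112062420, 1163), Deletion("10", 55476347, 549),
]
SAVANA_CALLS = [
    Deletion("5", 178973287, 33), Deletion("9", 114293052, 44),
    Deletion("X", 34059777, 2917), Deletion("12", 92161146, 35),
    Deletion("8", 112062419, 1162), Deletion("10", 55476345, 550),
]
```

With these tolerances, `unmatched(TRUTH, SAVANA_CALLS)` returns an empty list, whereas `unmatched(TRUTH, SAVANA_CALLS, max_pos_diff=0)` leaves five of the six deletions unmatched (only chr9 has an identical POS).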

Let me know if any of the other SAVANA output files would be useful.

BR Walter

helrick commented 4 weeks ago

Hi there,

Thanks for the detailed issue, and apologies for my late response! We've updated SAVANA in the latest release to improve performance, and we report our results on the Valle-Inclan benchmark in our recent preprint. I hope that answers your question; I'm happy to discuss further if you are still seeing variants that aren't reported despite high support.

Many thanks, Hillary