cortes-ciriano-lab / savana

Somatic structural variant caller for long-read data
Apache License 2.0
43 stars 2 forks source link

No records reported with tumor support lower than 6 #17

Closed waltergallegog closed 4 weeks ago

waltergallegog commented 1 year ago

Hello, I have invoked savana using the depth parameter with value 2. This is the full command:

 savana --tumour tumor.bam --normal normal.bam --ref ref.fa --outdir out/ --length 50 --depth 2 --threads 16 --debug >out.info 2>out.err

I'm using as savana output the file somatic.sv_breakpoints.lenient.vcf

Here are my observations after running Savana and nanomonsv on a tumor/normal pair:

This makes me think that perhaps the --depth parameter is not working correctly, and is set to 6. Let me know what other files of the Savana output I can check to understand what could explain this behavior.

Thanks and Best regards.

image

helrick commented 1 year ago

Thanks for raising this issue. Yes, you are correct in that the depth value is not being used correctly by SAVANA. This value is actually used internally by the algorithm and doesn't make sense as a parameter. I've removed it in the working branch for the next release of SAVANA.

I'm currently working on improving variant filtering for SAVANA and hope to have a new release shortly which will allow users to set custom filters on fields such as support.

In the meantime, does this variant appear in the {sample}.sv_breakpoints.vcf file? This VCF contains all variants before filtering and can be custom filtered for depth using the TUMOUR_SUPPORT and NORMAL_SUPPORT fields in the INFO field.

waltergallegog commented 1 year ago

Thanks for the update. The breakpoints are indeed in the indicated file, with the same start, end and len as indicated by nanomonsv. The TUMOUR_SUPPORT and NORMAL_SUPPORT are 5 and 0 as expected.

chr6    125976001   ID_63995_1  C   C[chr6:125976080[   .   PASS    NORMAL_SUPPORT=0;TUMOUR_SUPPORT=5;SVLEN=79;BP_NOTATION=+-;ORIGINATING_CLUSTER=fa018ced08544c6191c4d19a1b50edc5;END_CLUSTER=46ab8238044945088d8240d4b1194301;ORIGIN_STARTS_STD_DEV=1.1;ORIGIN_STARTS_MEDIAN=1.25976e+08;ORIGIN_EVENT_SIZE_STD_DEV=1.17;ORIGIN_EVENT_SIZE_MEDIAN=80;ORIGIN_EVENT_SIZE_MEAN=79.2;ORIGIN_UNCERTAINTY=4.54;ORIGIN_EVENT_HEURISTIC=0.01;END_STARTS_STD_DEV=0.4;END_STARTS_MEDIAN=1.25976e+08;END_EVENT_SIZE_STD_DEV=1.17;END_EVENT_SIZE_MEDIAN=80;END_EVENT_SIZE_MEAN=79.2;END_UNCERTAINTY=3.03;END_EVENT_HEURISTIC=0.01  GT  0/1
chr6    125976080   ID_63995_2  A   ]chr6:125976001]A   .   PASS    NORMAL_SUPPORT=0;TUMOUR_SUPPORT=5;SVLEN=79;BP_NOTATION=+-;ORIGINATING_CLUSTER=fa018ced08544c6191c4d19a1b50edc5;END_CLUSTER=46ab8238044945088d8240d4b1194301;ORIGIN_STARTS_STD_DEV=1.1;ORIGIN_STARTS_MEDIAN=1.25976e+08;ORIGIN_EVENT_SIZE_STD_DEV=1.17;ORIGIN_EVENT_SIZE_MEDIAN=80;ORIGIN_EVENT_SIZE_MEAN=79.2;ORIGIN_UNCERTAINTY=4.54;ORIGIN_EVENT_HEURISTIC=0.01;END_STARTS_STD_DEV=0.4;END_STARTS_MEDIAN=1.25976e+08;END_EVENT_SIZE_STD_DEV=1.17;END_EVENT_SIZE_MEDIAN=80;END_EVENT_SIZE_MEAN=79.2;END_UNCERTAINTY=3.03;END_EVENT_HEURISTIC=0.01  GT  0/1
waltergallegog commented 1 year ago

In the meantime I would like to do some custom filtering using the bcftools view -i command. Just to confirm, the current filtering for the lenient version is done by the following function correct?

def apply_somatic_filters(validated_breakpoints):
    """ use heuristics to eliminate noise and germline variants """
    somatic_breakpoints_lenient = []
    for bp in validated_breakpoints:
        if bp.support['tumour'] == 0:
            continue
        elif bp.support['normal']/bp.support['tumour'] < 0.1:
            originating_cluster_stats = bp.originating_cluster.get_stats()
            end_cluster_stats = bp.end_cluster.get_stats()
            if originating_cluster_stats['starts_std_dev'] < 150 and originating_cluster_stats['event_heuristic'] < 3:
                if bp.breakpoint_notation == "<INS>" and bp.support['tumour'] > 25:
                    somatic_breakpoints_lenient.append(bp)
                elif bp.breakpoint_notation != "<INS>" and bp.support['tumour'] > 5:
                    somatic_breakpoints_lenient.append(bp)

If that is the correct function, I have prepared the following bcftools command that seems to produce the same results, and that I will customize to my needs.

bcftools view -i 
'
TUMOUR_SUPPORT>0 
& (NORMAL_SUPPORT/TUMOUR_SUPPORT)<0.099 
& ORIGIN_STARTS_STD_DEV<150
& ORIGIN_EVENT_HEURISTIC<3 
& ((BP_NOTATION=="<INS>" & TUMOUR_SUPPORT>25) | (BP_NOTATION!="<INS>" & TUMOUR_SUPPORT>5))
' 
sv_breakpoints.vcf > somatic.sv_breakpoints.lenient.bcftools.vc
helrick commented 1 year ago

Hi Walter, apologies for the delay in response. That is indeed the part of the code that applies the filters. I can't comment on the bcftools command as I haven't tested it, but from a quick glance it looks correct.

I've just released a newer version of SAVANA which should address the issue you raised and also provide the ability to custom filter your variants using a JSON file. The above method using bcftools should still work when applied on the raw sv_breakpoints.vcf file if you want to include more complex logic in your filters.

waltergallegog commented 1 year ago

Hi Hillary, thanks for the new release and the ability to use filters with the json file.

I tried to use the above bcftools command on the raw sv_breakpints.vcf file but I get an error as bcftools can not find the field ORIGIN_EVENT_HEURISTIC.

Has the name of the field changed, or has it been removed? Should I remove it from the command?

helrick commented 4 weeks ago

Hi there, apologies I missed your final question on this thread, but I can confirm that the field ORIGIN_EVENT_HEURISTIC has been removed yes.