Closed hepcat72 closed 5 years ago
Hi, you're right that S0.vcf contains the "ground truth" structural variants, a.k.a. the ones that we know are true. Our S0.vcf is derived from experimentally validated SVs, or from alternate computational merge algorithms. S0.vcf is necessary for FusorSV to know which SV calls from which callers it can trust when it is performing a merge with the input callers' VCFs.

According to the readme, you would organize your VCFs for FusorSV like so:

• vcfFiles/sample1/sample1_S11.vcf
• vcfFiles/sample1/sample1_S10.vcf
• vcfFiles/sample1/sample1_S4.vcf
• vcfFiles/sample2/sample2_S11.vcf
• vcfFiles/sample2/sample2_S10.vcf
• vcfFiles/sample2/sample2_S4.vcf

S0.vcf is provided for each of your input samples, if you have a ground truth, like so:

• vcfFiles/sample1/sample1_S0.vcf
• vcfFiles/sample2/sample2_S0.vcf

As for insertions, we are currently working on FusorSV's insertion code, and we hope to have an update soon.
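If it helps to check a directory against the layout above before running anything, here is a minimal sketch in Python. It is not part of FusorSV: the function name `summarize_vcf_layout` is hypothetical, and the `<sample>/<sample>_S<id>.vcf` naming pattern is an assumption inferred only from the example paths listed above. It reports, per sample directory, which caller IDs are present and whether an S0.vcf ground truth file exists.

```python
import re
from pathlib import Path

# Assumed naming pattern, inferred only from the example paths above:
#   <root>/<sample>/<sample>_S<caller id>.vcf
VCF_NAME = re.compile(r"^(?P<sample>.+)_S(?P<caller>\d+)\.vcf$")


def summarize_vcf_layout(root):
    """Map each sample directory to its caller IDs and whether S0 is present."""
    summary = {}
    for sample_dir in sorted(Path(root).iterdir()):
        if not sample_dir.is_dir():
            continue
        callers = set()
        for vcf in sample_dir.glob("*.vcf"):
            match = VCF_NAME.match(vcf.name)
            # Ignore files whose sample prefix does not match the directory name.
            if match and match.group("sample") == sample_dir.name:
                callers.add(int(match.group("caller")))
        summary[sample_dir.name] = {
            "callers": sorted(callers - {0}),  # caller IDs other than the S0 truth set
            "has_truth": 0 in callers,         # True if <sample>_S0.vcf exists
        }
    return summary
```

Calling `summarize_vcf_layout("vcfFiles")` returns a dictionary keyed by sample name, so a sample with no S0.vcf shows up with `has_truth` set to `False`.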
I haven't tried the software out yet, but I'm planning on trying it. I was just reading the documentation and I saw a few references to S0.vcf, but I did not find an explanation of what it is in the readme, paper, or a search of the code. Maybe I overlooked something. But, reading between the lines, I'm guessing it's a vcf file containing "true" structural variants. If that's correct, what are the guidelines for creating this file? How much is accuracy affected if it is not provided?

And while I'm asking, what about large insertion detection? The paper seems to focus on DEL, INV, & DUP. I've been using anise_basil for detection of large insertions. Do you have any plans for handling evidence for large inserts, like accumulations of reads with unmapped mates and soft-clipped reads?