Closed gavinmonahan closed 12 months ago
There are also a few SVs below 50 bp, mostly >40 bp, in my first batch. Is it possible to include an optional -SVminSize flag for AnnotSV?
Hi @gavinmonahan
Thank you for all the feedback 😄
In response to your points:
This was an oversight on my part; I forgot to come back to it. Sorry! I will add an optional parameter to allow you to choose either full, split, or both. I found specifying both resulted in bulky and hard-to-read files, given the scale of annotations provided by AnnotSV. Do you think it would be worthwhile to have split and full annotations in separate files, or better to just leave them in one?
I have investigated this (and Jasmine), but in our attempt to maximise sensitivity for rare traits we've limited our ability to merge multiple samples effectively, as a trade-off. Currently, we provide a merged VCF for each sample with 3 genotype/sample columns, so you are able to explore edge cases where there's no consensus between callers. Each of those sample columns represents the genotype info from one caller. This makes merging multiple individuals' VCFs tricky. The only way around this that I see is to merge VCFs at the caller level with Jasmine, which would create a very bloated cohort VCF, depending on the size of your cohort. We were going for broad application with the first iteration of this workflow and were aware that users wouldn't necessarily be running the workflow on distinct cohorts, but would rather run it progressively on individual samples and/or in batches of varying sizes and numbers of cohorts, so we decided to leave things like cohort merging and filtering to downstream work. We are discussing a downstream workflow focused on the cohort level that would handle filtering and prioritisation. Let's chat about this, I'll email you.
Same as the point above. The workflow is focused on the sample level for the sake of sensitivity and broad application. Running Manta at the cohort level would only give you Manta variants and exclude the other callers. It's also important to note that there's a lack of standardisation among SV caller developers about VCF file formatting. That makes merging very challenging (hence the need for tools like Jasmine).
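To make the "one sample column per caller" layout described above a bit more concrete, here is a minimal Python sketch of how one might flag records where the callers disagree. It is only an illustration under stated assumptions: the column labels (caller_a, caller_b, caller_c), the example record, and the helper names are hypothetical, and it does not reproduce how the workflow itself builds or reads its merged VCFs.

```python
def caller_genotypes(vcf_line, caller_names):
    """Return {caller_name: GT string} for one VCF data line.

    Assumes the line has one sample column per caller, in the order
    given by caller_names (hypothetical labels, not the workflow's).
    """
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")      # FORMAT column, e.g. "GT:GQ"
    gt_index = format_keys.index("GT")
    genotypes = {}
    for name, sample_field in zip(caller_names, fields[9:]):
        genotypes[name] = sample_field.split(":")[gt_index]
    return genotypes


def is_discordant(genotypes):
    """True when the callers do not all report the same genotype string."""
    # Treat phased and unphased separators alike before comparing.
    distinct = {gt.replace("|", "/") for gt in genotypes.values()}
    return len(distinct) > 1


# Hypothetical record: two callers report 0/1, one reports no call.
callers = ["caller_a", "caller_b", "caller_c"]
record = "chr1\t1000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-45\tGT\t0/1\t./.\t0/1"

genotypes = caller_genotypes(record, callers)
discordant = is_discordant(genotypes)
print(genotypes, discordant)
```

A check like this only compares genotype strings within a single sample's file; it says nothing about merging across individuals, which is the harder problem discussed above.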
Thanks Georgie!
All very good points and I would be happy to chat about it soon 😀 I can see how having too many samples with so many callers will make the cohort VCF really large. A happy middle could be running it on a per-family basis; for example, we usually have singletons or trios. Previously, for AnnotSV I have found the split can be confusing without the full for large (multigene) SVs, so I used 'both' to keep them together before filtering them down. I agree that the outputs are way too bloated, so having them as separate files could be useful too, or just the split annotation alone. I forked the repo last week and made some of those changes, including for AnnotSV. Although my experience with Nextflow is non-existent, it seemed to work, so let me know if you want to merge it back to main.
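As a rough illustration of the "separate files" idea, a 'both'-mode AnnotSV table could be divided after the fact by its annotation-mode column. The sketch below assumes a tab-separated table with an Annotation_mode column holding full or split; the example rows, gene names, and the function name are made up for illustration, and this is not what the workflow or the fork actually does.

```python
import csv
import io


def split_by_annotation_mode(tsv_text, mode_column="Annotation_mode"):
    """Group the rows of a tab-separated annotation table by annotation mode."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    groups = {"full": [], "split": []}
    for row in reader:
        # Any unexpected mode value gets its own group rather than being dropped.
        groups.setdefault(row[mode_column], []).append(row)
    return groups


# Hypothetical 'both'-mode output: one full row plus one split row per gene.
example = (
    "AnnotSV_ID\tSV_chrom\tAnnotation_mode\tGene_name\n"
    "1_100_500_DEL_1\t1\tfull\tGENEA;GENEB\n"
    "1_100_500_DEL_1\t1\tsplit\tGENEA\n"
    "1_100_500_DEL_1\t1\tsplit\tGENEB\n"
)

groups = split_by_annotation_mode(example)
print(len(groups["full"]), len(groups["split"]))
```

Because the full and split rows share the same AnnotSV_ID, the two groups can still be joined back together when a multigene SV needs its full-length context.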
Hi @gavinmonahan,
Made a few changes following your feedback/suggestions:
Added a flag (--annotsvMode) to allow you to specify full, split, or both annotation mode. Also changed what was previously the --annotsv flag to --annotsvDir.
Added a flag (--extraAnnotsvFlags) to allow you to specify any other permitted AnnotSV flag. You can use this to apply -SVminSize 40. Will do this for the SURVIVOR merging step too.
Want to give them a go and let me know what you think? 👀
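Before lowering the size threshold, it can help to check how many calls would actually be affected. The following Python sketch counts VCF records whose SVLEN falls below a given cutoff. It assumes the length is stored in an INFO SVLEN entry and skips records without one (breakends, for instance); the example records and function names are hypothetical, and this is not how AnnotSV applies -SVminSize internally.

```python
def sv_length(info_field):
    """Return the absolute SVLEN from a VCF INFO string, or None if absent."""
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        if key == "SVLEN":
            # SVLEN can be negative for deletions and comma-separated
            # for multi-allelic records; use the first value.
            return abs(int(value.split(",")[0]))
    return None


def count_below(vcf_lines, threshold):
    """Count data records whose SV length is strictly below threshold (bp)."""
    count = 0
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # header line
        length = sv_length(line.split("\t")[7])
        if length is not None and length < threshold:
            count += 1
    return count


# Hypothetical records with lengths of 45, 120, and 38 bp, plus one breakend.
records = [
    "##fileformat=VCFv4.2",
    "chr1\t1000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-45",
    "chr1\t5000\tsv2\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP;SVLEN=120",
    "chr2\t7000\tsv3\tN\t<INS>\t.\tPASS\tSVTYPE=INS;SVLEN=38",
    "chr3\t9000\tsv4\tN\tN[chr5:100[\t.\tPASS\tSVTYPE=BND",
]

below_50 = count_below(records, 50)
below_40 = count_below(records, 40)
print(below_50, below_40)
```

In this toy input, two records fall below a 50 bp cutoff but only one falls below 40 bp, which mirrors the situation described at the top of the thread where most of the short SVs sit between 40 and 50 bp.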
Hi @georgiesamaha,
That looks great! I haven't had a chance to run it yet with these changes but I think they are very useful changes. I'll let you know if I have any comments/issues ASAP 😊
Hi! I'm running this pipeline on Setonix and I'm enjoying it so far 😀 Not so much an issue but I had a few things I wanted to mention/request regarding the pipeline/AnnotSV output, mostly based on my experience using manta + AnnotSV -
--bam (joint diploid analysis) rather than --normalbam, which I believe is for tumor analysis.
Thanks for making such a fast, comprehensive, and easy to use pipeline! Cheers, Gavin 😊