Closed by andrewrech 5 years ago
To provide some more background: the idea is to generate CollectAllelicCounts-style output for a pool of normals so that we can correct allelic biases in tumor-only analyses. Would it be possible to extend CreateSomaticPanelOfNormals to cover the CollectAllelicCounts "special case"?
@samuelklee @davidbenjamin.
@andrewrech @lima1 Is it absolutely necessary to retain the full per-sample information, or would it be sufficient to add an INFO field (or several) with some sort of summary statistics? For example, I'm working on an improved Mutect2 panel of normals that emits the fraction of samples in which the artifact was called and the estimated beta distribution of the artifact allele frequency among samples containing the artifact. Would this or something related meet your needs?
Great, yes, summary would be sufficient. I currently extract the total number of alt and ref reads and the number of samples out of the old Mutect --normal_panel. The beta distribution would be great too.
This is now in PR: #5675. FilterMutectCalls is not yet hooked up to exploit any of this new information, but we will be testing ideas for that soon.
Amazing @davidbenjamin, thanks for this fast work!
Closed, I think, by #5675, but please let me know if any other outputs would be useful.
Will do, thank you again
@davidbenjamin, thanks again, finally had time to check this out.
The fit for the beta binomial includes homozygous germline variants when present, right?
Would it be possible to specify a filter to exclude, say, samples with allelic fraction > 0.9? Ideally, I would want the homozygous samples counted in the FRACTION field. Or would you recommend filtering for this upstream of CreateSomaticPanelOfNormals?
@lima1 The beta binomial fit ignores germline variation. That is, if you have a variant that shows up sometimes as an artifact and sometimes as a germline variant, the tool fits only the allele fractions of the samples where it seems to be an artifact.
The FRACTION field excludes germline variation. This is done intentionally because the -germline-resource is a much more powerful tool for germline filtering than a panel of normals.
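To make the two summary statistics discussed above concrete, here is a minimal Python sketch. It is not the CreateSomaticPanelOfNormals implementation: the function name `panel_summary`, the `max_artifact_af` cutoff used to set aside germline-looking samples, and the method-of-moments beta fit are all assumptions made for illustration. The tool decides what counts as an artifact in its own way.

```python
from statistics import mean, pvariance


def panel_summary(alt_fractions, max_artifact_af=0.3):
    """Summarize one site across a panel of normals (illustrative only).

    alt_fractions: per-sample alt allele fraction, 0.0 if the allele is absent.
    max_artifact_af: hypothetical cutoff; samples above it are treated as
        germline and excluded from both the fraction and the beta fit.
    Returns (fraction_of_samples_with_artifact, (alpha, beta) or None).
    """
    artifacts = [af for af in alt_fractions if 0.0 < af <= max_artifact_af]
    fraction = len(artifacts) / len(alt_fractions) if alt_fractions else 0.0

    # Method-of-moments beta fit; needs at least two samples with spread.
    if len(artifacts) < 2:
        return fraction, None
    m = mean(artifacts)
    v = pvariance(artifacts)
    if v <= 0.0 or v >= m * (1.0 - m):
        return fraction, None
    common = m * (1.0 - m) / v - 1.0
    return fraction, (m * common, (1.0 - m) * common)


# Seven normals: three low-fraction artifact samples, one heterozygous and
# one homozygous germline carrier, and two samples without the allele.
fraction, beta_params = panel_summary([0.0, 0.05, 0.10, 0.5, 0.0, 0.15, 1.0])
```

In this toy input the heterozygous (0.5) and homozygous (1.0) samples contribute to neither the fraction nor the fit, which mirrors the behaviour described above. Counting homozygous samples in the fraction instead would mean changing only the first filter.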
Feature request
Tool(s) or class(es) involved
CreateSomaticPanelOfNormals
Description
Currently, CreateSomaticPanelOfNormals emits sites-only VCFs. Some downstream tools require full VCFs, which could previously be created with the CombineVariants-based PON workflow.
Perhaps this feature will be covered when CombineVariants becomes available, but I believe it would still be desirable if CreateSomaticPanelOfNormals could be passed --sites-only-vcf-output=false to allow full VCFs to be returned. This would permit calculation of mapping bias using allele frequencies of the normal samples.
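As a rough illustration of why per-sample counts matter here, the Python sketch below estimates a mapping bias factor for one site from the normals' allelic counts. The definition used (pooled alt fraction among heterozygous-looking normals, divided by the expected 0.5), the function name `mapping_bias`, and the `het_range` cutoffs are assumptions for illustration, not the method of any particular downstream tool.

```python
def mapping_bias(per_sample_counts, het_range=(0.1, 0.9)):
    """Estimate mapping bias at one site from normal samples (illustrative).

    per_sample_counts: list of (ref_reads, alt_reads), one tuple per normal.
    het_range: hypothetical allele-fraction window used to pick samples that
        look heterozygous; homozygous and non-carrier samples are skipped.
    Returns observed pooled alt fraction / 0.5, or None if no sample qualifies.
    """
    low, high = het_range
    ref_total = 0
    alt_total = 0
    for ref, alt in per_sample_counts:
        depth = ref + alt
        if depth == 0:
            continue  # no coverage in this sample
        if low < alt / depth < high:
            ref_total += ref
            alt_total += alt
    if ref_total + alt_total == 0:
        return None
    return (alt_total / (ref_total + alt_total)) / 0.5


# Two heterozygous normals, one non-carrier, one homozygous carrier.
bias = mapping_bias([(55, 45), (60, 40), (100, 0), (0, 80)])
```

A value below 1 would suggest that reads carrying the alt allele are under-represented at the site. A sites-only VCF cannot support this calculation because the per-sample ref and alt counts are not retained.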
Thank you for your tremendous service developing this tool.
Sincerely,
Andrew