Clinical-Genomics / scout

VCF visualization interface
https://clinical-genomics.github.io/scout
BSD 3-Clause "New" or "Revised" License

Filtering by 'Qual' on cancer variants page #4510

Open · cogrho opened this issue 3 months ago

cogrho commented 3 months ago

Hi! Is it possible to filter the cancer variants by 'Qual' to, for example, obtain only 'PASS' variants or filter out cancer-germline variants? Alternatively, can the 'Qual' field be sorted so that 'PASS' variants appear at the top of the page?

Thanks!

northwestwitch commented 3 months ago

I'm not really sure how to achieve this, since VCF files are often merged from several VCF files created by different callers, and for the same variant some callers might report a quality PASS and others not. 🤔

dnil commented 3 months ago

Each variant line would normally have a single "FILTER" statement (e.g. PASS or LowQualityFiltered), and this is also what is reflected in the PASS tags we display per variant.
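
For reference, here is a minimal plain-Python sketch of how the FILTER column of a VCF data line can be read and checked for PASS. It follows the VCF specification (FILTER is the 7th tab-separated column and holds "PASS", a semicolon-separated list of failed filter codes, or "." if no filters were applied). The function names and example lines are made up for illustration and are not Scout's code.

```python
def parse_filter(vcf_line: str) -> list:
    """Return the FILTER codes of one VCF data line.

    FILTER is the 7th tab-separated column: "PASS", a semicolon-separated
    list of the filters the variant failed, or "." if no filters were applied.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    value = fields[6]
    return [] if value == "." else value.split(";")


def is_pass(vcf_line: str) -> bool:
    """True only when the line's FILTER column is exactly PASS."""
    return parse_filter(vcf_line) == ["PASS"]


# Made-up example lines (CHROM POS ID REF ALT QUAL FILTER INFO).
passing = "chr17\t7674220\t.\tC\tT\t60\tPASS\tDP=120"
flagged = "chr7\t140753336\t.\tA\tT\t35\tgermline;LowQual\tDP=40"

print(parse_filter(passing), is_pass(passing))  # ['PASS'] True
print(parse_filter(flagged), is_pass(flagged))  # ['germline', 'LowQual'] False
```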

I'm not sure what your pipeline, location, or use case is, but with most pipelines we run in Stockholm the VCFs are hard filtered to a quality level that the analysis is comfortable with, based on pipeline validation. Variants also receive a bonus/malus in the ranking system, so that lower-quality variants within the soft bounds end up slightly lower than high-quality ones, all other things being equal. Variants are then initially triaged as if "true", based on consequence, rank, phenotype etc., and at some point during that triage a mix of quality indicators is weighed in: e.g. GQ values from the variant FORMAT field, together with other information like allele depth, total locus depth, regional call quality (e.g. is this in a segmental duplication region?), the IGV picture (is this a noisy region?) etc. There seems to be little use in loading variants that would never be eligible to be considered true, and conversely little reason to treat the remaining ones separately?
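
The ranking bonus/malus idea can be illustrated with a small sketch. This is not Scout's ranking model: the `Variant` class, the base `rank_score` values and the `QUALITY_ADJUSTMENT` weights are all hypothetical, chosen only to show how a small quality adjustment reorders variants that are otherwise equal.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    vid: str
    rank_score: int  # hypothetical score from consequence, rarity, phenotype, ...
    filters: list = field(default_factory=list)


# Hypothetical per-FILTER adjustments; real values would come from pipeline validation.
QUALITY_ADJUSTMENT = {"PASS": 1, "LowQual": -1}


def adjusted_score(variant: Variant) -> int:
    """Rank score plus a small quality bonus/malus; unknown filter codes count as 0."""
    return variant.rank_score + sum(QUALITY_ADJUSTMENT.get(f, 0) for f in variant.filters)


def rank(variants):
    """Highest adjusted score first; sorted() is stable, so ties keep input order."""
    return sorted(variants, key=adjusted_score, reverse=True)


variants = [
    Variant("A", 10, ["PASS"]),
    Variant("B", 10, ["LowQual"]),
    Variant("C", 12, ["LowQual"]),
]
print([v.vid for v in rank(variants)])  # ['A', 'C', 'B']
```

A and B share the same base score, so the malus moves B below A; C still ranks near the top despite its LowQual flag because its base score is higher, which is the point of soft rather than hard filtering.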

Please explain your use case a bit and we'll see what we can do! I understand this was intended for a cancer pipeline; if so, which one?

cogrho commented 3 months ago

In our clinics, we have a custom somatic variant calling pipeline where we merge VCF files from three callers and set FILTER='PASS' in the merged VCF file for variants identified by two or more callers. Our merged VCF file also includes a 'germline' filter and filters for low-quality variants. Variants flagged as low quality sometimes prove to be true positives upon further inspection, so we prefer soft filtering. It would therefore be great to have an option to display only the variants with a green 'PASS' tag in the 'Qual' column, while hiding variants with a yellow 'GERM' tag. This would make the filtering process more transparent for us.
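
The merge rule described above could look roughly like this when assigning the FILTER value of a merged variant. This is a hypothetical sketch, not the poster's actual pipeline: the caller names, the `LowSupport` filter code and the way germline/low-quality flags combine with the two-caller rule are all assumptions.

```python
def consensus_filter(found_in: set, germline: bool, low_quality: bool,
                     min_callers: int = 2) -> str:
    """Build a FILTER value for one merged variant (hypothetical rules).

    PASS only if at least `min_callers` callers reported the variant and
    no other flag applies; otherwise a semicolon-joined list of flags.
    """
    flags = []
    if germline:
        flags.append("germline")
    if low_quality:
        flags.append("LowQual")
    if len(found_in) < min_callers:
        flags.append("LowSupport")  # hypothetical filter name
    return ";".join(flags) if flags else "PASS"


print(consensus_filter({"caller_a", "caller_b"}, False, False))              # PASS
print(consensus_filter({"caller_a"}, False, False))                          # LowSupport
print(consensus_filter({"caller_a", "caller_b", "caller_c"}, True, False))  # germline
```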

dnil commented 3 months ago

Thank you for the details!

We will have a look; it should be quite possible to add a qual filter option, perhaps as a checkbox.
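
As an illustration of what such an option could do, here is a hypothetical sketch of a "PASS only" checkbox plus a "hide these FILTER codes" selection applied to already-loaded variants. It is not Scout's implementation; the function name, parameters and data shape are invented for the example.

```python
def apply_filter_options(variants, only_pass=False, hide=frozenset()):
    """Return IDs of variants that survive the chosen display options.

    variants: iterable of (variant_id, FILTER string) pairs.
    only_pass: mimics a "PASS only" checkbox.
    hide: FILTER codes to hide, e.g. {"germline"}, mimicking a dropdown.
    """
    kept = []
    for vid, filter_value in variants:
        codes = set(filter_value.split(";"))
        if only_pass and codes != {"PASS"}:
            continue  # checkbox on: anything not exactly PASS is hidden
        if codes & set(hide):
            continue  # at least one selected code is present
        kept.append(vid)
    return kept


variants = [("v1", "PASS"), ("v2", "germline"), ("v3", "LowQual"), ("v4", "germline;LowQual")]
print(apply_filter_options(variants, only_pass=True))     # ['v1']
print(apply_filter_options(variants, hide={"germline"}))  # ['v1', 'v3']
```

Because this only hides variants in the view, the flagged variants remain loaded and can be revisited, which keeps the soft-filtering approach intact.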

A couple of points, though, on why we have chosen a different route.

First, strongly consider ranking variants. We completely agree on soft filtering, within limits. If something is just bad, it is. Or too common. Or simply too germline, if you are looking at your somatic variant list and have a germline analysis for the same patient waiting anyway. Consider removing variants that are way off the charts and could not be clinically relevant. Keep the rest, sure, but rank them as well: this is mandatory if you are looking at WGS or panels of more than a few hundred genes, and sometimes useful for small panels too.
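
A minimal sketch of "remove what is way off the charts, rank the rest", assuming each variant carries a population allele frequency, its FILTER codes and a precomputed rank score. The field names and the 1% cutoff are hypothetical placeholders, not values recommended in this thread.

```python
# Hypothetical cutoff: population allele frequency above which a
# variant is treated as "too common" to be clinically relevant.
MAX_POPULATION_AF = 0.01


def triage(variants):
    """Drop clear non-candidates, then rank the rest by score (highest first).

    variants: list of dicts with hypothetical keys "id", "population_af",
    "filters" (list of FILTER codes) and "rank_score".
    """
    candidates = [
        v for v in variants
        if v["population_af"] <= MAX_POPULATION_AF and "germline" not in v["filters"]
    ]
    return sorted(candidates, key=lambda v: v["rank_score"], reverse=True)


variants = [
    {"id": "v1", "population_af": 0.0001, "filters": ["PASS"], "rank_score": 8},
    {"id": "v2", "population_af": 0.2, "filters": ["PASS"], "rank_score": 15},
    {"id": "v3", "population_af": 0.0, "filters": ["germline"], "rank_score": 12},
    {"id": "v4", "population_af": 0.0, "filters": ["LowQual"], "rank_score": 11},
]
print([v["id"] for v in triage(variants)])  # ['v4', 'v1']
```

Note that the LowQual variant is kept and ranked rather than removed: only the common and germline-flagged variants are dropped outright.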

Second, a multiple-caller approach like the one you describe (qualitative consensus) brings high specificity if the callers are very different (orthogonal), but mostly bias if they are not. The way to achieve sensitivity in e.g. SV calling is to include multiple callers and evaluate each on its own merits. Are you/your developer using the "SET" or "FOUND_IN" INFO tags in the VCF, so that you get nice badges on each variant? Do you use callers we do not yet support? Consider opening an issue or PR for that!
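
To show what a per-caller INFO tag carries, here is a small sketch that parses a VCF INFO column and extracts the caller names from FOUND_IN. It assumes FOUND_IN holds a comma-separated list of caller names; the exact format Scout expects for FOUND_IN or SET is not stated here and should be checked in the Scout documentation. The caller names in the example are the ones mentioned later in this thread.

```python
def parse_info(info: str) -> dict:
    """Parse a VCF INFO column ("KEY=VALUE;FLAG;...") into a dict.

    Flags without a value map to True; "." (missing INFO) yields {}.
    """
    parsed = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        parsed[key] = value if sep else True
    return parsed


def callers_for(info: str) -> set:
    """Callers listed in FOUND_IN, assumed to be comma-separated."""
    value = parse_info(info).get("FOUND_IN")
    if not isinstance(value, str) or not value:
        return set()
    return set(value.split(","))


info = "DP=120;SOMATIC;FOUND_IN=strelka2,lofreq"
print(sorted(callers_for(info)))  # ['lofreq', 'strelka2']
```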

cogrho commented 3 months ago

A checkbox or a dropdown menu would be great, thanks!

Thank you for the suggestions! We sometimes really do end up with a lot of variants. If I may ask, what kind of ranking system do you use for prioritisation? We also evaluate each caller separately, and we do indeed use callers that are not yet supported: Strelka2 and LoFreq.