Clinical-Genomics / BALSAMIC

Bioinformatic Analysis pipeLine for SomAtic Mutations In Cancer
https://balsamic.readthedocs.io/
MIT License

Upload soft-filtered variants to Scout? #1256

Open mathiasbio opened 10 months ago

mathiasbio commented 10 months ago

Is your feature request related to a problem? Please describe.

In a validation of the GMS Lymphoid panel, a few variants in the reference samples were filtered out of the VCF because the variant was also present in the normal. VarDict itself sets the Germline filter, and bcftools then removes these variants further downstream in the analysis.

[Image: scatter_plot_briefskink]

Some of these variants in the reference sample had AF_N and AF_T at similar levels, around 0.1; another had a tumor AF of 0.6359 and a normal AF of 0.1264.

In the email discussion of these validation results, some questions were raised about how we set these germline filters, along with some concern about the risk of filtering out real and interesting somatic variants due to Tumor In Normal Contamination (TINC) and other factors.

It was discussed whether we should avoid filtering on the AFs altogether and just annotate, letting users filter the variants themselves in Scout, or whether we could keep variants tagged as germline if they also had an ACMG status of Pathogenic or Likely-pathogenic.

I think this sounds pretty intriguing! Generally I don't think there are many variants marked as Pathogenic or Likely-pathogenic, so including them in the final VCF regardless of presence in the normal would be a nice way to avoid the risk of filtering them out.
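The rule discussed above could be sketched roughly as follows. This is only an illustration, not how BALSAMIC currently filters: it assumes the germline tag shows up as `Germline` in the FILTER column and that the pathogenicity annotation is available in a ClinVar-style `CLNSIG` INFO key. Both names are assumptions about the annotated VCF layout.

```python
# Assumptions (not confirmed by the pipeline): the germline tag is the FILTER
# value "Germline", and pathogenicity is stored in a ClinVar-style CLNSIG key.
RESCUE_SIGNIFICANCE = {"pathogenic", "likely_pathogenic"}


def parse_info(info_field):
    """Turn 'A=1;B=2;FLAG' into {'A': '1', 'B': '2', 'FLAG': ''}."""
    info = {}
    for item in info_field.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        info[key] = value
    return info


def keep_variant(vcf_line):
    """Keep unfiltered variants, plus germline-only ones annotated as (likely) pathogenic."""
    fields = vcf_line.rstrip("\n").split("\t")
    filters = set(fields[6].split(";"))
    if filters <= {"PASS", "."}:
        return True
    if filters != {"Germline"}:
        return False  # any other filter still removes the variant
    clnsig = parse_info(fields[7]).get("CLNSIG", "")
    # Values can be combined, e.g. "Pathogenic/Likely_pathogenic"
    terms = {t.lower() for t in clnsig.replace("|", "/").replace(",", "/").split("/")}
    return bool(terms & RESCUE_SIGNIFICANCE)
```

With this rule, a germline-tagged variant is rescued only when it carries one of the two significance terms; variants that fail any other filter are still dropped.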

I was also wondering whether we could extend this idea and keep all these variants in the final VCF regardless of any filter. That might bring too many artifacts into Scout, but if it's possible to upload these variants with soft filters such as "Poor qual" or "Presence in germline DB", I think it could be nice.
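To make the soft-filter idea concrete, here is a minimal sketch of tagging a record instead of dropping it. The label names (`in_normal`, `low_qual`) are hypothetical, and a valid VCF would also need matching `##FILTER` header lines, which this sketch does not write.

```python
def soft_filter(vcf_line, label):
    """Return the record with `label` added to FILTER instead of removing the record.

    `label` is a hypothetical soft-filter name chosen by the caller.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    # Drop the "no filter applied" placeholders before adding a real label
    current = [f for f in fields[6].split(";") if f not in ("PASS", ".", "")]
    if label not in current:
        current.append(label)
    fields[6] = ";".join(current)
    return "\t".join(fields)
```

The record stays in the file, so whoever reviews it downstream can decide whether to hide or show variants carrying a given label.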

Describe the solution you'd like

To avoid modifying the bcftools filters too much, could we extract these types of variants early on, right after annotation, into a separate VCF which we don't filter, and then merge them in when creating the final clinical / research VCF?
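A rough sketch of the merge step, assuming both sets are available as lists of VCF record lines (headers excluded). The function names are made up for illustration, and in practice this would more likely be done with existing VCF tooling than with hand-written Python.

```python
def variant_key(vcf_line):
    """Identify a record by CHROM, POS, REF and ALT."""
    chrom, pos, _id, ref, alt = vcf_line.split("\t")[:5]
    return (chrom, int(pos), ref, alt)


def merge_rescued(filtered_records, rescued_records):
    """Add rescued records to the filtered ones, skipping duplicates.

    Records already present in the filtered set win over rescued copies.
    Sorting is by chromosome name (as text) and position, which is enough for
    a sketch; a real pipeline would follow the contig order of the VCF header.
    """
    merged = {variant_key(rec): rec for rec in filtered_records}
    for rec in rescued_records:
        merged.setdefault(variant_key(rec), rec)
    return [merged[key] for key in sorted(merged)]
```

Keeping the rescued variants in their own unfiltered VCF means the existing bcftools filter expressions stay untouched; only the final merge is new.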

Current BALSAMIC version

balsamic --version: 12.0.2

mathiasbio commented 10 months ago

Based on further discussions with customers involved in the validation of the lymphoid panel, I have come to understand their needs and opinions a bit further. In summary it seems that:

But I wonder if this issue can be divided in two. It seems, for instance, that this is about two types of variants:

  1. Relevant germline variants that they want to be able to find
  2. Somatic variants which they don't want to lose due to TINC

For the first, couldn't we just upload to Scout the germline calls made specifically from the normal (which I think we're already producing)?

For the second, we could simply remove the VarDict "Germline" filter and upload all T/N variants from VarDict to Scout.
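To illustrate why the two types differ, the allele fractions quoted earlier in this issue can be compared directly. The sketch below is not a proposed filter: the `min_ratio` cutoff and the label names are hypothetical, chosen only to show the arithmetic.

```python
def tumor_normal_af_ratio(af_tumor, af_normal):
    """Return AF_T / AF_N, or infinity when the variant is absent from the normal."""
    if af_normal == 0:
        return float("inf")
    return af_tumor / af_normal


def classify_normal_evidence(af_tumor, af_normal, min_ratio=3.0):
    """Label a variant by how its tumor AF compares to its normal AF.

    min_ratio is a hypothetical cutoff, not a validated threshold.
    """
    if af_normal == 0:
        return "absent_in_normal"
    if tumor_normal_af_ratio(af_tumor, af_normal) >= min_ratio:
        return "tumor_enriched"  # normal signal could be TINC
    return "similar_af"  # comparable level in both samples
```

For the variant with a tumor AF of 0.6359 and a normal AF of 0.1264 the ratio is about 5, so it would be labelled `tumor_enriched`, while the variants with both AFs around 0.1 would be labelled `similar_af`.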

mathiasbio commented 10 months ago

After speaking to the customer a bit further, I learned that:

pbiology commented 8 months ago

We have to make sure not to upload too many variants for all cases, as this will not scale in Scout. The backend won't be able to deal with it, and uploads might take even longer than they do today, slowing down our production flow.