ga4gh / vrs

Extensible specification for representing and uniquely identifying biological sequence variation
https://vrs.ga4gh.org
Apache License 2.0

Clarification on reference allele normalization policy #468

Closed: Mrinal-Thomas-Epic closed this issue 8 months ago

Mrinal-Thomas-Epic commented 8 months ago

In the VRS 1.x docs, there is a section describing why reference alleles are not normalized. The section says,

"When the Allele refers to a reference state (case 1), trimming would reduce the variant to a null change. However, reduction to a null state would make it impossible to refer to a specific span of reference sequence. In order to permit users to refer to spans of reference sequence, VRS does not require normalizing reference agreement Alleles."

It seems like it should be possible to modify the normalization algorithm to detect the reference allele case and perform the shuffle left/right step as if the variant were a deletion. Is there a reason this approach was not chosen?
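For concreteness, here is a minimal Python sketch of what the proposal might look like on top of a fully-justified (left- and right-rolled) normalization. This is an interpretation, not the vrs-python implementation. It assumes a plain-string reference sequence, 0-based interbase coordinates, and a `(start, end, state)` return value. The names `normalize`, `expand_ref_agree` and the `_trim`/`_roll_*` helpers are hypothetical. For the ref-agree case, the sketch rolls the span as if it were deleted, then reports the reference state over the expanded interval.

```python
# Hypothetical sketch; names are illustrative, not the vrs-python API.
# Coordinates are 0-based interbase; `seq` is the full reference sequence.


def _trim(ref: str, alt: str, start: int, end: int):
    """Trim the shared suffix, then the shared prefix, shrinking the interval."""
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt, end = ref[:-1], alt[:-1], end - 1
    while ref and alt and ref[0] == alt[0]:
        ref, alt, start = ref[1:], alt[1:], start + 1
    return ref, alt, start, end


def _roll_deletion(seq: str, start: int, end: int):
    """Return (leftmost start, rightmost end) that deleting seq[start:end] can slide to."""
    lo, lo_end = start, end
    while lo > 0 and seq[lo - 1] == seq[lo_end - 1]:
        lo, lo_end = lo - 1, lo_end - 1
    hi_start, hi = start, end
    while hi < len(seq) and seq[hi] == seq[hi_start]:
        hi_start, hi = hi_start + 1, hi + 1
    return lo, hi


def _roll_insertion(seq: str, pos: int, ins: str):
    """Return (leftmost pos, rightmost pos, insert rotated to the leftmost pos)."""
    lo, lo_ins = pos, ins
    while lo > 0 and lo_ins[-1] == seq[lo - 1]:
        lo, lo_ins = lo - 1, lo_ins[-1] + lo_ins[:-1]
    hi, hi_ins = pos, ins
    while hi < len(seq) and hi_ins[0] == seq[hi]:
        hi, hi_ins = hi + 1, hi_ins[1:] + hi_ins[0]
    return lo, hi, lo_ins


def normalize(seq: str, start: int, end: int, alt: str,
              expand_ref_agree: bool = False):
    """Fully-justified normalization; returns (start, end, state)."""
    ref = seq[start:end]
    if ref == alt:  # reference agreement: trimming would leave a null change
        if not expand_ref_agree or start == end:
            return start, end, alt  # VRS 1.x behavior: left as-is
        # Proposed behavior: roll the span as if it were deleted, then
        # report the reference state over the fully expanded interval.
        lo, hi = _roll_deletion(seq, start, end)
        return lo, hi, seq[lo:hi]
    ref, alt, start, end = _trim(ref, alt, start, end)
    if ref and alt:  # substitution: nothing to roll
        return start, end, alt
    if ref:  # deletion of seq[start:end]
        lo, hi = _roll_deletion(seq, start, end)
        return lo, hi, seq[lo:hi - len(ref)]
    lo, hi, lo_ins = _roll_insertion(seq, start, alt)  # insertion at start
    return lo, hi, lo_ins + seq[lo:hi]


seq = "GCAAAT"
print(normalize(seq, 3, 4, "A"))                         # (3, 4, 'A')
print(normalize(seq, 3, 4, "A", expand_ref_agree=True))  # (2, 5, 'AAA')
print(normalize(seq, 3, 4, ""))                          # (2, 5, 'AA')
```

In this sketch the expansion only touches the `ref == alt` branch, so insertions, deletions and substitutions normalize identically whichever value `expand_ref_agree` takes.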

ahwagner commented 8 months ago

Hi @Mrinal-Thomas-Epic!

Yes, this was discussed a long time ago. I had also advocated for this approach and was asked to prove that the change would not affect non-reference alleles. I put in the time to do so (the notebook is here), and the approach is technically sound.

In the end, the argument that @reece made was that the intent of ref-agree calls differs from that of variant calls: when describing reference alleles, we should not assume that ambiguity correction is needed or desired. As in all cases where the decision was not obvious, @reece, @larrybabb, and I presented the options to the community, assessed the arguments and community feedback, and then documented the rationale for the majority opinion in the spec. In this case, the majority opinion was that ref allele expansion should not be the default normalization behavior.

I wish I had a more satisfying answer. If you have some example data that demonstrates how adjusting the ref-agree behavior in the normalization algorithm would be beneficial to you, I am open to revisiting this decision for VRS 2.x.