fusion query with donor and acceptor parameters

costero-e commented 2 months ago

As Beacon still falls short of supporting comprehensive fusion queries, my colleagues and I have come up with a possible solution, which I present here for discussion in the Variants Scout. The proposal is based on the nomenclature commonly used in bioinformatics for translocated genes, which specifies both the originating and the receiving chromosome, together with the start and end positions on each. In my experience, this nomenclature usually looks like this:

CHR:donorStart-donorEnd:CHR:acceptorStart-acceptorEnd

That said, in our opinion Beacon should introduce brand new names for these donor and acceptor position parameters, but could reuse referenceName and mateName for the originating and receiving chromosomes, respectively. A full Beacon query could then look like this:

g_variants?referenceName=11&mateName=12&donorStart=[16086165,16086170]&donorEnd=[16086171,16086175]&acceptorStart=[16090071,16090073]&acceptorEnd=[16090074,16090075]

We would like new parameters such as donorStart and acceptorStart so that the original start and end parameters are not misused. Those parameters are conventional, and repurposing them could confuse implementers as well as Beacon users and clients.

Let me know what you think.

Best,
Oriol
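A minimal Python sketch of how a client might turn the nomenclature above into the proposed query string. The parameters donorStart, donorEnd, acceptorStart and acceptorEnd exist only as proposed in this issue, not in the current Beacon specification, and the helpers parse_fusion_notation, bracket and build_donor_acceptor_query are hypothetical names chosen for illustration.

```python
from typing import Sequence, Tuple


def parse_fusion_notation(notation: str) -> Tuple[str, int, int, str, int, int]:
    """Split 'CHR:donorStart-donorEnd:CHR:acceptorStart-acceptorEnd' into its six parts."""
    donor_chr, donor_range, acceptor_chr, acceptor_range = notation.split(":")
    donor_start, donor_end = (int(x) for x in donor_range.split("-"))
    acceptor_start, acceptor_end = (int(x) for x in acceptor_range.split("-"))
    return donor_chr, donor_start, donor_end, acceptor_chr, acceptor_start, acceptor_end


def bracket(values: Sequence[int]) -> str:
    """Render a [min,max] position bracket the way the example query writes it."""
    low, high = values
    return f"[{low},{high}]"


def build_donor_acceptor_query(
    reference_name: str,
    mate_name: str,
    donor_start: Sequence[int],
    donor_end: Sequence[int],
    acceptor_start: Sequence[int],
    acceptor_end: Sequence[int],
) -> str:
    """Assemble the proposed (hypothetical) g_variants query; values are not URL-encoded."""
    params = [
        ("referenceName", reference_name),
        ("mateName", mate_name),
        ("donorStart", bracket(donor_start)),
        ("donorEnd", bracket(donor_end)),
        ("acceptorStart", bracket(acceptor_start)),
        ("acceptorEnd", bracket(acceptor_end)),
    ]
    return "g_variants?" + "&".join(f"{key}={value}" for key, value in params)


if __name__ == "__main__":
    print(parse_fusion_notation("11:16086165-16086175:12:16090071-16090075"))
    print(build_donor_acceptor_query(
        "11", "12",
        [16086165, 16086170], [16086171, 16086175],
        [16090071, 16090073], [16090074, 16090075],
    ))
```

The brackets are left unencoded to mirror the example query; a real client would probably percent-encode them before sending the request.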

mbaudis commented 2 months ago

@costero-e I appreciate this as a push to advance w/ fusions, but IMO this adds unnecessary complexity (extra arguments) to the query:

- a fusion consists of 2 partners
- the breakpoints of the fusion partners might lie on different chromosomes or on the same one
- the positions on the given reference sequences (chromosomes...) are frequently "fuzzy" (cytobands, empirical)

For detecting the 2 fusion partners, this reduces to the basic requirement of querying 2 chromosomes, each with an associated range. This can already be achieved with the current parameters:

- referenceName + start[0] + start[1]
- mateName + end[0] + end[1]
- variantType, e.g. SO:0000806 ("fusion")
- conventions:
  - mateName >= referenceName in sort order
  - if mateName == referenceName: end[0] > start[0] (the usual convention, which does not apply if mateName ≠ referenceName)

In essence this is a BeaconBracketQuery in which the chromosome of the end bracket, usually a different one, is given by mateName.
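A minimal sketch, using only existing parameter names, of how a client could assemble such a fusion bracket query and check it against the conventions listed above. Two things here are assumptions: "sort order" is read as natural chromosome order (1..22, X, Y, MT, ignoring a "chr" prefix), and each bracket is additionally required to be [min, max]. The helper names are hypothetical.

```python
from typing import Any, Dict, List

# Assumed chromosome "sort order": 1..22, then X, Y, MT (our reading, not from the spec).
_CHROM_ORDER = [str(n) for n in range(1, 23)] + ["X", "Y", "MT"]


def chrom_rank(name: str) -> int:
    """Rank a chromosome name, ignoring an optional 'chr' prefix; unknown names raise ValueError."""
    bare = name[3:] if name.lower().startswith("chr") else name
    return _CHROM_ORDER.index(bare.upper())


def fusion_bracket_params(
    reference_name: str,
    start: List[int],
    mate_name: str,
    end: List[int],
    variant_type: str = "SO:0000806",
) -> Dict[str, Any]:
    """Express a fusion as a bracket query with the current parameters."""
    return {
        "referenceName": reference_name,
        "start": list(start),
        "mateName": mate_name,
        "end": list(end),
        "variantType": variant_type,
    }


def check_conventions(params: Dict[str, Any]) -> List[str]:
    """Return the list of violated conventions (empty if the query is well-formed)."""
    problems: List[str] = []
    ref_rank = chrom_rank(params["referenceName"])
    mate_rank = chrom_rank(params["mateName"])
    start, end = params["start"], params["end"]
    # Our addition: each bracket should be written as [min, max].
    if start[0] > start[1] or end[0] > end[1]:
        problems.append("bracket bounds must be [min, max]")
    # Convention: mateName >= referenceName in sort order.
    if mate_rank < ref_rank:
        problems.append("mateName must not sort before referenceName")
    # Convention: on the same chromosome, end[0] > start[0].
    if mate_rank == ref_rank and not end[0] > start[0]:
        problems.append("same chromosome: end[0] must be greater than start[0]")
    return problems


if __name__ == "__main__":
    ok = fusion_bracket_params("11", [16086165, 16086175], "12", [16090071, 16090075])
    swapped = fusion_bracket_params("12", [16090071, 16090075], "11", [16086165, 16086175])
    print(check_conventions(ok))
    print(check_conventions(swapped))
```

Whether such checks should live in clients, in servers, or nowhere is exactly what the following comments debate; the sketch only makes the stated conventions concrete.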

This seems very straightforward, with a few caveats:

- it is up to resource implementers whether the combination of these 2 breakpoints with the "fusion" type is checked for an explicit junction between the breakpoints; the Beacon specification should not prescribe a method here
- we only provide a "single event option", i.e. no combinations for 3-way fusions etc. (over the top, but can be discussed)
- beacons may/should provide single-sided queries, but this is just a current range query with the "fusion" type, so it doesn't need a particular solution
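One possible server-side reading of these caveats, sketched in Python under loud assumptions: fusions are stored as a hypothetical FusionRecord with two breakpoint positions (lower chromosome first), and a record matches when each breakpoint falls inside its bracket. This only compares breakpoints and does not verify an actual junction, which, as noted above, is left to implementers. A single-sided query (no mateName) is treated here as a plain position bracket on either breakpoint.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class FusionRecord:
    """Hypothetical stored fusion: two breakpoints, lower chromosome first."""
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int
    variant_type: str = "SO:0000806"


def in_bracket(pos: int, bracket: Sequence[int]) -> bool:
    """True if pos lies within the inclusive [min, max] bracket."""
    return bracket[0] <= pos <= bracket[1]


def matches(
    record: FusionRecord,
    reference_name: str,
    start: Sequence[int],
    mate_name: Optional[str] = None,
    end: Optional[Sequence[int]] = None,
    variant_type: str = "SO:0000806",
) -> bool:
    """Breakpoint-only matching; does not check that the breakpoints are really joined."""
    if record.variant_type != variant_type:
        return False
    if mate_name is None or end is None:
        # Single-sided query: an ordinary bracket on either breakpoint.
        breakpoints = ((record.chrom_a, record.pos_a), (record.chrom_b, record.pos_b))
        return any(chrom == reference_name and in_bracket(pos, start) for chrom, pos in breakpoints)
    # Two-sided query: first breakpoint in the start bracket, second in the end bracket.
    return (
        record.chrom_a == reference_name and in_bracket(record.pos_a, start)
        and record.chrom_b == mate_name and in_bracket(record.pos_b, end)
    )


if __name__ == "__main__":
    rec = FusionRecord("11", 16086170, "12", 16090073)
    print(matches(rec, "11", [16086165, 16086175], "12", [16090071, 16090075]))
    print(matches(rec, "12", [16090071, 16090075]))
```

A "3-way" fusion would not fit this single-event shape, which is consistent with the second caveat.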

jrambla commented 2 months ago

Hi!

I generally don't recommend using parameters (or columns in a table or the like) for purposes other than those originally envisioned: in the midterm, this overloading causes issues when evolving either of the two usages (the original or the added one), or makes validation and documentation harder and less intuitive.

Therefore, I would not recommend using start and end for positions on different chromosomes, nor adding complex validations on whether mateName sorts higher or lower than referenceName. Adding more parameters when they add clarity, as in this case, would be my strong recommendation.

mbaudis commented 2 months ago

@jordi Well, the meaningful parameters one could consider would be mateStart and mateEnd, defining a fusion request as a double RangeQuery: referenceName + start (single) + end (single) + mateName + mateStart (single) + mateEnd (single) + variantType. This would be a bit more verbose (same number of values, but more parameters).

However, conventions such as mateName >= referenceName are a given, since this is how any fusion annotation works: lower chromosome first.
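A minimal sketch of the double RangeQuery form described above, together with the "lower chromosome first" normalization. mateStart and mateEnd are only proposed parameters here, the class and helper names are hypothetical, and the chromosome order (1..22, X, Y, MT) plus "lower position first on the same chromosome" is our assumption about how the convention is applied.

```python
from dataclasses import dataclass
from typing import Dict, Sequence

# Assumed chromosome order: 1..22, then X, Y, MT.
_CHROM_ORDER = [str(n) for n in range(1, 23)] + ["X", "Y", "MT"]


def chrom_rank(name: str) -> int:
    """Rank a chromosome name, ignoring an optional 'chr' prefix."""
    bare = name[3:] if name.lower().startswith("chr") else name
    return _CHROM_ORDER.index(bare.upper())


@dataclass(frozen=True)
class FusionRangeQuery:
    """Double RangeQuery using the proposed (not yet specified) mateStart/mateEnd parameters."""
    reference_name: str
    start: int
    end: int
    mate_name: str
    mate_start: int
    mate_end: int
    variant_type: str = "SO:0000806"

    def normalized(self) -> "FusionRangeQuery":
        """Swap the partners if needed so the lower chromosome (then lower position) comes first."""
        first = (chrom_rank(self.reference_name), self.start)
        second = (chrom_rank(self.mate_name), self.mate_start)
        if second < first:
            return FusionRangeQuery(self.mate_name, self.mate_start, self.mate_end,
                                    self.reference_name, self.start, self.end, self.variant_type)
        return self

    def to_params(self) -> Dict[str, object]:
        """Emit the request parameters with the proposed parameter names."""
        return {
            "referenceName": self.reference_name,
            "start": self.start,
            "end": self.end,
            "mateName": self.mate_name,
            "mateStart": self.mate_start,
            "mateEnd": self.mate_end,
            "variantType": self.variant_type,
        }


def from_bracket_params(reference_name: str, start: Sequence[int], mate_name: str,
                        end: Sequence[int], variant_type: str = "SO:0000806") -> FusionRangeQuery:
    """Map the bracket form (start[0], start[1], end[0], end[1]) onto the double range form."""
    return FusionRangeQuery(reference_name, start[0], start[1], mate_name, end[0], end[1], variant_type)


if __name__ == "__main__":
    q = FusionRangeQuery("12", 16090071, 16090075, "11", 16086165, 16086175).normalized()
    print(q.to_params())
```

Both forms carry the same four position values; the double range form only spreads them over more, better-labeled parameters, which matches the "more verbose" trade-off mentioned above.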

I guess I could support adding those 2 parameters: even if they do not enable anything beyond the currently available ones, they provide clearer labels (labeling the position on the 2nd fusion partner as end isn't really correct).

Re: Oriol's (@costero-e) suggestion: it doesn't make sense to name something "donor" or "acceptor", and we only need 2 positional parameters per fusion partner (to indicate the range of the breakpoint).

Independent of all that, we will need to discuss whether cytoband queries should be supported by front end / helpers or through the API (i.e. with additional cytoband strings; interestingly, VRS allows CytobandInterval).

ga4gh-beacon / variant-query-types, issue #1