ga4gh-beacon / variant-query-types

Schemas for genomic / molecular variation queries over the Beacon protocol and beyond
Creative Commons Zero v1.0 Universal

fusion query with donor and acceptor parameters #1

Open costero-e opened 2 months ago

costero-e commented 2 months ago

As Beacon still falls short of supporting comprehensive fusion queries, my colleagues and I have found a possible solution, which I present here for discussion in the Variants Scout. The proposal is based on the notation commonly used in bioinformatics for translocated genes, which gives both the original and the receiving chromosome together with the start and end positions on each. The nomenclature I have found usually looks like this:

CHR:donorStart-donorEnd:CHR:acceptorStart-acceptorEnd

That said, in our opinion Beacon should have brand new names for these donor and acceptor position parameters, but could reuse referenceName and mateName for the origin and destination chromosomes, respectively. A complete Beacon query could look like this:

g_variants?referenceName=11&mateName=12&donorStart=[16086165,16086170]&donorEnd=[16086171,16086175]&acceptorStart=[16090071,16090073]&acceptorEnd=[16090074,16090075]

We would like new parameters such as donorStart or acceptorStart in order to avoid misusing the original start and end parameters, which have an established meaning; overloading them could confuse implementers as well as Beacon users and clients.

Let me know what you think.

Best,
Oriol
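To make the proposal concrete, here is a minimal Python sketch that assembles the query string shown above. The parameter names donorStart, donorEnd, acceptorStart and acceptorEnd are the ones proposed in this comment and are not part of the current Beacon specification; the helper names `bracket` and `build_fusion_query` are hypothetical.

```python
from urllib.parse import urlencode


def bracket(lo, hi):
    """Format a position interval in the [lo,hi] style used in the proposal."""
    return f"[{lo},{hi}]"


def build_fusion_query(reference_name, mate_name,
                       donor_start, donor_end,
                       acceptor_start, acceptor_end):
    """Build the proposed g_variants query (hypothetical parameter names)."""
    params = {
        "referenceName": reference_name,
        "mateName": mate_name,
        "donorStart": bracket(*donor_start),
        "donorEnd": bracket(*donor_end),
        "acceptorStart": bracket(*acceptor_start),
        "acceptorEnd": bracket(*acceptor_end),
    }
    # Keep brackets and commas unescaped so the result matches the
    # notation used in the proposal.
    return "g_variants?" + urlencode(params, safe="[],")


query = build_fusion_query(
    "11", "12",
    (16086165, 16086170), (16086171, 16086175),
    (16090071, 16090073), (16090074, 16090075),
)
print(query)
```

A real client would probably percent-encode the brackets; they are left readable here only to mirror the example in the proposal.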

mbaudis commented 2 months ago

@costero-e I appreciate this as a push to advance w/ fusions but IMO this adds unnecessary complexity/arguments when talking about the query:

For the detection of the 2 fusion partners this leads to the basic requirements of querying 2 chromosomes with associated ranges. This already can be achieved with the current parameters:

In essence this is a BeaconBracketQuery where the end bracket's usually different chromosome is denoted by mateName.
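As a rough illustration of this reading, the sketch below builds a bracket-style query in which the start interval lies on referenceName and the end interval lies on mateName. It assumes that start and end each take a comma-separated pair of positions; the function name `bracket_fusion_query` is hypothetical, and the coordinates are borrowed from the example earlier in the thread purely for illustration.

```python
from urllib.parse import urlencode


def bracket_fusion_query(reference_name, start_range, mate_name, end_range):
    """Sketch of a bracket query whose end bracket sits on mateName.

    Assumption: start and end are each sent as "min,max".
    """
    params = {
        "referenceName": reference_name,
        "start": ",".join(str(p) for p in start_range),
        "mateName": mate_name,
        "end": ",".join(str(p) for p in end_range),
    }
    # Leave commas unescaped for readability.
    return "g_variants?" + urlencode(params, safe=",")


bracket_query = bracket_fusion_query(
    "11", (16086165, 16086175),
    "12", (16090071, 16090075),
)
print(bracket_query)
```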

This seems very straightforward, with some caveats:

jrambla commented 2 months ago

Hi!

I generally don't recommend using parameters (or columns in a table, or the like) for purposes other than those originally envisioned. In the medium term, this overloading causes problems when either usage (the original or the added one) evolves, and it makes validation and documentation harder and less intuitive.

Therefore, I would not recommend using start and end for positions on different chromosomes, nor adding complex validations on whether mateName is higher or lower than referenceName. Adding more parameters when they add clarity, as in this case, would be my strong recommendation.

mbaudis commented 2 months ago

@jordi Well, the meaningful parameters one could think about would be mateStart and mateEnd and define a fusion request as a double RangeQuery referenceName + start (single) + end (single) + mateName + mateStart (single) + mateEnd (single) + variantType. This would be a bit more verbose (same number of values but more parameters).
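A minimal Python sketch of such a double range request follows. mateStart and mateEnd are the parameters suggested in this comment and do not exist in the current specification; the class name `FusionRangeQuery` is hypothetical, and the variantType value is only an illustrative placeholder to be replaced with whatever term a given beacon expects.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class FusionRangeQuery:
    """One single-valued range per fusion partner (hypothetical model)."""
    reference_name: str
    start: int
    end: int
    mate_name: str
    mate_start: int
    mate_end: int
    variant_type: str

    def __post_init__(self):
        # Each partner's breakpoint range must be well ordered.
        if self.start > self.end:
            raise ValueError("start must not exceed end")
        if self.mate_start > self.mate_end:
            raise ValueError("mateStart must not exceed mateEnd")

    def to_query_string(self):
        params = {
            "referenceName": self.reference_name,
            "start": self.start,
            "end": self.end,
            "mateName": self.mate_name,
            "mateStart": self.mate_start,   # proposed parameter
            "mateEnd": self.mate_end,       # proposed parameter
            "variantType": self.variant_type,
        }
        # Keep the colon of CURIE-style terms readable.
        return "g_variants?" + urlencode(params, safe=":")


range_query = FusionRangeQuery(
    "11", 16086165, 16086175,
    "12", 16090071, 16090075,
    "SO:0000806",  # illustrative placeholder for a fusion term
)
print(range_query.to_query_string())
```

Compared with the bracket form, this sends the same four positions but labels each one by the partner it belongs to.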

However, the conventions of e.g. mateName >> referenceName are a given since this is how any fusion annotation works; lower chromosome first.
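A client-side helper could apply this "lower chromosome first" convention before a query is sent, so that no server-side validation is needed. The sketch below assumes a ranking of 1 to 22, then X, then Y, with an optional "chr" prefix; that ranking and the function names `chrom_rank` and `order_partners` are assumptions made for illustration, not part of any specification.

```python
def chrom_rank(name):
    """Rank a chromosome name: 1..22, then X, then Y (assumed ordering)."""
    label = name[3:] if name.lower().startswith("chr") else name
    label = label.upper()
    if label.isdigit():
        return int(label)
    extra = {"X": 23, "Y": 24}
    if label in extra:
        return extra[label]
    raise ValueError(f"unsupported chromosome name: {name!r}")


def order_partners(partner_a, partner_b):
    """Return (reference, mate) with the lower chromosome first.

    Each partner is a (chromosome, start, end) tuple; ties on the
    chromosome are broken by start position.
    """
    def key(partner):
        return (chrom_rank(partner[0]), partner[1])

    reference, mate = sorted([partner_a, partner_b], key=key)
    return reference, mate


reference, mate = order_partners(
    ("12", 16090071, 16090075),
    ("11", 16086165, 16086175),
)
print(reference[0], mate[0])
```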

I guess I could support adding those 2 parameters even if they do not provide an additional solution over the currently available ones since they provide clearer labels (indicating the position on the 2nd fusion partner as end isn't really correct).

Re: Oriol's @costero-e suggestion: It doesn't make sense to name something "donor" or "acceptor"; and also we only need 2 positional parameters per fusion partner (to indicate the range of the breakpoint).

And independent of all that, we will need to discuss whether cytoband queries should be supported by front end / helpers or through the API (i.e. with additional cytoband strings; interestingly, VRS allows CytobandInterval).