Open costero-e opened 2 months ago
@costero-e I appreciate this as a push to advance w/ fusions but IMO this adds unnecessary complexity/arguments when talking about the query:
For the detection of the 2 fusion partners this leads to the basic requirement of querying 2 chromosomes with associated ranges. This can already be achieved with the current parameters:
- referenceName + start[0] + start[1]
- mateName + end[0] + end[1]
- variantType ... (e.g. SO:0000806 "fusion")
- mateName >= referenceName in sort order
- mateName == referenceName: end[0] > start[0] (which is the usual convention but does not apply if mateName ≠ referenceName)
In essence this is a BeaconBracketQuery where the end bracket's (usually different) chromosome is denoted by mateName.
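As a rough illustration of the bracket-style approach described above, here is a minimal Python sketch that orders the two fusion partners and maps them onto the existing parameters. The helper name `fusion_bracket_params`, the chromosome sort order (1 to 22, then X, Y), and the comma-separated encoding of the bracket values are assumptions made for this example, not something the thread specifies; the coordinates are borrowed from the example query in the original post.

```python
from urllib.parse import urlencode

# Assumed chromosome sort order (1..22, X, Y); the thread only says "sort order".
CHROM_ORDER = [str(i) for i in range(1, 23)] + ["X", "Y"]


def fusion_bracket_params(chrom_a, range_a, chrom_b, range_b):
    """Hypothetical helper: map two breakpoint ranges onto the existing
    referenceName/start and mateName/end parameters."""
    partners = [(chrom_a, tuple(range_a)), (chrom_b, tuple(range_b))]
    # Lower chromosome first; on the same chromosome, lower position first.
    partners.sort(key=lambda p: (CHROM_ORDER.index(p[0]), p[1]))
    (ref, ref_range), (mate, mate_range) = partners
    # Convention from the comment: end[0] > start[0] only applies
    # when both partners are on the same chromosome.
    if ref == mate and not mate_range[0] > ref_range[0]:
        raise ValueError("same chromosome requires end[0] > start[0]")
    return {
        "referenceName": ref,
        "start": f"{ref_range[0]},{ref_range[1]}",  # assumed encoding
        "mateName": mate,
        "end": f"{mate_range[0]},{mate_range[1]}",  # assumed encoding
        "variantType": "SO:0000806",
    }


params = fusion_bracket_params("12", (16090071, 16090075), "11", (16086165, 16086175))
query = "g_variants?" + urlencode(params, safe=",:")
```

The partners are passed in "wrong" order on purpose: the helper swaps them so that the lower chromosome ends up in referenceName, which is the kind of implicit convention the following comments argue about.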
This seems very straightforward, with caveats:
Hi!
I generally don't recommend using parameters (or columns in a table, or the like) for purposes other than the one originally envisioned. In the mid-term, this overloading causes issues when evolving either of the two usages (the original or the added one), and it makes validation and documentation harder and less intuitive.
Therefore, I would not recommend using start and end for positions on different chromosomes, nor adding complex validations on whether mateName is higher or lower than referenceName. Adding more parameters when they add clarity, as in this case, would be my strong recommendation.
@jordi Well, the meaningful parameters one could think about would be mateStart and mateEnd, defining a fusion request as a double RangeQuery: referenceName + start (single) + end (single) + mateName + mateStart (single) + mateEnd (single) + variantType. This would be a bit more verbose (same number of values but more parameters).
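To make the "double RangeQuery" shape concrete, here is a small Python sketch. Note that mateStart and mateEnd are parameters proposed in this thread, not existing Beacon parameters, and the class name `FusionRangeQuery` and its validation rule are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionRangeQuery:
    """Hypothetical container for a fusion request made of two single ranges."""
    reference_name: str
    start: int
    end: int
    mate_name: str
    mate_start: int  # proposed parameter, not part of the current API
    mate_end: int    # proposed parameter, not part of the current API
    variant_type: str = "SO:0000806"

    def __post_init__(self):
        # Assumed sanity check: each range is validated on its own,
        # with no cross-chromosome comparison needed.
        if self.start > self.end or self.mate_start > self.mate_end:
            raise ValueError("each range needs start <= end")

    def to_params(self):
        return {
            "referenceName": self.reference_name,
            "start": self.start,
            "end": self.end,
            "mateName": self.mate_name,
            "mateStart": self.mate_start,
            "mateEnd": self.mate_end,
            "variantType": self.variant_type,
        }


q = FusionRangeQuery("11", 16086165, 16086175, "12", 16090071, 16090075)
params = q.to_params()
```

Compared with the bracket variant, this uses seven parameters for the same six positional values plus type, but start and end keep their usual meaning on referenceName.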
However, the conventions of e.g. mateName >> referenceName are a given since this is how any fusion annotation works; lower chromosome first.
I guess I could support adding those 2 parameters, even if they do not provide an additional solution over the currently available ones, since they provide clearer labels (indicating the position on the 2nd fusion partner as end isn't really correct).
Re: Oriol's @costero-e suggestion: It doesn't make sense to name something "donor" or "acceptor"; and also we only need 2 positional parameters per fusion partner (to indicate the range of the breakpoint).
And independent of all that, we will need to discuss whether cytoband queries should be supported by front end / helpers or through the API (i.e. with additional cytoband strings; interestingly, VRS allows CytobandInterval).
As Beacon still falls short of supporting comprehensive fusion queries, my colleagues and I have found a possible solution, which I present here to be discussed for the Variants Scout. The proposal is based on the usual language used in bioinformatics to refer to translocated genes, which gives both the initial and the transferred chromosomes and the start and end positions on both chromosomes. I have found that the nomenclature usually looks like this:
CHR:donorStart-donorEnd:CHR:acceptorStart-acceptorEnd
That said, in our opinion, Beacon should have brand new names for these donor and acceptor position parameters but could reuse referenceName and mateName for the origin and end chromosomes, respectively. A definitive Beacon query could look like this:
g_variants?referenceName=11&mateName=12&donorStart=[16086165,16086170]&donorEnd=[16086171,16086175]&acceptorStart=[16090071,16090073]&acceptorEnd=[16090074,16090075]
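For readers who want to see how such a query could be taken apart on the receiving side, here is a minimal Python sketch that parses the proposed query string into integer brackets. The donor/acceptor parameter names are the ones proposed in this post and are not existing Beacon parameters; the function name `parse_fusion_query` and the use of JSON to decode the `[a,b]` brackets are assumptions for illustration.

```python
import json
from urllib.parse import parse_qs

# Parameter names proposed in this post (hypothetical, not in the current API).
BRACKET_PARAMS = ("donorStart", "donorEnd", "acceptorStart", "acceptorEnd")


def parse_fusion_query(url):
    """Hypothetical parser: split the query string and decode [a,b] brackets."""
    _, _, query_string = url.partition("?")
    raw = {key: values[0] for key, values in parse_qs(query_string).items()}
    parsed = {"referenceName": raw["referenceName"], "mateName": raw["mateName"]}
    for name in BRACKET_PARAMS:
        low, high = json.loads(raw[name])  # "[1,2]" happens to be valid JSON
        if low > high:
            raise ValueError(f"{name}: bracket must be ordered low to high")
        parsed[name] = [low, high]
    return parsed


url = (
    "g_variants?referenceName=11&mateName=12"
    "&donorStart=[16086165,16086170]&donorEnd=[16086171,16086175]"
    "&acceptorStart=[16090071,16090073]&acceptorEnd=[16090074,16090075]"
)
parsed = parse_fusion_query(url)
```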
We would like to have new parameters like donorStart or acceptorStart to avoid misusing the original start and end parameters, which have an established conventional meaning; overloading them could confuse implementers as well as Beacon users and clients. Let me know what you think. Best, Oriol