Open teemukataja opened 5 years ago
@teemukataja In the current proposal, `mateName` would be a specification for the end position. A `BND` with a specified `mateName` would correspond to a translocation if it is on a different chromosome.
description: |
Second chromosome for fusion events. This can be
* empty (no fusion or unknown partner)
* identical to `referenceName` (e.g. one side of an inversion)
* a different chromosome
IMO we don't need a separate `mateStart`; just specifying that the chromosomes should be ordered (for search):
"reference_name" : "8",
"start_min": 128400000,
"start_max" : 129400000,
"mate_name" : "22",
"end_min" : 23250000,
"end_max" : 23280000,
(comments also on https://github.com/ga4gh-beacon/specification/pull/256#issuecomment-476106086).
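For illustration, the bracketed query above can be read as a predicate over stored fusion records. This is only a rough sketch: the `FusionRecord` layout, the helper names, and the choice of inclusive bounds are assumptions made for the example and are not part of the Beacon specification.

```python
from dataclasses import dataclass


@dataclass
class FusionRecord:
    # Hypothetical storage layout, not defined by the Beacon specification.
    reference_name: str
    start: int
    mate_name: str
    end: int


def chrom_key(name: str):
    # Numeric chromosomes sort numerically; others (X, Y, MT) sort after them.
    return (0, int(name), "") if name.isdigit() else (1, 0, name)


def ordered(chrom_a: str, pos_a: int, chrom_b: str, pos_b: int) -> FusionRecord:
    """Store the two breakpoints with the lower-ordered chromosome first."""
    a, b = (chrom_a, pos_a), (chrom_b, pos_b)
    if (chrom_key(chrom_b), pos_b) < (chrom_key(chrom_a), pos_a):
        a, b = b, a
    return FusionRecord(a[0], a[1], b[0], b[1])


def matches(record: FusionRecord, query: dict) -> bool:
    """True if both breakpoints fall inside the queried ranges (inclusive, by assumption)."""
    return (
        record.reference_name == query["reference_name"]
        and record.mate_name == query["mate_name"]
        and query["start_min"] <= record.start <= query["start_max"]
        and query["end_min"] <= record.end <= query["end_max"]
    )


query = {
    "reference_name": "8",
    "start_min": 128400000,
    "start_max": 129400000,
    "mate_name": "22",
    "end_min": 23250000,
    "end_max": 23280000,
}
```

Because records are stored in a fixed chromosome order, a fusion reported from the chromosome 22 side still lands in the same `reference_name`/`mate_name` slots, so one query shape covers both orientations.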
@mbaudis Could you provide any example queries (e.g. POST or GET) and responses (JSON response) on how this functionality can be utilised? I could not find any in the issues or in the API specs.
I would also like to validate some assumptions:
* whether `alternateBases` or `variantType` can be used with `mateName` (seems like no);
* whether, for `variantType=BND`, `mateName` is required or not (seems like no).

@teemukataja See the example above, corresponding to an imprecise fusion event (e.g. a MYC-IGL translocation, variant Burkitt lymphoma). A precise query (which doesn't make much sense, since breakpoints are rarely recurring at specific positions):
?referenceName=8&start=1289234404&mateName=22&end=23266044&variantType=BND
This would correspond to 2 lines in VCF, where the corresponding mate would be represented in the ALT and INFO fields:
#CHROM POS ID REF ALT QUAL FILTER INFO
8 1289234404 bnd_A C C]22:23266044] 6 PASS SVTYPE=BND;MATEID=bnd_B
22 23266044 bnd_B A [8:1289234404[A 6 PASS SVTYPE=BND;MATEID=bnd_A
The VCF contains additional information about the directionality of the fusion which we don't consider right now (not really important for query models but could be specified later on).
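As a rough sketch of how the mate could be recovered from the ALT field of such records, the following parses the four bracket forms of a VCF breakend. The function name and regular expression are illustrative, and the sketch assumes plain `chrom:pos` mates (no angle-bracketed contig names).

```python
import re

# Matches the VCF breakend ALT forms: t[p[, t]p], ]p]t and [p[t,
# where p is "chrom:pos" and t is the (possibly empty) inserted/reference sequence.
BND_ALT = re.compile(
    r"^(?P<lead>[ACGTNacgtn.]*)"
    r"(?P<bracket>[\[\]])"
    r"(?P<chrom>[^:\[\]]+):(?P<pos>\d+)"
    r"(?P=bracket)"
    r"(?P<trail>[ACGTNacgtn.]*)$"
)


def parse_bnd_mate(alt: str):
    """Return (mate chromosome, mate position) for a breakend ALT, or None."""
    m = BND_ALT.match(alt)
    if m is None:
        return None
    return m.group("chrom"), int(m.group("pos"))


# ALT values taken from the two VCF lines above.
mate_of_a = parse_bnd_mate("C]22:23266044]")
mate_of_b = parse_bnd_mate("[8:1289234404[A")
```

The bracket direction and the side on which the base appears carry the orientation information mentioned above; this sketch deliberately discards it, in line with the current query model.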
The following would be a typical variation of the query, in which we look for a fusion between canonical breakpoint regions using range matches (same genes):
?referenceName=8&startMin=128400000&startMax=129400000&mateName=22&endMin=23250000&endMax=23280000&variantType=BND
Current Beacon responses would be just standard. Since in example 2 multiple fusion events could be matched, we could deliver the different matched variants (in some TBD format) in the response (either through handover or in the response message - other discussion).
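To make the two query shapes concrete, here is a small sketch that assembles the precise and the range query strings from the parameters used in this thread. The helper names are hypothetical, and no endpoint URL is implied.

```python
from urllib.parse import urlencode


def _position(name, value):
    # A (min, max) tuple becomes <name>Min/<name>Max; a single int stays <name>.
    if isinstance(value, tuple):
        lo, hi = value
        return {f"{name}Min": lo, f"{name}Max": hi}
    return {name: value}


def bnd_query(reference_name, start, mate_name, end):
    """Build the query string for a BND query; start/end are ints or (min, max) tuples."""
    params = {"referenceName": reference_name}
    params.update(_position("start", start))
    params["mateName"] = mate_name
    params.update(_position("end", end))
    params["variantType"] = "BND"
    return urlencode(params)


precise = bnd_query("8", 1289234404, "22", 23266044)
ranged = bnd_query("8", (128400000, 129400000), "22", (23250000, 23280000))
```

Prefixing either string with `?` gives the precise and the range query written out above.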
@teemukataja For `BND` variant queries w/o a `mateName`, all types of variants representing a structural sequence disruption could be queried. In our Beacon+ instance, we just match e.g. on the start and end positions of `CNV` events; obviously `BND`; possibly `INS` ...
https://github.com/ga4gh-beacon/specification/pull/256 added a new property called `mateName` as a parameter to a variant query. Is this new feature incomplete? Should `mateName` be paired up with a coordinate to specify where in the mate chromosome the bonding happens?

Looking at https://samtools.github.io/hts-specs/VCFv4.3.pdf chapter 5.4.4 page 20 for reference.
How would one write a `mateName` query? We would probably need `mateStart`, `mateStartMin` and `mateStartMax` in addition to the newly created parameter.

Queries would then look something like this, for example: using `referenceName`, `start`, `mateName`, `mateStart` for `1 : 1000 - 2 : 2000`, or using `variantType` as `1 : 1000 > BND`