Closed sdelatorrep closed 5 years ago
@sdelatorrep but why does mateName need a separate component & cannot reference to Chromosome
? So the description of Chromosome
in components would be neutral/minimal, and only query.parameters.referenceName
and query.parameters.mateName
have a description of the usage. A mateName
object is just a chromosome (this would stay the same concept if we use other types of references).
The component could then just be:
Chromosome:
description: 'A chromosome of a reference assembly, to which base coordinates are being referenced.'
(Also getting rid of the multiple definitions of chromosome names, which lead to inconsistencies - the component's enum values are sufficient.)
@mcupak Thanks for the comments!
@mcupak ... regarding the naming: mateName
follows referenceName
style. Seems logical; however, I'm not married to the name.
Hi @mcupak, @mbaudis , it's not possible to do what you are suggesting. You cannot add the description
here:
mateName:
$ref: '#/components/schemas/MateName'
because it is ignored (description
or anything else, see this). This is what I tried to say in my first comment in this PR and this is why I created a new class and didn't reuse the Chromosome
one.
If we reuse the Chromosome
class we will have 2 different descriptions for the field mateName
:
and
(it's not possible to change the second one)
@sdelatorrep @mcupak So a parameter cannot have a description, if it uses a reference? Ditch Swagger.
Seriously, one has to be able to comment on the structure (i.e. in the component; here that the enum describes chromosomes) and the specific use (i.e. in the parameter, here describing what the chromosomes are being used for). We already stumble upon this, here e.g. in the multiple declarations of what a chromosome/reference ... is.
But if I look at the Swagger example, the description sits with the parameter (there responses.200.description
, not the component.
So the way here would be just to have a component Chromosome
, which is both referred to by referenceName
and mateName
, which have their (unique) description
values?!
IMHO this is a "tooling" problem (OpenAPI in this case) and we should look for the less bad option. Thus, in order to refine the scenario and do a more informed decision, let me add another question: @mbaudis are we solid that mateName would be identical to Chromosome, e.g. would we allow for using "MT" in that value?
@jrambla Correct interpretation (re: tooling). Yes, there shouldn't be a difference between mate name and reference. I have to admit that I haven't seen MT fusions, but logically there is no reason why they shouldn't be described (reference and mate can be the same, as in the description). So for now, I would prefer different parameters with their own descriptions using the same #ref , instead of having a #ref for each parameter (which totally defies the reason why we use components).
I'm not sure I understand you, @mbaudis. If we keep the same $ref
we will have different descriptions, depending on where you look at:
mateName
in the query parameters of the GET /query
endpoint (1st screenshot in my previous comment), or mateName
in BeaconAlleleRequest
class (used in the POST /query
endpoint) and we cannot change the description
in this case (2nd screenshot in my previous comment).Is this what you're saying? Or I missed/misunderstood something? Thanks!
@sdelatorrep I actually overlooked the second definition... But I guess there is no way around for having both referenceName
and mateName
described in both... The alternative would be to have no description at all for those in POST & GET (i.e. only
referenceName:
$ref: '#/components/schemas/Chromosome'
mateName:
$ref: '#/components/schemas/MateName'
), and have the descriptions with the (separate) components (as you seem to suggest).
While both options solve a different duplication, my problem with the 2nd is that the Chromosome
and MateName
components are conceptually the same (i.e. chromosome instances).
But I leave the decision to you..
(Btw: the Chromosome
component should be renamed to ReferenceName
whenever feasible; there are cases where other reference annotations are used - e.g. VMC uses sequence identifiers...)
My proposal:
Chromosome
component to ReferenceName
.ReferenceName
component.ReferenceName
in both parameters referenceName
and mateName
. parameters:
- name: referenceName
description: 'Reference name.'
in: query
required: true
schema:
$ref: '#/components/schemas/ReferenceName'
- name: mateName
description: |
Second chromosome for fusion events. This can be
* empty (no fusion or unknown partner)
* identical to `referenceName` (e.g. one side of an inversion)
* a different chromosome
in: query
required: false
schema:
$ref: '#/components/schemas/ReferenceName'
...
components:
schemas:
ReferenceName:
description: 'Reference name.'
type: string
enum:
- ...
...
BeaconAlleleRequest:
...
properties:
referenceName:
$ref: '#/components/schemas/ReferenceName'
mateName:
$ref: '#/components/schemas/ReferenceName'
And it would look like this in an OpenAPI viewer:
GET /query
BeaconAlleleRequest
:
In the second defintion of mateName
, the description is different but, at least, generic... don't know if it's the best choice, though, but the only alternative is duplicating the component which has the drawbacks @mbaudis exposed.
Let's see what @mcupak says about this now that it's clear what the problem is.
@sdelatorrep +1 in this form :-)
Ok. Let me state that eventually, being this an spec, I will prefer clarity (two components with clear descriptions) over being concise (one component with not so clear descriptions or less readable ones). But, given that we have not reached a consensus yet, let's do this in a parsimonious way, changing one step at a time (release).
1) As we only have a component now (i.e. Chromosome), let's use only one. As per Sabela's last proposal. 2) BUT, as it is already named "Chromosome", and I don't want to introduce changes one at a time, and I believe that using referenceName in 3 different places is quite confusing, I'll not change the name right now. 3) If that solution of sharing the component proves not clear enough to the community, we should go in the direction of having two components instead of one. We will keep "Chromosome" in the "old spec" and use "referenceName" and "mateName" in the new 4) I will ask Sabela to change "mateName" (not clear enough IMHO) to "mateChromosomeName" (I'll like to use "referenceChromosomeName" also, but "referenceName" has been there from the beginning for reasons that I don't really understand).
Hoping that this reaches consensus enough and allows us to move forward.
@jrambla @sdelatorrep The problem with using "chromosome" (anywhere) is that chromosomes are just a subset of the set of possible genome references. The concept is already being stretched by "MT"; but - importantly - many coordinate systems use positions relative to another reference (whatever clone, transcript ...id). The VMC specification explicitly refrains from using chromosome, but states
sequence_id: An id mapping to the Identifier of the external database Sequence
In VCF it is called CHROM
, but may be used differently - a scenario not being served from Beacon right now:
CHROM - chromosome: An identifier from the reference genome or an angle-bracketed ID String (“<ID>”) pointing to a contig in the assembly file
Now, I much prefer the "chromosome from a reference genome edition" way to express coordinates, with resources taking care of the mappings. But something like the semantic abuse in VCF would drive me mad. So either we really say "Beacon is for reference genome mapped data, so there be chromosomes", or we try to get rid of the "chromosome" concept in attribute names etc. and do something like
"referenceName" :
"description" : |
The identifier of a genomic sequence assembly to which coordinate positions refer. It is recommended to use chromosomes of a reference genome assembly."
...
@jrambla @sdelatorrep ... so if going the "chromosomes only" way, "mateChromosomeName" would be an (expressive) option. But it's a lock-in into "chromosome space" you'll never get rid of.
I have similar concerns about this as expressed by @mbaudis. With the new Use Cases in Plants and Proteomics being worked on I would like us not to tie into chromosomes only.
@mbaudis, @garysaunders we are currently tied to human chromosomes, as we are using an enum. Neither contigs, nor any other "artifact" would accepted. Thus, we can't honor your concerns without reconsidering the whole "reference system" in the Beacon. Indeed, we should probably look at if refget "principals" are applicable here. Beacon v2,0 could be a good target for that. I'm not against that, but as I've said, lets introduce changes step by step. Lets not block the spec release and let it evolve.
@jrambla @garysaunders @sdelatorrep So neither the use of "chromosome" nor the enum (not a good idea, but ...) will block non-human use cases ; it will just lead to painful "please give me "Z" etc. issues. Proteins will need a different solution anyway (other request type, I guess).
So best compromise IMO to go ahead with mateName
(similar to referenceName
; more verbosity isn't really necessary since description
etc.), and tie this for now to the Chromosome
component. This will keep the general syntax in line; if references become something very different, this may then later be added as either new attributes or different components.
But if @jrambla really wants mateChromosomeName
he can have it :-)
+1 to @mbaudis "So best compromise IMO to go ahead with mateName (similar to referenceName; more verbosity isn't really necessary since description etc.), and tie this for now to the Chromosome component."
Hi @garysaunders ! This PR is stalled since quite a long time waiting for Michael and others to review it. Please, could you help to chase them? Thanks!
Ouch. Done.
@mbaudis @sdelatorrep
Now that we have a mateName
property, do we also require a mateStart
property (and mateStartMin
and mateStartMax
to cover a wider area)?
Looking at https://samtools.github.io/hts-specs/VCFv4.3.pdf chapter 5.4.4 page 20.
A fusion variant should probably have the mate's coordinate as well.
How would one search for a fusion variant? What search terms are required?
e.g., referenceName
, start
, mateName
, mateStart
for
1 : 1000 - 2 : 2000
or
variantType
in
1 : 1000 > BND
for any mate
New PR to replace #176
I've created a new type called
MateName
because when using$ref
in some locations of the definition, the description cannot be overridden and it would be the same as in typeChromosome
.vs