ga4gh-beacon / beacon-v2-Models

Models that leverage the Beacon Framework v2
Apache License 2.0

Puzzles in variant query parameters #114

Closed gsfk closed 2 years ago

gsfk commented 2 years ago

Two variant query parameter issues, possibly both errors in the description fields:

(1) The variantType query parameter appears to be described incorrectly in parts of the spec. In genomicVariations/requestParameters.variantType.description we have this claim:

"Either alternateBases or variantType is required, with the exception of range queries (single start and end parameters)."

There is similar text in genomicVariations/requestParametersComponents.AlternateBases.description:

"Categorical variant queries, e.g. such not being represented through sequence & position, make use of the variantType parameter.\n* either alternateBases or variantType is required."

In both cases this seems to be incorrect: variantType appears to be optional everywhere. It's explicitly marked as optional in several parts of the documentation, and a close reading of requestParameters.start.description also suggests it's not required. It's not even clear to me which of the sequence, range, or bracket queries count as "categorical variant queries".
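To make the disputed wording concrete, here is a minimal sketch of the rule exactly as the quoted description states it. The function name and the dict-based parameter handling are hypothetical, not part of the spec, and it assumes that "single start and end parameters" means one scalar value each rather than a pair of positions:

```python
def satisfies_described_rule(params: dict) -> bool:
    """Check the rule as worded in the quoted variantType description.

    Hypothetical helper for illustration only; not part of the Beacon spec.
    """
    def is_single(value) -> bool:
        # Assumption: a "single" parameter is one scalar coordinate,
        # as opposed to a list of two positions.
        return value is not None and not isinstance(value, (list, tuple))

    # Range query: single start and single end (the stated exception).
    is_range_query = is_single(params.get("start")) and is_single(params.get("end"))
    # Otherwise the description demands alternateBases or variantType.
    has_alt_or_type = "alternateBases" in params or "variantType" in params
    return is_range_query or has_alt_or_type
```

Under this reading, a query carrying only a start position would be rejected, which is the behaviour that looks inconsistent with variantType being marked optional elsewhere.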

(2) There is a smaller issue with requestParametersComponents.AlternateBases.description (repeated in "ReferenceBases" in the same file): the description is contradicted by the "pattern" field. The description says accepted values are [ACGTN], but the pattern given is "^([ACGTUNRYSWKMBDHV\-\.]*)$", meaning any of the IUPAC codes can be used. My first guess is that the description is incorrect.
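The mismatch can be checked with a short sketch. The pattern is copied from the text above. Python's re module stands in here for whatever regex engine a schema validator would use (an assumption), and the helper name is hypothetical:

```python
import re

# Pattern as quoted from the "pattern" field.
BASES_PATTERN = re.compile(r"^([ACGTUNRYSWKMBDHV\-\.]*)$")
# Alphabet as stated in the description text.
DESCRIBED_ALPHABET = set("ACGTN")

def compare(value: str) -> dict:
    """Report whether a value passes the pattern and the described alphabet."""
    return {
        "matches_pattern": BASES_PATTERN.fullmatch(value) is not None,
        "within_described_alphabet": set(value) <= DESCRIBED_ALPHABET,
    }

for candidate in ("ACGTN", "RYK", "AXG"):
    print(candidate, compare(candidate))
```

A value such as "RYK" passes the pattern but falls outside the described [ACGTN] set, so the two cannot both be right as written.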

I'd be happy to file a PR with description fixes, if that's in fact the problem with all of these.

mbaudis commented 2 years ago

Reads correct to me, though

... which emerges etc. here http://docs.genomebeacons.org/variant-queries/

One reason for being permissive here is the frequent use of e.g. SNV as variant type, which is formally correct if used together with alternateBases but not needed, and is insufficient on its own (in contrast to "categorical" variations such as DUP, DEL).

"Categorical variant queries, e.g. such not being represented through sequence & position"

This may need a clarification that the sequence (alternateBases) defines the non-categorical case... I guess these concepts correspond roughly to VRS Molecular Variation versus Systemic Variation.

That being said - we'll be happy to get help w/ refining this!

mbaudis commented 2 years ago

Closing this - requests for enhancements or clarifications should now be made against the beacon-v2 repo/documentation.