Mrinal-Thomas-Epic commented 6 months ago

My understanding of qualifiers is that they are additional optional properties that add further filters in addition to matching the subject of the statement (i.e., the variant). However, there are two examples in the schema that seem to be inconsistent with my understanding.

1. In the gk-pilot schema for VariantPathogenicity statements, why are penetrance and geneContext considered qualifiers? These seem like they are adding extra information, but not information that should be treated as filters. If it's important to standardize terms across statements, would it make more sense to have these under a standard concept for object annotations/metadata?
2. In the gk-pilot schema for VariantOncogenicityStudy, why is tumorType a top-level property, as opposed to a qualifier? This information seems like it should be treated as an additional filter. In general, what makes something a qualifier vs top-level property?

ahwagner commented 4 months ago

In the gk-pilot schema for VariantOncogenicityStudy, why is tumorType a top-level property, as opposed to a qualifier? This information seems like it should be treated as an additional filter. In general, what makes something a qualifier vs top-level property?

For Variant Oncogenicity (or any statement related to a cancer) a tumor type is a required property, whereas qualifiers are optional.

In the gk-pilot schema for VariantPathogenicity statements, why are penetrance and geneContext considered qualifiers? These seem like they are adding extra information, but not information that should be treated as filters. If it's important to standardize terms across statements, would it make more sense to have these under a standard concept for object annotations/metadata?

For VariantPathogenicity, I believe the indicated properties are optional and so were added to the qualifiers array.

FWIW I agree that this is confusing, and I do not recall the logic behind maintaining a qualifiers array; this might have been part of a compromise approach to having anything that can serve as a qualifier be named <name>_qualifier. I would be okay with dropping the qualifiers key and having these properties sit at the top level.
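
To make the "drop the qualifiers key" option concrete, here is a rough sketch of what flattening could look like. It assumes, purely for illustration, that the nested form is a `qualifiers` object keyed by short names and that the flat form appends `Qualifier` to each name; the helper `flatten_qualifiers`, the sample values, and both shapes are hypothetical and not taken from the gk-pilot schema.

```python
def flatten_qualifiers(stmt: dict) -> dict:
    """Move entries of a nested 'qualifiers' object to top-level '<name>Qualifier' keys.

    Both the nested shape and the naming rule are assumptions for illustration.
    """
    # Copy everything except the nested container.
    flat = {k: v for k, v in stmt.items() if k != "qualifiers"}
    # Re-home each nested entry at the top level under a suffixed name.
    for name, value in stmt.get("qualifiers", {}).items():
        flat[f"{name}Qualifier"] = value
    return flat


nested = {
    "subject": "Variant X",
    "predicate": "is causal for",
    "object": "Condition Y",
    "qualifiers": {"penetrance": "high", "geneContext": "Gene G"},
}

flat = flatten_qualifiers(nested)
print(flat)
```

With this shape, optionality becomes a per-property decision in the schema rather than a consequence of where the property sits.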

ahwagner commented 4 months ago

Tagging @larrybabb for additional comments.

larrybabb commented 2 months ago

I'm linking this to the other qualifier issue that provides additional background.

#134 Consider alternate mechanisms to define specialized qualifiers in Statement profiles

larrybabb commented 2 months ago

IMO qualifiers are an aspect of the model that is very difficult to represent abstractly. The idea of required vs optional qualifiers indicates to me the difference between definitional values and decorative values. As a general rule, when an attribute rises to the level of being required on a class, it is definitional to the identity of that class. If optional attributes present themselves as definitional, or as important enough to be critical to downstream use, then they are likely missing the required value that should be there when none is supplied, or the class is not fully normalized and multiple concepts are collapsed into a single class.

Any and all of these cases will likely occur in the model (even in the standard models) because trying to represent this diverse set of statements and study results in a perfectly normalized state is not reasonably achievable. One could argue we have been on that path for several years and still have no product to show for it. We will be forever on the journey to perfection yet never arrive. (sorry for my pontification)

larrybabb commented 2 months ago

@Mrinal-Thomas-Epic Please let me know if the responses here or in issue #134 have resolved your query. If so, please consider closing this even if the other discussion is still open. And feel free to join the other discussion, which is likely more active.

Mrinal-Thomas-Epic commented 2 months ago

Closing in favor of #134

mbrush commented 2 months ago

Hi all. Sorry it took me so long to weigh in here. Reopening this ticket just to make sure it gets seen – feel free to close once it is.

First, it is important to understand that in a VA Statement object, the semantics of the assertion put forth are explicitly represented using subject-predicate-object and qualifier properties. This approach is inspired by (and extends) the RDF semantics used to create linked open data on the semantic web. Here, there is always one subject, one predicate, and one object that make up the core 'triple' in a statement. One or more qualifiers can be defined for a given statement type to further refine/extend/constrain the context of this core triple - to give the full meaning of the statement.

The point here is that qualifiers contribute to the semantics/meaning of the core assertion at the heart of a Statement object. Other properties in a Statement (e.g. date_authored, contributions, has_evidence), and other classes in the VA model (e.g. Method, Document, Agent, EvidenceLine), are there to provide additional information that support understanding and use of this core statement - mainly provenance and evidence metadata.

Also, some of the questions in this thread are based on how qualifiers were formerly structured in a separate nested object. Please familiarize yourself with the current way qualifiers are represented (per Larry's proposal here) - as context for the rest of this conversation. Briefly:

Qualifier properties will all have names that end with the term 'Qualifier' to make it clear that these provide qualifying information to the core S-P-O triple.
These properties now live at the top level of a Statement alongside the subject, predicate, and object properties - the full meaning of what is asserted at the core of a Statement can be easily read from this set of properties.

For example, the structure:

id: Statement001
subject: Variant X
predicate: is causal for
object: Condition Y 
modeOfInheritanceQualifier: autosomalDominant
penetranceQualifier: high

. . . which reads as the core assertion that "VariantX is causal for autosomal-dominant ConditionY with high penetrance"
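
As a minimal sketch of how a consumer might separate the core triple from its qualifiers in this flat layout: the code below assumes a plain Python dict stands in for a Statement and that qualifier properties can be recognized by the 'Qualifier' suffix. The helper names `core_triple` and `qualifiers_of` are hypothetical and not part of va-spec.

```python
# Illustrative only: a plain dict stands in for a VA Statement.
statement = {
    "id": "Statement001",
    "subject": "Variant X",
    "predicate": "is causal for",
    "object": "Condition Y",
    "modeOfInheritanceQualifier": "autosomalDominant",
    "penetranceQualifier": "high",
}


def core_triple(stmt: dict) -> tuple:
    """Return the (subject, predicate, object) triple at the core of a statement."""
    return (stmt["subject"], stmt["predicate"], stmt["object"])


def qualifiers_of(stmt: dict) -> dict:
    """Return the properties whose names end in 'Qualifier' (assumed naming convention)."""
    return {k: v for k, v in stmt.items() if k.endswith("Qualifier")}


print(core_triple(statement))
print(qualifiers_of(statement))
```

Everything else on the object (here just `id`) is neither triple nor qualifier, so it does not change what is being asserted.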

With that background, below is the official documentation about qualifiers (not all of which made it into the va-spec yaml, but will make it into the RTD docs once they are up). Sorry if it is a bit much - throwing everything at you to see what resonates.

A qualifier provides an additional piece of information in a Statement that extends or refines the meaning of the core subject - predicate - object 'triple' - by providing additional detail, or constraining the statement to apply in a particular context.
The qualifier attribute allows representation of more complex, n-ary statements that may not be accommodated by a simple subject-predicate-object (SPO) triple. For example, if a triple asserts that 'Variant X' - predicts sensitivity to - 'Treatment Y', a qualifier can be used to indicate that this applies in the context of a particular 'Disease Z'.
Qualifiers can also add information that 'quantifies' aspects of a Statement - e.g. for a Statement triple asserting that a 'Variant X'- causes - 'Phenotype Y', a qualifier can be used to add frequency/penetrance information that quantifies the percentage of carriers in which the phenotype is observed to manifest.
In practice, the core-im 'qualifier' attribute will always be specialized when defining a Statement Profile, to indicate specific types of qualifying information relevant to a given Statement type (e.g. disease_context_qualifier, or penetrance_qualifier).
Some Statements may require more than one specialized qualifier to express their full semantics - each capturing a different type of information that refines or extends the core SPO triple. For example, a TherapeuticResponseStatement with a subject 'Variation' and object 'Treatment' may define disease_context_qualifier and population_context_qualifier properties to represent the disease and population context in which the core SPO statement applies.
Finally, note that in a given Statement profile, some qualifiers may be required and others may be optional - this is up to the modelers who defined the profile model. E.g. in a TherapeuticResponseStatement, a 'diseaseContextQualifier' is a required property that must always be provided, while an 'alleleOriginQualifier' is optional - and provided only if available and deemed useful by the data creator. These are both qualifiers as they both add qualifying information to the core SPO triple reporting "Variant X predicts sensitivity to Treatment Y".
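
The required/optional distinction in that last point can be sketched as a simple check. This assumes a profile can be summarized as a mapping from qualifier property name to a required flag; the constant `THERAPEUTIC_RESPONSE_QUALIFIERS` and the helper `missing_required_qualifiers` are hypothetical and are not how the va-spec schema declares requirements.

```python
# Hypothetical profile summary: qualifier property name -> is it required?
# Not taken from the va-spec schema.
THERAPEUTIC_RESPONSE_QUALIFIERS = {
    "diseaseContextQualifier": True,   # required
    "alleleOriginQualifier": False,    # optional
}


def missing_required_qualifiers(stmt: dict, profile: dict) -> list:
    """List required qualifier properties that are absent (or None) in stmt."""
    return [
        name
        for name, required in profile.items()
        if required and stmt.get(name) is None
    ]


complete = {
    "subject": "Variant X",
    "predicate": "predicts sensitivity to",
    "object": "Treatment Y",
    "diseaseContextQualifier": "Disease Z",
}
# Same statement with the required qualifier dropped.
incomplete = {k: v for k, v in complete.items() if k != "diseaseContextQualifier"}

print(missing_required_qualifiers(complete, THERAPEUTIC_RESPONSE_QUALIFIERS))    # []
print(missing_required_qualifiers(incomplete, THERAPEUTIC_RESPONSE_QUALIFIERS))  # ['diseaseContextQualifier']
```

Note that the optional 'alleleOriginQualifier' is never reported as missing, yet it is still a qualifier: the flag controls validation only, not what kind of property it is.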

One key thing to clarify in response to the thread above is that being a qualifier is based solely on the type of information it provides - which must contribute additional detail or context to the core assertion being made in a Statement's SPO triple. Whether this information is 'required' isn't relevant to whether it represents a qualifier in the Statement.

I think this idea that qualifiers should be for optional information was introduced by Larry and Alex a while back, to support needs of the deprecated value object / descriptor paradigm. This idea persisted even as this paradigm was abandoned and other structures were proposed to structure qualifiers in a Statement. The original goal I think was to ensure that 'required' qualifying information stood out from 'optional' qualifying information, and was structured in a way that it could be formally declared as 'required' in the schema.

As noted, I disagree in principle with this basis for determining whether something is or is not a qualifier, but I also think that it is no longer needed to achieve what Larry and Alex were aiming for given the current way we are structuring qualifiers - as a flat list of properties whose names indicate they are qualifiers, and which are structured at the top level of a Statement alongside the subject, predicate, and object. In this context, it is simple enough to just declare which qualifier properties are required and which are not - to indicate to data creators and consumers which are essential to provide to make a complete statement. Qualifiers that are not needed but could provide additional detail when available can remain not required. We can see a concrete example of this in the PR I made for handling qualifiers in the TherapeuticResponseStatement profile here.

Finally, with respect to Mrinal's original question in this thread about qualifiers providing 'filters' - I think that whether a piece of info in a Statement is useful as a 'filter' in search is up to a given user/use case, and should have no bearing on whether that information is considered a 'qualifier' or captured in a qualifier property. That said, @Mrinal-Thomas-Epic I am keen to understand more about this practical use case, and if/how our modeling might support the underlying need that is being expressed here (which I don’t think I fully appreciate).

Mrinal-Thomas-Epic commented 1 month ago

Thanks for the clarification! I agree with defining qualifiers as additional information that refines the meaning of the SPO triple, rather than just optional fields.

I think my description of using them as a 'filter' is a subset of what you described (as you said "or constraining the statement to apply in a particular context"). The use case would be filtering out statements we can programmatically determine to be irrelevant when displaying a set of statements for a variant (e.g., in interpretation software or when displaying to clinicians).
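
Roughly, the filtering could look like the sketch below. It assumes statements are plain dicts with flat qualifier properties as described above; the helper `applies_in_context` and the sample data are made up for illustration.

```python
def applies_in_context(stmt: dict, qualifier: str, value: str) -> bool:
    """Keep a statement unless it states a different value for the given qualifier.

    Statements that do not carry the qualifier at all are kept, since they
    make no claim about that context (a policy choice, not a spec rule).
    """
    stated = stmt.get(qualifier)
    return stated is None or stated == value


statements = [
    {"id": "S1", "subject": "Variant X", "diseaseContextQualifier": "Disease Z"},
    {"id": "S2", "subject": "Variant X", "diseaseContextQualifier": "Disease W"},
    {"id": "S3", "subject": "Variant X"},  # no disease context stated
]

relevant = [
    s["id"]
    for s in statements
    if applies_in_context(s, "diseaseContextQualifier", "Disease Z")
]
print(relevant)  # ['S1', 'S3']
```

Whether an unqualified statement like S3 should be shown or hidden is a decision for the consuming application.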

ga4gh / va-spec

What makes something a qualifier? #126
