Open david4096 opened 8 years ago
@david4096 Logically, a CallSet should refer to only one VariantSet, since a CallSet can be thought of as an ordered list whose length equals the number of variants described. For a CallSet to refer to several VariantSets, those VariantSets would have to be identical.
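A minimal sketch of that "ordered list" view, assuming one call per variant in the same order. The variant strings, the genotype values, and the `pair_calls` helper are made up for illustration and are not part of the GA4GH schema:

```python
from typing import List, Tuple

# Hypothetical variants of a single VariantSet, in a fixed order.
variants = ["chr1:100 A>T", "chr1:250 G>C", "chr2:42 T>G"]

# Hypothetical CallSet: one genotype per variant, in the same order.
genotypes = ["0/1", "1/1", "0/0"]


def pair_calls(variants: List[str], genotypes: List[str]) -> List[Tuple[str, str]]:
    """Pair each call with its variant by position."""
    if len(variants) != len(genotypes):
        # The positional view only works when the lengths agree.
        raise ValueError("CallSet length must equal the number of variants")
    return list(zip(variants, genotypes))


paired = pair_calls(variants, genotypes)
```

Under this view, the same list of calls could only be paired with a second VariantSet if that set had the same variants in the same order.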
Our group has this use case. Without starting a conversation on what the correct definition of a VariantSet is/should be, the CallSet variantSetId list makes sense if you follow exactly what the definitions suggest:
VariantSet definition:
A VariantSet is a collection of variants and variant calls intended to be analyzed together.
CallSet definition:
A CallSet is a collection of calls that were generated by the same analysis of the same sample.
Use case: compare CallSetX to the other CallSets belonging to VariantSetA, and compare CallSetX to the other CallSets belonging to VariantSetB, but not to all CallSets from both VariantSetA and VariantSetB, since by definition VariantSetA and VariantSetB are not meant to be analyzed together. Letting the CallSet belong to both VariantSetA and VariantSetB avoids duplicating it.
By this reasoning, I would say that restricting a CallSet to a single VariantSet in the reference server is a bug.
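The use case above could be sketched roughly as follows. This is only an illustration: the `CallSet` dataclass, its `variant_set_ids` field (mirroring the schema's list of variant set IDs), and the `peers_in_variant_set` helper are hypothetical and not taken from the reference server:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CallSet:
    id: str
    # A CallSet may list several VariantSets it belongs to.
    variant_set_ids: List[str] = field(default_factory=list)


def peers_in_variant_set(target: CallSet, call_sets: List[CallSet],
                         variant_set_id: str) -> List[str]:
    """Return ids of the other CallSets that share variant_set_id with target."""
    if variant_set_id not in target.variant_set_ids:
        raise ValueError(f"{target.id} is not in {variant_set_id}")
    return [
        cs.id
        for cs in call_sets
        if cs.id != target.id and variant_set_id in cs.variant_set_ids
    ]


call_sets = [
    CallSet("CallSetX", ["VariantSetA", "VariantSetB"]),
    CallSet("CallSet1", ["VariantSetA"]),
    CallSet("CallSet2", ["VariantSetB"]),
]
x = call_sets[0]

# Compare within each VariantSet separately, never across both at once.
peers_a = peers_in_variant_set(x, call_sets, "VariantSetA")
peers_b = peers_in_variant_set(x, call_sets, "VariantSetB")
```

CallSetX is stored once, yet it can be compared against the members of either VariantSet without mixing the two groups.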
Also,
record SearchCallSetsRequest {
  /**
  The VariantSet to search.
  */
  string variantSetId;
  // ... remaining fields omitted from this excerpt
}
If the above is the agreed-upon definition, then variantSetId in SearchCallSetsRequest should be a list (variantSetIds), not a single string.
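As a rough illustration of what a list-valued search might look like, here is a sketch that assumes a CallSet matches when it belongs to any of the requested VariantSets. The `search_call_sets` function, the record layout, and the "any of" semantics are assumptions for this example; the schema does not define them:

```python
from typing import Dict, List


def search_call_sets(call_sets: List[Dict], variant_set_ids: List[str]) -> List[Dict]:
    """Return CallSets that belong to at least one requested VariantSet."""
    wanted = set(variant_set_ids)
    # An empty intersection is falsy, so non-matching CallSets are skipped.
    return [cs for cs in call_sets if wanted.intersection(cs["variantSetIds"])]


# Made-up records for illustration only.
records = [
    {"id": "cs-x", "variantSetIds": ["vs-a", "vs-b"]},
    {"id": "cs-1", "variantSetIds": ["vs-a"]},
    {"id": "cs-2", "variantSetIds": ["vs-c"]},
]

matched_ids = [cs["id"] for cs in search_call_sets(records, ["vs-a", "vs-b"])]
```

Whether a list should mean "any of" or "all of" the given VariantSets is exactly the kind of detail the definition would need to settle.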
@jacmarjorie
I believe your use case is what was imagined. However, there was a multi-month discussion, never resolved, on whether this should be supported:
https://github.com/ga4gh/schemas/pull/395
We would love contributions to the documentation on variants, including documenting use cases justifying the design:
https://github.com/ga4gh/schemas/issues/408 https://github.com/ga4gh/schemas/issues/379
The variants API is suffering because no one with a deep understanding of variants and VCF analysis has taken ownership of finishing the work.
Mark
https://github.com/ga4gh/ga4gh-schemas/blob/master/src/main/proto/ga4gh/variants.proto#L75
Callsets are still allowed to be in multiple variant sets. We should remove this. The biosample ID tag on callsets is what allows you to compare calls in multiple variant sets.
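A sketch of the alternative suggested here, where each CallSet belongs to exactly one VariantSet and the biosample ID links calls across sets. The `CallSet` tuple, its field names, and the sample data are hypothetical and only loosely mirror the schema:

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class CallSet(NamedTuple):
    id: str
    variant_set_id: str  # exactly one VariantSet per CallSet
    biosample_id: str    # links CallSets derived from the same sample


def group_by_biosample(call_sets: List[CallSet]) -> Dict[str, List[str]]:
    """Map each biosample ID to the ids of the CallSets derived from it."""
    groups = defaultdict(list)
    for cs in call_sets:
        groups[cs.biosample_id].append(cs.id)
    return dict(groups)


call_sets = [
    CallSet("cs-a1", "VariantSetA", "sample-1"),
    CallSet("cs-b1", "VariantSetB", "sample-1"),
    CallSet("cs-b2", "VariantSetB", "sample-2"),
]

by_sample = group_by_biosample(call_sets)
```

In this design the same sample appears as separate CallSets in VariantSetA and VariantSetB, so the calls are stored per set and matched up through the shared biosample ID.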
Callsets can be a member of multiple variant sets according to the schema, yet the reference server is currently underspecified for this case. Is there an example of when a callset is in multiple variant sets?
https://github.com/ga4gh/schemas/blob/master/src/main/resources/avro/variants.avdl#L90
SearchCallSetsRequest requires a single variant set ID to be specified, making the above semantics even more strange. If a callset can be a member of multiple variant sets, why do we specify a single variant set ID when performing search?
https://github.com/ga4gh/schemas/blob/master/src/main/resources/avro/variantmethods.avdl#L158