ga4gh-beacon / beacon-v2

Unified repository for the GA4GH Beacon v2 API standard
Creative Commons Zero v1.0 Universal

Explanation on ERD #80

Closed: anuradhawick closed this issue 1 year ago

anuradhawick commented 1 year ago

Greetings,

As I understand it, we should be able to define cohorts under different criteria, such as study-defined, Beacon-defined, or user-defined (I assume the latter is done using ontology terms).

In these scenarios, an individual will inevitably be referenced by several cohorts. However, the ERD shows the cohort-individual relationship with cardinality 1..n (one-to-many). Could you kindly elaborate on this design choice? I have attached the ERD for reference.
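To make the concern concrete, here is a minimal Python sketch. It assumes a one-to-many model in which each individual record carries a single cohort reference; the `cohort_id` field, the `add_to_cohort` helper, and all IDs are hypothetical and are not Beacon schema fields. Adding the same individual to a second cohort can only overwrite the first membership:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Individual:
    id: str
    # Hypothetical one-to-many model: an individual points to at most one cohort.
    cohort_id: Optional[str] = None


def add_to_cohort(ind: Individual, cohort_id: str) -> None:
    # With a single reference field, a new membership replaces the old one.
    ind.cohort_id = cohort_id


ind = Individual("ind-001")
add_to_cohort(ind, "cohort-study-A")
add_to_cohort(ind, "cohort-user-B")
print(ind.cohort_id)  # cohort-user-B: the study-A membership is lost
```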

Thanks

[Attached image: ERD]
mbaudis commented 1 year ago

@anuradhawick I'd say this is just ill-defined. Datasets and cohorts are two types of "collections": datasets are "physical" groups (e.g. with common access, DAC, storage... based on the variants), while cohorts can be flexible collections related to groupings of, e.g., phenotypes, diseases, or studies.

We do not model all entity relationships for such parameters. For example, a cohort may be assembled from members of multiple datasets, but such definitions are left to cohort definitions outside the Beacon model.
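As an illustration of a cohort cutting across datasets, here is a minimal Python sketch. It assumes each dataset can be reduced to a list of simplified individual records with a `diseases` set; the record layout, the `build_cohort` helper, the term `DISEASE:X`, and all IDs are hypothetical (the actual Beacon `individuals` schema is richer and uses ontology-term objects):

```python
# Hypothetical, simplified records: two datasets, each a "physical" group.
datasets = {
    "dataset-1": [
        {"id": "ind-001", "diseases": {"DISEASE:X"}},
        {"id": "ind-002", "diseases": set()},
    ],
    "dataset-2": [
        {"id": "ind-101", "diseases": {"DISEASE:X"}},
    ],
}


def build_cohort(term: str) -> list[str]:
    """Collect the IDs of individuals carrying a disease term, across all datasets."""
    return sorted(
        ind["id"]
        for records in datasets.values()
        for ind in records
        if term in ind["diseases"]
    )


print(build_cohort("DISEASE:X"))  # ['ind-001', 'ind-101']
```

The resulting cohort draws members from both datasets, which is why the cohort definition itself lives outside the dataset grouping.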

I'll change the cohorts <--> individuals relationship to many-to-many:

```mermaid
classDiagram
    analyses <-- genomicVariations : 1..n
    runs <-- analyses : 1..n
    biosamples <-- runs : 1..n
    individuals <-- biosamples : 1..n
    runs <.. genomicVariations : 1..n
    biosamples <.. genomicVariations : 1..n
    individuals <.. genomicVariations : 1..n
    biosamples <.. analyses : 1..n
    individuals <.. analyses : 1..n
    individuals <.. runs : 1..n
    cohorts o-- individuals : m..n
    datasets o-- genomicVariations : 1..n
    class genomicVariations{
        analysisId
        runId
        biosampleId
        individualId
        variation
        clinicalInterpretations
        caseLevelData
        ...
    }
    class analyses{
        id
        runId
        biosampleId
        individualId
        analysisDate
        pipelineName
        aligner
        ...
    }
    class biosamples{
        id
        individualId
        biosampleStatus
        sampleOriginType
        histologicalDiagnosis
        collectionDate
        ...
    }
    class individuals{
        id
        sex
        diseases
        phenotypicFeatures
        ethnicity
        pedigrees
        ...
    }
    class runs{
        id
        biosampleId
        individualId
        runDate
        librarySource
        libraryStrategy
        platform
        ...
    }
    class datasets{
        id
        name
        description
        dataUseCondition
        info
        updateDateTime
        ...
    }
    class cohorts{
        id
        name
        cohortType
        cohortSize
        cohortDataTypes
        cohortDesign
        ...
    }
```
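The `cohorts o-- individuals : m..n` edge can be pictured as a set of (cohort, individual) membership pairs, the in-memory analogue of a junction table. This is a minimal Python sketch; the `memberships` set, the helper names, and all IDs are hypothetical and not part of the Beacon specification:

```python
# Hypothetical membership pairs (cohort_id, individual_id); in a relational
# store this would be a junction table between cohorts and individuals.
memberships = {
    ("cohort-study-A", "ind-001"),
    ("cohort-study-A", "ind-002"),
    ("cohort-user-B", "ind-001"),
}


def individuals_in(cohort_id: str) -> set[str]:
    """Return the IDs of all individuals that belong to a cohort."""
    return {ind for coh, ind in memberships if coh == cohort_id}


def cohorts_of(individual_id: str) -> set[str]:
    """Return the IDs of all cohorts an individual belongs to."""
    return {coh for coh, ind in memberships if ind == individual_id}


print(sorted(cohorts_of("ind-001")))             # ['cohort-study-A', 'cohort-user-B']
print(sorted(individuals_in("cohort-study-A")))  # ['ind-001', 'ind-002']
```

Because membership is stored as pairs rather than as a single field on either side, one individual can sit in any number of cohorts and each cohort can hold any number of individuals.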
anuradhawick commented 1 year ago

Thanks for the prompt response and explanation. I will close this issue now.