/biosamples response is missing a datasetID field

ga4gh-beacon / specification-v2

GA4GH Beacon v2 specification.

Apache License 2.0


/biosamples response is missing a datasetID field #67

Closed: Tom-Shorter closed this issue 3 years ago.

Tom-Shorter commented 3 years ago

There is currently no way to identify which dataset an individual biosample came from. /individuals and /g_variants both have a datasetID field within the response object for individual results, so its absence from /biosamples seems to be an oversight.

mbaudis commented 3 years ago

That is probably just an oversight, but it is ultimately related to https://github.com/ga4gh-beacon/specification-v2/issues/63. So would one wrap datasetAlleleResponses for different datasets, but for biosamples or individuals indicate their datasetId per item? Then we would need this for VariantInSample and any other object that can end up in response.results, too.

Or responses are always for a single dataset, with the possible exception of a generic response as in the old days.

Or response.results always contains lists of datasetResponse objects.

mbaudis commented 3 years ago

... also, having a datasetId in a given biosample, individual, or variantInSample response is ugly; it is NOT part of the native object schema and has to be overloaded for the response. So if a /biosamples response contains data from multiple datasets, those should be wrapped in dataset-specific objects.
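To make the contrast concrete, here is a minimal sketch, assuming a hypothetical flat result list in which each biosample has been overloaded with a `datasetId` key. The helper `wrap_by_dataset` and the wrapper field names (`datasetId`, `results`) are illustrative only, not taken from the spec; the point is that the dataset link moves onto the wrapper, so the objects inside it keep only their native fields.

```python
from collections import defaultdict
from typing import Any

# Hypothetical item shape: a native biosample plus an overloaded "datasetId".
Item = dict[str, Any]


def wrap_by_dataset(items: list[Item]) -> list[dict[str, Any]]:
    """Group overloaded items into dataset-specific wrapper objects.

    The "datasetId" key is stripped from each item and stored once on its
    wrapper, so every object in "results" carries only native fields.
    """
    groups: dict[str, list[Item]] = defaultdict(list)
    for item in items:
        native = {k: v for k, v in item.items() if k != "datasetId"}
        groups[item["datasetId"]].append(native)
    # Wrapper field names are illustrative, not from the Beacon schema.
    return [{"datasetId": ds, "results": res} for ds, res in groups.items()]


flat = [
    {"id": "bs1", "datasetId": "dsA"},
    {"id": "bs2", "datasetId": "dsB"},
    {"id": "bs3", "datasetId": "dsA"},
]
print(wrap_by_dataset(flat))
# [{'datasetId': 'dsA', 'results': [{'id': 'bs1'}, {'id': 'bs3'}]},
#  {'datasetId': 'dsB', 'results': [{'id': 'bs2'}]}]
```

Grouping keeps the first-seen order of datasets, so a client reading the wrapped response sees datasets in the order their first item appeared.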

jrambla commented 3 years ago

The rationale behind the model is: "we have two types of Beacons: evidence/knowledgeBase and Case/individuals". The first is represented by Dataset>Variant, the second by Dataset>Individual (later extended with Cohort>Individual). That is why no other entity has a DatasetId. Michael's case is one where the Beacon is built from samples from, say, cell cultures, and it fits neither model. Our initial exploration suggests this is not a "majority" case, and I have no sense of how many other such cases we'll find. Hence, we suggested that Michael extract the individual's attributes from the samples and create these "virtual" individuals. I believe this could make sense anyway, as it lets you relate samples from the same individual, and so on. datasetAlleleResponse should be deprecated if it is not already.
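The "virtual individuals" suggestion can be sketched roughly as follows. This assumes each sample carries a hypothetical `donorLabel` naming its source (for example, a cell line) and an optional `sex` attribute; the ID scheme `vind-<label>` and both helper functions are made up for illustration. Because individuals carry a dataset link under the Dataset>Individual model, a biosample's dataset can then be recovered through its `individualId`.

```python
from typing import Any

Record = dict[str, Any]


def build_virtual_individuals(
    samples: list[Record], dataset_id: str
) -> tuple[list[Record], list[Record]]:
    """Derive "virtual" individuals from sample-level attributes.

    Samples sharing a (hypothetical) "donorLabel" are attached to one
    virtual individual. Returns (individuals, biosamples).
    """
    individuals: dict[str, Record] = {}
    biosamples: list[Record] = []
    for s in samples:
        label = s["donorLabel"]
        if label not in individuals:
            individuals[label] = {
                "id": f"vind-{label}",    # hypothetical ID scheme
                "datasetId": dataset_id,  # Dataset > Individual link
                "sex": s.get("sex"),      # attribute lifted from the sample
            }
        biosamples.append({"id": s["id"], "individualId": individuals[label]["id"]})
    return list(individuals.values()), biosamples


def dataset_of_biosample(biosample: Record, individuals: list[Record]) -> str:
    """Resolve a biosample's dataset via its (virtual) individual."""
    by_id = {ind["id"]: ind for ind in individuals}
    return by_id[biosample["individualId"]]["datasetId"]


inds, bss = build_virtual_individuals(
    [{"id": "bs1", "donorLabel": "HeLa"}, {"id": "bs2", "donorLabel": "HeLa"}],
    dataset_id="dsA",
)
print(len(inds), dataset_of_biosample(bss[1], inds))  # 1 dsA
```

A side effect, as noted above, is that samples from the same source become related through a shared individual.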

mbaudis commented 3 years ago

@jrambla That's not what is meant here, is it? This issue is about identifying the biosample <-> dataset relation.

In any case, it gets a bit confusing, but IMO this should be resolved through #68 (which I see as the most logical solution, until @jrambla tells me why it wouldn't work ...).

sdelatorrep commented 3 years ago

Hi @Tom-Shorter. I think this is solved with the resultSet wrapper, right? If not, please reopen this issue. Thanks!
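For a client, the resultSet wrapper answers the original question by position rather than by a per-item field. A minimal sketch, assuming a response in which `response.resultSets` is a list of per-dataset sets, each with an `id` and a `results` list of biosamples; these field names are assumptions to check against the published Beacon v2 framework schemas. The hypothetical helper `biosample_dataset_index` maps each biosample id to every dataset that returned it, since one biosample could in principle appear in more than one set.

```python
from typing import Any


def biosample_dataset_index(response: dict[str, Any]) -> dict[str, list[str]]:
    """Map each biosample id to the ids of the result sets that contain it.

    Assumes response["response"]["resultSets"] is a list of sets, each
    with an "id" and an optional "results" list of biosample objects.
    """
    index: dict[str, list[str]] = {}
    for result_set in response["response"]["resultSets"]:
        for biosample in result_set.get("results", []):
            index.setdefault(biosample["id"], []).append(result_set["id"])
    return index


example = {
    "response": {
        "resultSets": [
            {"id": "dsA", "setType": "dataset", "results": [{"id": "bs1"}, {"id": "bs3"}]},
            {"id": "dsB", "setType": "dataset", "results": [{"id": "bs2"}]},
        ]
    }
}
print(biosample_dataset_index(example))
# {'bs1': ['dsA'], 'bs3': ['dsA'], 'bs2': ['dsB']}
```

The biosample objects themselves stay schema-native; the dataset identity lives only on the enclosing set.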