Open larrybabb opened 2 weeks ago
Hmm, I actually favor the solution you implemented: make `StudyResult.sourceDatasets` an array of DataSets, so that if everything in the StudyResult was derived from a single DataSet, the array has only one member.

I don't think your solution 2 above is right, because the use case for allowing multiple values here isn't to track a linear trail of 1:1 derivations, as your comments imply. The idea is that InformationEntities can be derived from multiple direct 'source' InformationEntities. For example, a CAF StudyResult may include data about its focusAllele that was pulled from two distinct DataSets produced by a given study.

I don't think your solution 1 is right because the `sourceDataSet` property is about the derivation of the information content found in a StudyResult, not about specific concrete serializations of the StudyResult (which is what the RecordMetadata object is for).
@mbrush the `InformationEntity.derivedFrom` is an unordered array of `InformationEntity`s. However, the `StudyResult.sourceDataset` appears to be designed to override the array nature of its parent `derivedFrom` property and make it a `DataSet` (not an array of DataSets).

I have referenced this here in the CohortAlleleFrequencyStudyResult schema (which is a direct copy of the `sourceDataSet` from the `StudyResult`).

While I get the idea of using `derivedFrom` as a representation of the dataset from which the StudyResult was obtained, I think we need to weigh whether `RecordMetadata` type

I'm in favor of #2.
For now, I am going to make StudyResult.sourceDataset an array of DataSet types and assume folks will only put 1 entry in the array. But this is not a reasonable final solution IMO.
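
To make the interim shape concrete, here is a rough Python sketch of what "an array of DataSet types" could look like. It is only an illustration under assumptions: the class layout, the `id` field, and the example ids (`ds-a`, `ds-b`, `sr-1`, `caf-1`) are hypothetical and not taken from the actual schema. Only the property names `derivedFrom` and `sourceDataset` come from the discussion above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InformationEntity:
    id: str  # hypothetical identifier field, for illustration only
    # Unordered collection of the entities this one was directly derived from.
    derivedFrom: List["InformationEntity"] = field(default_factory=list)


@dataclass
class DataSet(InformationEntity):
    pass


@dataclass
class StudyResult(InformationEntity):
    # Restricts the members to DataSet but keeps the array nature of derivedFrom.
    sourceDataset: List[DataSet] = field(default_factory=list)


# Common case: everything in the result came from one DataSet.
single = StudyResult(id="sr-1", sourceDataset=[DataSet(id="ds-a")])

# Multi-source case: data pulled from two distinct DataSets of the same study.
multi = StudyResult(
    id="caf-1",
    sourceDataset=[DataSet(id="ds-a"), DataSet(id="ds-b")],
)

print(len(single.sourceDataset), len(multi.sourceDataset))
```

With this shape the single-source case is just a one-element list, so consumers never have to branch on "object vs. array".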