HumanCellAtlas / dcp-community

HCA Data Coordination Platform community content

RFC: HCA DCP Bundle Types and Definitions #93

Closed: malloryfreeberg closed this 3 years ago

malloryfreeberg commented 5 years ago

Please note: the contents of #86 were merged into this PR; please refer to the discussion in that PR for additional detail and comments addressed from reviewers.

NoopDog commented 5 years ago

It may be more useful to tag bundles by the process that created them rather than the type of data they contain. The type of data can be inferred from the process and the process is more descriptive.

I have a few more notes about this here:

https://github.com/HumanCellAtlas/dcp-community/pull/86#issuecomment-530276016

kislyuk commented 5 years ago

@NoopDog what would this tagging look like in practice, and what would a subscription query look like to specifically match such bundles?

diekhans commented 5 years ago

It is unclear why there are two RFCs related to bundles (#86 and #93).

They are not clearly delineated as covering different aspects of bundle types. I believe we would be better off merging the two.

NoopDog commented 5 years ago

> @NoopDog what would this tagging look like in practice, and what would a subscription query look like to specifically match such bundles?

In practice, the least disruptive way would be to add bundle-level metadata fields for:

  - process_fquid: the process instance that created the bundle
  - protocol_id: the protocol that the process implements, a value from the protocol core schema

Then the user can subscribe to bundles specifying protocol_id along with other refining metadata.
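As a rough illustration only, matching on these fields could be sketched as below. The field names process_fquid and protocol_id come from the proposal above; the bundle records, the ID values, and the helper functions are hypothetical, and this is plain in-memory filtering rather than an actual DSS subscription query.

```python
# Hypothetical bundle-level metadata records. Only the field names
# process_fquid and protocol_id come from the proposal; all values are invented.
bundles = [
    {"uuid": "bundle-a", "process_fquid": "process-1.v1", "protocol_id": "protocol-x"},
    {"uuid": "bundle-b", "process_fquid": "process-2.v1", "protocol_id": "protocol-y"},
    {"uuid": "bundle-c", "process_fquid": "process-3.v1", "protocol_id": "protocol-x"},
]


def matches(bundle, **criteria):
    """Return True when every requested field equals the bundle's value."""
    return all(bundle.get(field) == value for field, value in criteria.items())


def select_bundles(candidates, **criteria):
    """Return the uuids of the bundles that satisfy all criteria."""
    return [b["uuid"] for b in candidates if matches(b, **criteria)]


# Subscribe-style selection by protocol, optionally refined by other fields.
by_protocol = select_bundles(bundles, protocol_id="protocol-x")
refined = select_bundles(bundles, protocol_id="protocol-x", process_fquid="process-3.v1")
```

The point of the sketch is that the subscriber only needs to know a protocol_id; additional keyword arguments stand in for the "other refining metadata".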

kislyuk commented 5 years ago

@NoopDog I appreciate the intent, but I think opaque process IDs would still make the system too complex and indirect to be usable.

kislyuk commented 5 years ago

Please note: I have merged the contents of #86 into this PR; please refer to the discussion in that PR for additional detail and comments addressed from reviewers.

diekhans commented 5 years ago

I created issue #114 to get media types turned into a real RFC. That is good enough to not impact this PR.

kislyuk commented 5 years ago

@diekhans I already imported it here: https://github.com/HumanCellAtlas/dcp-community/pull/113

NoopDog commented 4 years ago

As a refinement and extension of this RFC, I have created an alternate RFC, linked here: RFC: HCA DCP Application Layer Bundle Types and Definitions.

The alternate proposal differs from this RFC mainly in that:

  1. Type information is added to a new type.json metadata file rather than added to the DSS bundle.json file.

  2. Type information is expressed in JSON rather than in RFC 7231 media type syntax.

  3. The schema and bundle types are represented as JSON schema in the metadata schema repo and documented on the data portal rather than maintained in a DSS registry.

  4. The proposed types are refined by making the process and protocol that created the bundle explicit in the type.json.
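To make the contrast in points 1, 2 and 4 concrete, here is a minimal sketch, not the actual format of either RFC. The media type string (including the dcp-type parameter and its value) is an illustrative assumption, and every key in the type.json dictionary is hypothetical, since the layout of that file is not spelled out in this thread.

```python
import json
from email.message import Message


def parse_media_type(value):
    """Split a media type string into (type/subtype, {parameter: value})."""
    msg = Message()
    msg["Content-Type"] = value
    # get_params() returns the type/subtype first, followed by the parameters.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params


# Media type style: type information packed into one string with parameters.
# The parameter name and value here are assumptions for illustration.
media_type = 'application/json; dcp-type="metadata/biomaterial"'
content_type, params = parse_media_type(media_type)

# JSON style: the same kind of information as explicit fields, with the
# process and protocol made explicit. All keys and values are hypothetical.
type_json = {
    "bundle_type": "example-type",
    "process": "process-1",
    "protocol": "protocol-x",
}
serialized = json.dumps(type_json, indent=2)
```

In the JSON form each piece of type information is an addressable field that a JSON schema could validate, whereas the media type form needs a parsing step before the parameters can be inspected.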

Your review/feedback is appreciated.

NoopDog commented 4 years ago

Pausing to merge with #119 and align with upcoming reproducibility and data citation requirements.

NoopDog commented 3 years ago

Obsolete