HumanCellAtlas / dcp

Data Coordination Platform manifest and integration tests.

Users can distinguish between tiers of pipeline outputs #270

Open: theathorn opened this issue 5 years ago

theathorn commented 5 years ago

From https://docs.google.com/document/d/14gVAMscd0d8KgWqOO11Y1kSp4ztVavZXA7zrtrhyiW0:

Ensure community pipelines are labeled as such in metadata and this metadatum is surfaced in browser.

From the duplicated issue "Method to delineate community-pipeline analyzed data vs HCA-approved pipeline analyzed data":

There is a desire to have as many at-least-community-approved "v1" pipelines as possible running in the DCP, to enable as much science as possible. We will also continue working toward pipelines approved by the HCA AWG pipeline subgroup, which meet what we determine to be the highest quality bar. We will need a way to distinguish these two types of pipelines and their output data, both in the HCA metadata and in the access portal, the documentation, and possibly the CLI, so that the distinction is very clear to users.

theathorn commented 5 years ago

@kbergin Can you assign a shepherd from Green team?

brianraymor commented 5 years ago

I think that the title for this issue is misleading. It suggests that the issue is about supporting community pipelines but the actual description is related to flagging such pipelines in metadata.

brianraymor commented 5 years ago

@kbergin - is this a duplicate of "Method to delineate community-pipeline analyzed data vs HCA-approved pipeline analyzed data"?

kbergin commented 5 years ago

Yes it is, @brianraymor. We can keep this one if that's preferred, and I'll close the other one. I'm not sure this is the exact implementation or naming we'll end up with, but I am currently thinking about this proposal.

kbergin commented 5 years ago

@lauraclarke This is a high-level epic about distinguishing between pipeline types. Currently secondary-analysis/406 represents adding it to the metadata, but it can be moved to the metadata-schema backlog as well, whichever you prefer. This need will arise in early Q3.

kbergin commented 5 years ago

@jkaneria I've tagged you here because the green box related change is an analysis metadata schema update to support distinguishing pipeline output types. The browser and portal are also linked here, as they will need to expose that metadata.

brianraymor commented 5 years ago

@jkaneria - Based on the comment in the child issue:

We have not yet prioritized it because we haven't yet had a need to install any contributed or non-standard HCA pipelines into the DCP.

This does not sound like a priority for Q3 (and is not one of the Q3 Roadmap objectives). If this is the case, would you remove the Release/Milestone and move to the Icebox pipeline for future consideration? We really want the Product Backlog limited to work scheduled for the current Release.

brianraymor commented 5 years ago

@jkaneria - I'm clearing the Release and Milestone and moving this to the Icebox since there's been no update since July 2019. This is also not prioritized for Q4.

kbergin commented 5 years ago

This is actively being designed.

brianraymor commented 5 years ago

@kbergin - I don't see any status updates in the issue since July. Are you referring to the RFC in community review or to specific GitHub issues? Have the child issues also been negotiated for Q4M3, given that this is not a Q4 priority as defined by the Project Leads?