HumanCellAtlas / dcp

Data Coordination Platform manifest and integration tests.

[spike] Enable data files from the same sequencing library to be collected as a logical unit #415

Open kbergin opened 5 years ago

kbergin commented 5 years ago

This ticket represents the work to spike on implementation for this theme. An RFC will be the end result.

For the theme:

User Story: A consumer can download a uniformly processed matrix file which reflects all the data processing results from one sequencing library, such as all sequencing lane replicates for a high coverage 10x experiment being processed together.

Demoable Criteria: Process a dataset with a multi lane/machine sequencing strategy for a single library and confirm our results are comparable to the outputs produced by the submitting lab.

Success Metric: We can detect and correctly process data sets where the libraries are found in multiple sequencing lanes.
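
As a rough illustration of the detection half of the success metric, here is a minimal sketch that groups file records by library and flags libraries spread over more than one lane. The `SequenceFile` record and its `library_id` and `lane` fields are hypothetical names chosen for this example; they are not taken from the HCA metadata schema or from either RFC.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class SequenceFile(NamedTuple):
    """Hypothetical record for one sequencing data file."""
    file_name: str
    library_id: str  # assumed identifier shared by all files from one library
    lane: int        # assumed lane index the file was sequenced on


def group_by_library(files: Iterable[SequenceFile]) -> Dict[str, List[SequenceFile]]:
    """Collect files from the same library into one logical unit."""
    groups: Dict[str, List[SequenceFile]] = defaultdict(list)
    for f in files:
        groups[f.library_id].append(f)
    return dict(groups)


def multi_lane_libraries(files: Iterable[SequenceFile]) -> Dict[str, List[int]]:
    """Return the libraries found in more than one lane, with their sorted lanes."""
    result: Dict[str, List[int]] = {}
    for library_id, members in group_by_library(files).items():
        lanes = sorted({f.lane for f in members})
        if len(lanes) > 1:
            result[library_id] = lanes
    return result


# Made-up example data: lib-A spans two lanes, lib-B only one.
files = [
    SequenceFile("a_L001_R1.fastq.gz", "lib-A", 1),
    SequenceFile("a_L002_R1.fastq.gz", "lib-A", 2),
    SequenceFile("b_L001_R1.fastq.gz", "lib-B", 1),
]
```

With this example data, `multi_lane_libraries(files)` reports only `lib-A`, whose files would then need to be processed together rather than lane by lane.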

justincc commented 5 years ago

Depending on the output of the spike, this may have a dependency on #414.

justincc commented 5 years ago

@malloryfreeberg, @barkasn already mentioned this to me at the F2F, but when you're drafting the RFC, if it looks like there may be significant ingest work, could you dial in @aaclan-ebi and me as soon as makes sense? I'd like to have the heads up sooner rather than later.

malloryfreeberg commented 5 years ago

@justincc sure thing. I'm working on this in the current sprint, so I'll reach out as soon as it's necessary.

brianraymor commented 5 years ago

Is there a design document or RFC that can be linked into this Spike to help track progress and state?

kbergin commented 5 years ago

Nick's RFC here

justincc commented 5 years ago

@malloryfreeberg's library entity RFC here which overlaps. I believe these RFCs will get reconciled once @barkasn is back.

brianraymor commented 5 years ago

Mallory's RFC is in community review:

Nick's RFC is in community review

brianraymor commented 5 years ago

Per the July 18 Refinement meeting, the Milestone needs to be updated to reflect when the RFCs will be reconciled and approved.

brianraymor commented 5 years ago

Updating to Milestone 2. @morrisonnorman and @kbergin - please correct if you disagree.

brianraymor commented 5 years ago

Discussed during the August 15 Refinement meeting - there are multiple problems with this issue:

  1. There are four owners but no single owner who is driving this to completion. @diekhans has volunteered to update and drive this spike until @morrisonnorman returns on August 25. In general, the preference is for Product Owners to own and drive issues.

  2. There have been no regular updates about the status of either RFC. It is believed that Nick's RFC has been withdrawn and will be resubmitted for community review. @jkaneria and @barkasn - please comment. I've only seen the original announcement for community review. The RFC indicates a TBD Last Call for Community Review. Mallory's RFC is completing Oversight review today. I would strongly recommend that this type of information be maintained in the top-level summary comment of this issue, so reviewers do not need to scroll for status.

  3. As a result, this now slips from Milestone 2 to Milestone 3.

diekhans commented 5 years ago

RFC: Processing Datasets that Span Multiple Data Collection Runs #88 (https://github.com/HumanCellAtlas/dcp-community/pull/88) has been significantly updated, with a Last Call for Community Review of Aug 27th.

diekhans commented 5 years ago

Related RFC is tech arch approved: https://github.com/HumanCellAtlas/dcp-community/pull/87

morrisonnorman commented 5 years ago

Spike currently blocked by:

* [Processing Datasets that Span Multiple Data Collection Runs RFC](https://github.com/HumanCellAtlas/dcp-community/pull/88): RFC is in oversight review until September 27