HumanCellAtlas / secondary-analysis

Secondary Analysis Service of the Human Cell Atlas Data Coordination Platform
https://pipelines.data.humancellatlas.org/ui/
BSD 3-Clause "New" or "Revised" License

Integrate EmptyDrops output into the Optimus matrix result #779

Closed kbergin closed 4 years ago

kbergin commented 5 years ago

Why: Our users currently have two ways of interacting with 10x data in the HCA DCP. One way is to download an entire analysis bundle and retrieve all of the outputs from a particular analysis, which corresponds to a cell suspension. For this user, the previous ticket, which incorporated EmptyDrops and includes the entire CSV output, is useful. The other way a user can download our data is through the matrix service. The matrix service relies on the zarr outputs from our pipeline to filter and concatenate outputs from different analyses and deliver a single matrix to the user based on their custom request. Only data that appears in the zarr outputs can currently be included in this final delivered matrix. Therefore, any data we want the user to receive from EmptyDrops needs to be included in our zarr outputs.

Integrate the metadata produced by the EmptyDrops step into the zarr output of the pipeline.

Suggestion: Integrate the metadata produced by the EmptyDrops step into the Python cell metadata output and convert that into zarr along with the other metadata (this solution ensures flexibility in supporting future file formats and interfaces).
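The suggestion above could look roughly like the following sketch, which joins per-barcode EmptyDrops results onto existing cell metadata rows before any conversion to zarr. This is an illustration only, not the pipeline's actual code. The column names `Total`, `LogProb`, `PValue`, `Limited` and `FDR` follow the result table documented for DropletUtils `emptyDrops`; the `CellId` and `barcode` keys, the `emptydrops_` prefix, and the `merge_emptydrops` helper are all assumptions made for the example.

```python
import csv
import io
import math

# Columns of the EmptyDrops result table (per DropletUtils documentation).
# Whether the pipeline CSV uses exactly these names is an assumption.
EMPTYDROPS_COLUMNS = ["Total", "LogProb", "PValue", "Limited", "FDR"]


def parse_value(text):
    """Convert a CSV field to bool or float; empty or NA fields become NaN."""
    if text in ("", "NA"):
        return math.nan
    if text in ("TRUE", "FALSE"):
        return text == "TRUE"
    return float(text)


def merge_emptydrops(cell_metadata, emptydrops_csv, prefix="emptydrops_"):
    """Return copies of the cell metadata rows with EmptyDrops columns added.

    cell_metadata: list of dicts, each with a hypothetical "barcode" key.
    emptydrops_csv: file-like object with a hypothetical "CellId" column.
    Barcodes missing from the EmptyDrops output get NaN in every new column.
    """
    by_barcode = {row["CellId"]: row for row in csv.DictReader(emptydrops_csv)}
    merged = []
    for cell in cell_metadata:
        row = dict(cell)  # copy so the caller's rows are not modified
        hit = by_barcode.get(cell["barcode"], {})
        for col in EMPTYDROPS_COLUMNS:
            row[prefix + col] = parse_value(hit.get(col, "NA"))
        merged.append(row)
    return merged


# Made-up example data, not real pipeline output.
emptydrops_csv = io.StringIO(
    "CellId,Total,LogProb,PValue,Limited,FDR\n"
    "AAAC,1200,-3500.5,0.0001,TRUE,0.0004\n"
    "AAAG,15,NA,NA,NA,NA\n"
)
cells = [
    {"barcode": "AAAC", "n_reads": 5000},
    {"barcode": "AAAT", "n_reads": 40},
]
merged = merge_emptydrops(cells, emptydrops_csv)
```

Keeping the join in the cell metadata step means the zarr conversion only has to treat the EmptyDrops values as additional metadata columns, which is the flexibility the suggestion is aiming for.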

ACs

Issue is synchronized with this Jira User Story

kbergin commented 5 years ago

➤ Nick Barkas commented:

The coding changes are done. I need an internal review before I can merge. I also need the contact details for Liz.

I have coordinated with Marcus and he is happy with the configuration. He had some suggestions and I will add them as backlog tickets.

Marcus will be away on holiday until mid-September. So as not to block other tickets, we will merge this and continue with the other related tickets. This way we can sync all the changes with the Matrix service in one go.

kbergin commented 5 years ago

Agh, bummer: Liz hasn’t actually started yet; I believe she starts next week. We may have to update our own or ask for help from user education. I’ll send an email and cc you.

kbergin commented 5 years ago

➤ Nick Barkas commented:

Marcus to QA. Please note that documentation is updated, but the HCA public documentation is not (and shouldn’t be until this goes to production).

kbergin commented 5 years ago

@mckinsel - when you return from vacation, could you help Nick out with QA'ing this change? It should have no breaking impact on matrix service as we did not remove the old emptydrops output. (see linked issue in matrix-service backlog)

cc @brianraymor

kbergin commented 4 years ago

➤ Nick Barkas commented:

Feedback from Marcus: looks good, but there is some inconsistency between the code and the schema next to the pipeline → update the schema doc and it's fine.

kbergin commented 4 years ago

➤ Nick Barkas commented:

Addressed comments with new PR: https://github.com/HumanCellAtlas/skylab/pull/263

Awaiting review to close