chanzuckerberg / single-cell-data-portal

The data portal supporting the submission, exploration, and management of projects and datasets to cellxgene.
MIT License

Flesh out the implementation behind the facade for writes such that the wmg pipeline will write an s3 path that contains a data schema version #5166

Closed: prathapsridharan closed 1 year ago

prathapsridharan commented 1 year ago

Change the implementation behind the facade such that:

prathapsridharan commented 1 year ago

One thing to keep in mind is that the construction of the marker genes cube uses load_snapshot().

This is important because the WMG API (the reader) also uses load_snapshot() to read data. The load_snapshot interface has already been changed in anticipation of the new location from which snapshots should be read, but it does not yet read from that location (because data is not yet being produced there). Similar to how a facade was introduced for the write path, the load_snapshot interface was modified - see this ticket

What this means is that completing this ticket also requires fleshing out load_snapshot. We need to introduce a flag or default parameter, say read_from_new_loc, such that data is read from the new location if read_from_new_loc is true and from the old location otherwise. When constructing the marker genes cube, data would need to be read from the new location.

After the writer has been deployed and verified to write to the new location, the API can read from the new location as well. At that point, the read_from_new_loc flag and the logic gated by it can be removed entirely.
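
A rough sketch of how the read_from_new_loc gating could look, to make the proposal concrete. Everything here is an assumption for illustration: the SNAPSHOT_ROOT and SCHEMA_VERSION values, the path layout, the snapshot_prefix helper, and the fetch parameter are hypothetical, and this load_snapshot is a stand-in, not the signature used in the repo.

```python
from typing import Callable

# Hypothetical values: the real bucket layout and schema version
# are not specified in this issue.
SNAPSHOT_ROOT = "snapshots"
SCHEMA_VERSION = "v1"


def snapshot_prefix(snapshot_id: str, read_from_new_loc: bool = False) -> str:
    """Return the key prefix a snapshot would be read from."""
    if read_from_new_loc:
        # Assumed new layout: the data schema version is part of the path.
        return f"{SNAPSHOT_ROOT}/{SCHEMA_VERSION}/{snapshot_id}"
    # Assumed old layout: no schema version in the path.
    return f"{SNAPSHOT_ROOT}/{snapshot_id}"


def load_snapshot(
    snapshot_id: str,
    fetch: Callable[[str], dict],
    read_from_new_loc: bool = False,
) -> dict:
    """Load a snapshot via `fetch`, a stand-in for the real S3 read."""
    return fetch(snapshot_prefix(snapshot_id, read_from_new_loc))


# In-memory stand-in for S3, holding one snapshot in each layout.
fake_store = {
    "snapshots/1700000000": {"source": "old"},
    "snapshots/v1/1700000000": {"source": "new"},
}

# The API keeps the default (old location) until the writer is verified.
api_snapshot = load_snapshot("1700000000", fake_store.__getitem__)

# Marker genes cube construction opts in to the new location.
marker_genes_input = load_snapshot(
    "1700000000", fake_store.__getitem__, read_from_new_loc=True
)
```

Defaulting the flag to the old location means existing callers are unaffected, and removing the flag later only requires deleting the old-layout branch.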