Closed prathapsridharan closed 1 year ago
One thing to keep in mind is that the construction of the marker genes cube uses load_snapshot(). This is important because the WMG API (the reader) also uses load_snapshot() to read data. Currently, the load_snapshot interface has been changed in anticipation of the new location from which snapshots should be read, but it does not yet read from the new location (because data is not yet being produced there). Similar to how a facade was introduced for the write path, the load_snapshot interface was modified (see this ticket).
What this all means is that completing this ticket also requires fleshing out load_snapshot, but we need to introduce a flag or default parameter, say read_from_new_loc, such that if read_from_new_loc is true then data is read from the new location; otherwise data is read from the old location. When constructing the marker genes cube, we would need data to be read from the new location.
After the writer has been deployed and verified to indeed write to the new location, the API can read from the new location as well - that is, the read_from_new_loc flag and the logic gated by it can be removed entirely.
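To make the gating concrete, here is a minimal sketch of how a read_from_new_loc default parameter could work. The real load_snapshot signature and the two readers are not shown in this ticket, so load_snapshot_sketch, _read_from_old_location and _read_from_new_location are hypothetical stand-ins, not the actual implementation.

```python
# Hypothetical stand-ins: the real readers and their return types are
# not described in this ticket, so these just return a tagged dict.
def _read_from_old_location() -> dict:
    return {"source": "old"}


def _read_from_new_location() -> dict:
    return {"source": "new"}


def load_snapshot_sketch(*, read_from_new_loc: bool = False) -> dict:
    """Read from the new location only when the flag is set.

    Defaulting to False keeps existing callers (the API reader) on the
    old location until the writer has been deployed and verified.
    """
    if read_from_new_loc:
        return _read_from_new_location()
    return _read_from_old_location()


# The API keeps its current behaviour without any code change.
api_snapshot = load_snapshot_sketch()

# Marker genes cube construction opts in to the new location.
marker_genes_snapshot = load_snapshot_sketch(read_from_new_loc=True)
```

Making the flag keyword-only with a False default means the only call site that changes is the marker genes cube construction, and removing the flag later is a small, mechanical edit.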
Change the implementation behind the facade such that:
- data_schema_version in the schema package init file
- s3://cellxgene-wmg-prod/snapshots/<data_schema_version>/<snapshot_creation_timestamp>/ folder
- s3://cellxgene-wmg-prod/snapshots/<data_schema_version>/latest_snapshot_identifier file with value <snapshot_creation_timestamp>
- s3://cellxgene-wmg-prod/latest_snapshot_run - a file that is different from latest_snapshot_identifier in that it contains the path to the folder containing the latest data generated (whether the generated data passed validation or not) and can be different from latest_snapshot_identifier (ex: when data validation fails)
- wmg pipeline is done inside the <data_schema_version> folder in s3
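The layout above can be sketched as a few path helpers. This is only an illustration under stated assumptions: the function names, the injected read_text callable, and the example values "v1" and "1690000000" are hypothetical; only the bucket and key patterns come from the list above.

```python
from typing import Callable

BUCKET = "s3://cellxgene-wmg-prod"  # bucket named in this ticket


def snapshot_folder(data_schema_version: str, snapshot_creation_timestamp: str) -> str:
    """Folder holding one snapshot for a given schema version."""
    return f"{BUCKET}/snapshots/{data_schema_version}/{snapshot_creation_timestamp}/"


def latest_snapshot_identifier_path(data_schema_version: str) -> str:
    """File whose value is the timestamp of the latest snapshot for this schema version."""
    return f"{BUCKET}/snapshots/{data_schema_version}/latest_snapshot_identifier"


def latest_snapshot_run_path() -> str:
    """File holding the path of the latest generated data, validated or not."""
    return f"{BUCKET}/latest_snapshot_run"


def resolve_latest_snapshot_folder(
    data_schema_version: str, read_text: Callable[[str], str]
) -> str:
    """Follow latest_snapshot_identifier to the snapshot folder it names.

    read_text is a hypothetical callable that returns the contents of the
    file at the given path; it is injected so no S3 client is assumed.
    """
    timestamp = read_text(latest_snapshot_identifier_path(data_schema_version)).strip()
    return snapshot_folder(data_schema_version, timestamp)


# Usage with an in-memory stand-in for S3 (values are made up).
fake_store = {latest_snapshot_identifier_path("v1"): "1690000000\n"}
latest_folder = resolve_latest_snapshot_folder("v1", fake_store.__getitem__)
```

Because latest_snapshot_run lives outside the versioned folder, a reader that resolves through latest_snapshot_identifier never sees a run whose validation failed.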