Open mlin opened 2 months ago
The final-for-now indexes are currently in an S3 folder in a private bucket (details on Slack).
The folder structure there starts with a 2023-12-15 subfolder (census version).
The goal is to copy it into a suitable public location under s3://cellxgene-contrib-public, similar to the CENSUS_EMBEDDINGS_LOCATION_BASE_URI that we use to resolve the embedding arrays themselves, specifically by appending the census version and embedding ID to the base URI.
Again, the staging folder already has the desired structure, which we just need to preserve when copying it to the public bucket.
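For illustration, here is a minimal sketch of that resolution scheme, assuming the indexes end up under a single base URI with one subfolder per census version and one per embedding ID. The constant name `INDEXES_BASE_URI`, the `path/to/indexes` prefix, and the helper `index_uri` are all hypothetical; only the bucket name comes from this issue, and the final location is still to be decided.

```python
# Hypothetical base location for the published indexes. Only the bucket name
# is taken from this issue; the prefix below is a placeholder, not a decision.
INDEXES_BASE_URI = "s3://cellxgene-contrib-public/path/to/indexes"


def index_uri(
    census_version: str,
    embedding_id: str,
    base_uri: str = INDEXES_BASE_URI,
) -> str:
    """Resolve an index location as <base_uri>/<census_version>/<embedding_id>."""
    if not census_version or not embedding_id:
        raise ValueError("census_version and embedding_id must be non-empty")
    # Normalize slashes so a trailing "/" on the base does not produce "//".
    parts = [base_uri.rstrip("/"), census_version.strip("/"), embedding_id.strip("/")]
    return "/".join(parts)


# Example with the census version from the staging folder and a made-up ID.
example_uri = index_uri("2023-12-15", "example-embedding-id")
```

If the staging folder already follows this layout, copying it as-is under the chosen base would keep every resolved URI valid without any renaming.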
@ebezzi @metakuni To close out this ticket, is there somewhere we're documenting the Census/LTS release process where we could include information about the indexes?
I believe @ebezzi had put together a doc.
The Census cell similarity search is backed by TileDB-Vector-Search indexes of the embeddings. These indexes are themselves TileDB arrays that we need to store on S3. We should finalize where they will be stored on S3 and the procedures we'll use to build and publish them there for each Census LTS release. The experimental Python APIs should then use the finalized locations.