chanzuckerberg / single-cell

A collection of documents that reflect various design decisions that have been made for the cellxgene project.
MIT License

CELLxGENE Discover Census data integration directory #587

Closed pablo-gar closed 4 months ago

pablo-gar commented 11 months ago

Background

We will establish 3 “tiers” of integration that dictate how we create visibility, access, and ownership over embeddings and models.

Community Projects (formerly Tier 3) At this support tier, we accept almost any embeddings or models that have been trained on Census data. We will add each project to a webpage that acts as a directory and is accessible through the cellxgene.cziscience.com domain. The directory should contain information about the group that generated the embeddings, the associated publication, a link to download the embeddings, the Census version used, and a link to the hosted model (e.g. a link to Hugging Face). At this tier, we host neither the embeddings nor the models; it is up to the contributor to provide the links for accessing both. In terms of marketing support, we may create visibility through social media (e.g. tweets).

CZI Hosted Projects (formerly Tier 2) At this support tier, we accept some community-generated embeddings and models that have been trained on an LTS version of Census data and have passed minimum standardization requirements. CELLxGENE commits to adding the embeddings and models to our directory (as with Community Projects) and additionally hosts the embeddings. These embeddings will be served through a lightweight, independent API and will be interoperable with the Census. We do not commit to the long-term maintenance or productionalization of these embeddings or models.

CZI Maintained Projects (formerly Tier 1) At this support tier, we (CZI) commit not only to hosting the embeddings and models, but also to retraining the models on a regular cadence so that they reflect the latest Census data (i.e. productionalizing the model). The embeddings and models are aligned to the Census LTS releases. At this tier, we require a deep partnership with the lab developing the model.
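To make the directory requirements above more concrete, here is a minimal sketch of what one directory entry and its per-tier checks could look like. Everything in it is an assumption for illustration: the names `Tier`, `DirectoryEntry`, and `validate`, the field names, and the exact rules are hypothetical and do not describe an agreed schema or an existing API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Tier(Enum):
    """The three support tiers described above."""
    COMMUNITY = "Community Projects"
    CZI_HOSTED = "CZI Hosted Projects"
    CZI_MAINTAINED = "CZI Maintained Projects"


@dataclass(frozen=True)
class DirectoryEntry:
    """Hypothetical directory record; field names are illustrative only."""
    tier: Tier
    group: str                      # group that generated the embeddings
    census_version: str             # Census version the model was trained on
    is_lts_version: bool            # whether that version is an LTS release
    publication_url: Optional[str] = None
    embeddings_url: Optional[str] = None  # contributor-provided download link
    model_url: Optional[str] = None       # e.g. a link to Hugging Face


def validate(entry: DirectoryEntry) -> List[str]:
    """Return a list of problems; an empty list means the entry passes."""
    problems: List[str] = []
    if not entry.group:
        problems.append("missing contributing group")
    if not entry.census_version:
        problems.append("missing Census version")
    if entry.tier is Tier.COMMUNITY:
        # At this tier nothing is hosted by CZI, so the contributor
        # must supply the link to the embeddings.
        if not entry.embeddings_url:
            problems.append("missing link to download the embeddings")
    elif not entry.is_lts_version:
        # Hosted and maintained projects are tied to LTS Census releases.
        problems.append("must be trained on an LTS Census version")
    return problems
```

For example, a CZI Hosted entry trained on a non-LTS Census version would be returned with a single problem, while a Community entry only needs the contributor-supplied links and basic metadata.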

User stories

Product deliverables

The epics below should encapsulate our work around the user stories to:

  1. Create web content to host information about each tier and the projects associated with each.
  2. Provide demonstration notebooks on the site for CZI Maintained Projects and CZI Hosted Projects.
  3. Create a hosting system, data requirements, and policies for CZI Hosted Projects assets.
  4. Enable human-friendly ingestion of data and metadata for CZI Hosted Projects and Community Projects.