chanzuckerberg / single-cell

A collection of documents that reflect various design decisions that have been made for the cellxgene project.
MIT License

SCVI integrated embeddings and pre-trained models are incorporated into the Census #468

Closed pablo-gar closed 9 months ago

pablo-gar commented 1 year ago

Goal

To enable users to readily start atlas-level analysis of Census for any or all cells of an organism.

User stories

Census currently allows access to single-cell data from hundreds of different datasets, but cells from one dataset are in a different numerical space than cells from any other dataset.

Therefore, while users can access all of these data, they cannot readily start their analysis to answer scientific questions about cell biology.

Integration aligns the numerical space of all cells, enabling a multitude of user stories. Below is a selection of the most relevant stories that we will fulfill with this project.

Approach

We will accomplish this goal by providing scVI-based integrated embeddings in the Census SOMA data, along with the trained model.

As detailed in this document, for the first iteration of the project we need, at a high level, to create a manually triggered workflow that performs the following tasks:

KRs

Assumptions and Risks

Assumptions

Risks

Plan

Important notes about the plan

View high-resolution schematic at FigJam here

To create a process to train an scVI model across all unique human and mouse cells.

To create a process to generate and save embeddings from a trained model across all human and mouse cells.

To create a process to save and expose the trained/fine-tuned model for API access.

[STRETCH] To create a process to find best hyper-parameters for an scVI model across all unique human and mouse cells.

[STRETCH] To create a process to fine-tune an scVI model across new unique human and mouse cells not previously seen by the model.

[STRETCH] To assess/prototype usage of the Census PyTorch loader in the training pipeline.

pablo-gar commented 9 months ago

completed