chanzuckerberg / cellxgene-census

CZ CELLxGENE Discover Census
https://chanzuckerberg.github.io/cellxgene-census/
MIT License

Creating a Census object from inhouse h5ad projects (Discussion) #1132

Open danishzmalik opened 2 months ago

danishzmalik commented 2 months ago

Hi, I have a list of h5ad files from which I want to create a Census object.

I want to do this in Databricks. I've just started exploring cellxgene-census and have hit a roadblock.

We're supposed to migrate from cellbrowser to cellxgene, hence the R&D. It would be nice if someone could share some guidance.

pablo-gar commented 2 months ago

Hi @danishzmalik,

Thanks for submitting the issue. Would you mind sharing more about your use case, to see if and how we could support it? Are you hoping to create a Census-like object with your own data? Or are you interested in adopting our data standards?

danishzmalik commented 1 month ago

Thank you for responding @pablo-gar

My intention is to create a Census-like object from around 400 single-cell h5ad files residing in an S3 bucket mounted on Databricks.

As far as I understand, I need to convert each file to SOMA format, and then somehow append these SOMA objects to form a Census object.

Also, my goal is to achieve this in the Databricks environment. I've been looking into the build_soma (cellxgene-census-builder) module, but I'm having trouble calling it within the notebook. The environment doesn't seem to recognize cellxgene-census-builder even though I have successfully installed the cellxgene-census package.

pablo-gar commented 1 month ago

Thanks, this is very informative @danishzmalik.

I think it's unlikely that the Census builder will fit your needs, as it is very opinionated around CELLxGENE data and schema. @aaronwolen at TileDB will be able to provide better support for your use case.

cc @johnkerl

johnkerl commented 1 month ago

Related: https://github.com/single-cell-data/TileDB-SOMA/issues/2569

ryan-williams commented 1 month ago

Per https://github.com/single-cell-data/TileDB-SOMA/issues/2569#issuecomment-2124806602, there's now a tutorial about this in the TileDB-SOMA docs: "Ingesting multiple datasets to a SOMA Experiment".
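
For readers landing on this thread, here is a minimal sketch of the two-pass flow described in that tutorial: register all inputs first, then ingest them one by one into a single Experiment. It assumes a TileDB-SOMA release that provides `tiledbsoma.io.register_h5ads` and the `registration_mapping` argument to `tiledbsoma.io.from_h5ad`; newer releases may require extra steps, so check the tutorial for the version you have installed. The helper names `list_h5ads` and `ingest_h5ads`, the measurement name `"RNA"`, and the `obs_id`/`var_id` field names are assumptions for illustration, not something confirmed in this discussion.

```python
from pathlib import Path
from typing import List


def list_h5ads(directory: str) -> List[str]:
    """Return the .h5ad files directly under `directory`, sorted for a stable ingest order."""
    return sorted(str(p) for p in Path(directory).glob("*.h5ad"))


def ingest_h5ads(
    h5ad_paths: List[str],
    experiment_uri: str,
    measurement_name: str = "RNA",  # assumed measurement name
) -> None:
    """Ingest several H5AD files into one SOMA Experiment (hypothetical helper)."""
    # Imported lazily so list_h5ads() works even where TileDB-SOMA is not installed.
    import tiledbsoma.io

    # Pass 1: scan every input up front so that obs/var join IDs are
    # assigned consistently across all files. Passing None as the
    # experiment URI signals that the Experiment does not exist yet.
    registration = tiledbsoma.io.register_h5ads(
        None,
        h5ad_paths,
        measurement_name=measurement_name,
        obs_field_name="obs_id",  # assumed obs index column name
        var_field_name="var_id",  # assumed var index column name
    )

    # Pass 2: write each file into the same Experiment, reusing the mapping.
    for path in h5ad_paths:
        tiledbsoma.io.from_h5ad(
            experiment_uri,
            path,
            measurement_name=measurement_name,
            registration_mapping=registration,
        )
```

On Databricks this might be called as `ingest_h5ads(list_h5ads("/dbfs/mnt/<your-mount>"), "<experiment-uri>")`, where both arguments are placeholders for your own mount point and output location. The result is a SOMA Experiment rather than a Census built by the Census builder, which matches the suggestion earlier in the thread that the builder is tied to the CELLxGENE schema.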