Novartis / cellxgene-gateway

Cellxgene Gateway allows you to use the Cellxgene Server provided by the Chan Zuckerberg Initiative (https://github.com/chanzuckerberg/cellxgene) with multiple datasets.
Apache License 2.0

Optimizing Large Dataset Loading and Differential Expression Analysis in a Locally Hosted CellxGene VM #85

Open · chunhuicai opened this issue 1 year ago

chunhuicai commented 1 year ago

We are currently using CellxGene VM (https://github.com/Novartis/cellxgene-gateway) to host a large spatial transcriptomics dataset of roughly 16 million cells. However, we are facing a couple of critical issues that are hampering our analysis workflow:

Dataset Loading:

Incomplete Loading: Loading the dataset is sometimes interrupted and leaves it only partially loaded. After several attempts the dataset does load completely, in about 3 min 30 s, but the inconsistency remains a concern.

Conversion to CXG: We successfully converted our dataset to CXG format, but our self-hosted explorer does not recognize the converted dataset.
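To put numbers on the loading inconsistency, a minimal Python sketch like the one below could time repeated load attempts and count failures. It assumes that one "load" can be wrapped in a zero-argument callable; `fetch_url` and `time_loads` are hypothetical helpers, not part of cellxgene or cellxgene-gateway, and timing an HTTP request only approximates a full dataset load.

```python
import time
import urllib.request
from typing import Callable, Dict, List


def fetch_url(url: str, timeout: float = 600.0) -> None:
    """Hypothetical helper: request a URL and read the whole body (raises on HTTP/network errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def time_loads(load: Callable[[], None], attempts: int = 5) -> Dict[str, object]:
    """Call `load` repeatedly; record durations of successful calls and errors of failed ones."""
    durations: List[float] = []
    failures: List[str] = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            load()
        except Exception as exc:  # any exception is counted as an incomplete load
            failures.append(repr(exc))
            continue
        durations.append(time.perf_counter() - start)
    return {
        "successes": len(durations),
        "failures": failures,
        "min_s": min(durations) if durations else None,
        "max_s": max(durations) if durations else None,
    }
```

For example, `time_loads(lambda: fetch_url(url), attempts=3)`, with `url` set to whichever endpoint triggers the dataset load in a given deployment (an assumption, since the right route depends on the setup), reports how many attempts succeeded and the spread between the fastest and slowest load.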

Differential Expression Analysis:

Inconsistent Loading of Gene Details: When we use the differential expression function, it does not consistently finish loading the details for all genes. By comparison, when we use CZI's CellxGene with large datasets (over 4 million cells), the data loads quickly and differential expression analysis completes within a few seconds.

Are there any practices, setups, or approaches that would help us handle and analyze large datasets efficiently on our local CellxGene VM and reach performance similar to CZI's?
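For context on why this step scales with dataset size: cellxgene's built-in differential expression is, to our understanding, based on Welch's t-test computed gene by gene between two cell sets. The pure-Python sketch below illustrates that computation under this assumption. `welch_t` and `rank_genes` are hypothetical names, and this is not cellxgene's implementation (which also reports p-values and reads from its own data format). Each gene touches every cell in both groups, so the work grows roughly with cells × genes.

```python
import math
from statistics import mean, variance
from typing import List, Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for two samples with possibly unequal variances (each needs >= 2 values)."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        # Both groups are constant: 0 if the means match, otherwise signed infinity.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def rank_genes(
    expr1: List[List[float]],
    expr2: List[List[float]],
    genes: Sequence[str],
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    """Rank genes by |t| between two cell groups; expr1/expr2 are cells x genes (lists of rows)."""
    results = []
    for j, gene in enumerate(genes):
        col1 = [row[j] for row in expr1]  # expression of this gene in every cell of group 1
        col2 = [row[j] for row in expr2]  # expression of this gene in every cell of group 2
        results.append((gene, welch_t(col1, col2)))
    results.sort(key=lambda item: abs(item[1]), reverse=True)
    return results[:top_n]
```

For instance, `rank_genes(group1_rows, group2_rows, gene_names, top_n=10)` returns the ten genes with the largest absolute t statistic. With millions of cells, the per-gene column scans dominate, which is why storage layout and memory matter so much for this step.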