hannes-ucsc closed this issue 4 years ago.
A good example: https://www.gutcellatlas.org/tcr-bcr
Startup time, default (non-backed) mode:
filename                filesize (bytes)   startup (hh:mm:ss)
pbmc3k.h5ad                     22189122   00:00:01.393100
output_.seurat.h5ad             45788728   00:00:02.525500
tabula-muris.h5ad             1178565335   00:00:01.811200
tabula-muris-senis.h5ad       3844684983   00:02:12.281500
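The thread doesn't say how startup was timed. One way to reproduce a table like this is to start the server as a child process and poll its port until it accepts a TCP connection. A minimal Python sketch, assuming the server counts as "up" once it is listening (it may still be doing work after that) and that nothing else is already listening on the chosen port; `measure_startup` is a hypothetical helper, not part of cellxgene:

```python
from __future__ import annotations

import socket
import subprocess
import time


def measure_startup(cmd: list[str], port: int, host: str = "127.0.0.1",
                    timeout: float = 600.0) -> float:
    """Start `cmd` and return the seconds until it accepts TCP connections on `port`.

    The child is terminated afterwards, so only startup is timed.
    """
    start = time.monotonic()
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    try:
        while time.monotonic() - start < timeout:
            if proc.poll() is not None:
                # The server exited before it ever listened (e.g. unreadable input file).
                raise RuntimeError(f"process exited early with code {proc.returncode}")
            try:
                with socket.create_connection((host, port), timeout=0.5):
                    return time.monotonic() - start
            except OSError:
                time.sleep(0.1)  # not listening yet; retry shortly
        raise TimeoutError(f"nothing listening on {host}:{port} after {timeout} s")
    finally:
        proc.terminate()
        proc.wait()
```

For example, `measure_startup(["cellxgene", "launch", "pbmc3k.h5ad", "--port", "5005"], 5005)`; the `--port` option is an assumption to check against `cellxgene launch --help`.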
cellxgene also has a "backed" mode that accelerates file loading and saves memory but slows down analysis once the server is up. I did not measure the slowdown, but the startup speedup was remarkable, especially on the largest file:
filename                filesize (bytes)   startup (hh:mm:ss)
pbmc3k.h5ad                     22189122   00:00:00.935800
output_.seurat.h5ad             45788728   00:00:00.939000
tabula-muris.h5ad             1178565335   00:00:01.930800
tabula-muris-senis.h5ad       3844684983   00:00:02.810500
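To put a number on "remarkable", the per-file speedup is the ratio of the two startup columns. A small Python sketch with the values copied from the two tables (`parse_hms`, `STARTUP` and `speedups` are names made up for this example):

```python
from __future__ import annotations

from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Parse an 'hh:mm:ss.ffffff' duration as printed in the tables."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


# (default mode, backed mode) startup times, copied from the two tables.
STARTUP = {
    "pbmc3k.h5ad": ("00:00:01.393100", "00:00:00.935800"),
    "output_.seurat.h5ad": ("00:00:02.525500", "00:00:00.939000"),
    "tabula-muris.h5ad": ("00:00:01.811200", "00:00:01.930800"),
    "tabula-muris-senis.h5ad": ("00:02:12.281500", "00:00:02.810500"),
}


def speedups() -> dict[str, float]:
    """Default-mode startup divided by backed-mode startup, per file."""
    return {name: parse_hms(default) / parse_hms(backed)
            for name, (default, backed) in STARTUP.items()}


for name, ratio in speedups().items():
    print(f"{name:<25}{ratio:6.2f}x")
```

With these figures the speedup is roughly 47x for tabula-muris-senis and about 1.5x to 2.7x for the two small files, while tabula-muris came out slightly slower in backed mode (a ratio just under 1). Each figure is a single run, so the small-file differences may well be noise.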
Complete dependency tree:
name summary
----------------------------------------------- -----------------------------------------------------------------------
cellxgene Web application for exploration of large scale scRNA-seq datasets
├── Flask-Caching>=1.4.0 Adds caching support to your Flask application
│ └── Flask A simple framework for building complex web applications.
│ ├── Jinja2>=2.10.1 A very fast and expressive template engine.
│ │ └── MarkupSafe>=0.23 Safely add untrusted strings to HTML/XML markup.
│ ├── Werkzeug>=0.15 The comprehensive WSGI web application library.
│ ├── click>=5.1 Composable command line interface toolkit
│ └── itsdangerous>=0.24 Various helpers to pass data to untrusted environments and back.
├── Flask-Compress>=1.4.0 Compress responses in your Flask app with gzip.
│ └── Flask A simple framework for building complex web applications.
│ ├── Jinja2>=2.10.1 A very fast and expressive template engine.
│ │ └── MarkupSafe>=0.23 Safely add untrusted strings to HTML/XML markup.
│ ├── Werkzeug>=0.15 The comprehensive WSGI web application library.
│ ├── click>=5.1 Composable command line interface toolkit
│ └── itsdangerous>=0.24 Various helpers to pass data to untrusted environments and back.
├── Flask-Cors>=3.0.6 A Flask extension adding a decorator for CORS support
│ ├── Flask>=0.9 A simple framework for building complex web applications.
│ │ ├── Jinja2>=2.10.1 A very fast and expressive template engine.
│ │ │ └── MarkupSafe>=0.23 Safely add untrusted strings to HTML/XML markup.
│ │ ├── Werkzeug>=0.15 The comprehensive WSGI web application library.
│ │ ├── click>=5.1 Composable command line interface toolkit
│ │ └── itsdangerous>=0.24 Various helpers to pass data to untrusted environments and back.
│ └── Six Python 2 and 3 compatibility utilities
├── Flask-RESTful>=0.3.6 Simple framework for creating REST APIs
│ ├── Flask>=0.8 A simple framework for building complex web applications.
│ │ ├── Jinja2>=2.10.1 A very fast and expressive template engine.
│ │ │ └── MarkupSafe>=0.23 Safely add untrusted strings to HTML/XML markup.
│ │ ├── Werkzeug>=0.15 The comprehensive WSGI web application library.
│ │ ├── click>=5.1 Composable command line interface toolkit
│ │ └── itsdangerous>=0.24 Various helpers to pass data to untrusted environments and back.
│ ├── aniso8601>=0.82 A library for parsing ISO 8601 strings.
│ ├── pytz World timezone definitions, modern and historical
│ └── six>=1.3.0 Python 2 and 3 compatibility utilities
├── Flask>=1.0.2 A simple framework for building complex web applications.
│ ├── Jinja2>=2.10.1 A very fast and expressive template engine.
│ │ └── MarkupSafe>=0.23 Safely add untrusted strings to HTML/XML markup.
│ ├── Werkzeug>=0.15 The comprehensive WSGI web application library.
│ ├── click>=5.1 Composable command line interface toolkit
│ └── itsdangerous>=0.24 Various helpers to pass data to untrusted environments and back.
├── anndata==0.6.22post1 Annotated Data.
│ ├── h5py Read and write HDF5 files from Python
│ │ ├── numpy>=1.7 NumPy is the fundamental package for array computing with Python.
│ │ └── six Python 2 and 3 compatibility utilities
│ ├── natsort Simple yet flexible natural sorting in Python.
│ ├── numpy~=1.14 NumPy is the fundamental package for array computing with Python.
│ ├── pandas>=0.23.0 Powerful data structures for data analysis, time series, and statistics
│ │ ├── numpy>=1.13.3 NumPy is the fundamental package for array computing with Python.
│ │ ├── python-dateutil>=2.6.1 Extensions to the standard Python datetime module
│ │ │ └── six>=1.5 Python 2 and 3 compatibility utilities
│ │ └── pytz>=2017.2 World timezone definitions, modern and historical
│ └── scipy~=1.0 SciPy: Scientific Library for Python
│ └── numpy>=1.13.3 NumPy is the fundamental package for array computing with Python.
├── click>=6.7 Composable command line interface toolkit
├── fastobo>=0.6.1 Faultless AST for Open Biomedical Ontologies in Python.
├── flatbuffers>=1.10.0 The FlatBuffers serialization format for Python
├── fsspec>=0.4.4 File-system specification
├── h5py==2.9.0 Read and write HDF5 files from Python
│ ├── numpy>=1.7 NumPy is the fundamental package for array computing with Python.
│ └── six Python 2 and 3 compatibility utilities
├── numpy>=1.15.2 NumPy is the fundamental package for array computing with Python.
├── pandas>=0.24.2 Powerful data structures for data analysis, time series, and statistics
│ ├── numpy>=1.13.3 NumPy is the fundamental package for array computing with Python.
│ ├── python-dateutil>=2.6.1 Extensions to the standard Python datetime module
│ │ └── six>=1.5 Python 2 and 3 compatibility utilities
│ └── pytz>=2017.2 World timezone definitions, modern and historical
├── requests>=2.22.0 Python HTTP for Humans.
│ ├── certifi>=2017.4.17 Python package for providing Mozilla's CA Bundle.
│ ├── chardet<4,>=3.0.2 Universal encoding detector for Python 2 and 3
│ ├── idna<3,>=2.5 Internationalized Domain Names in Applications (IDNA)
│ └── urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 HTTP library with thread-safe connection pooling, file post, and more.
├── scipy>=1.3.0 SciPy: Scientific Library for Python
│ └── numpy>=1.13.3 NumPy is the fundamental package for array computing with Python.
└── tables==3.5.1 Hierarchical datasets for Python
├── mock>=2.0 Rolling backport of unittest.mock for all Pythons
├── numexpr>=2.6.2 Fast numerical expression evaluator for NumPy
│ └── numpy>=1.7 NumPy is the fundamental package for array computing with Python.
├── numpy>=1.9.3 NumPy is the fundamental package for array computing with Python.
└── six>=1.9.0 Python 2 and 3 compatibility utilities
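The tree above came from a dependency-listing tool the thread doesn't name. A rough standard-library equivalent walks `importlib.metadata.requires()` recursively over whatever is installed in the current environment. This is only a sketch: it drops environment markers instead of evaluating them, skips optional extras, and shows uninstalled requirements as leaves; `dependency_tree` and `installed_requirements` are hypothetical helpers.

```python
from __future__ import annotations

import re
from importlib import metadata
from typing import Callable

# Leading project name of a requirement string, e.g. "Flask>=1.0.2" -> "Flask".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")
# Requirements that only apply to an optional extra, e.g. 'pytest; extra == "test"'.
_EXTRA = re.compile(r"extra\s*==")


def installed_requirements(dist: str) -> list[str]:
    """Requirement strings declared by an installed distribution."""
    try:
        return metadata.requires(dist) or []
    except metadata.PackageNotFoundError:
        return []  # not installed here: shown as a leaf


def dependency_tree(dist: str,
                    requires: Callable[[str], list[str]] = installed_requirements,
                    depth: int = 0,
                    path: tuple[str, ...] = ()) -> list[str]:
    """Return indented lines listing the requirements of `dist`, recursively."""
    path = path + (dist.lower(),)
    lines: list[str] = []
    for req in requires(dist):
        if _EXTRA.search(req):
            continue  # optional extra, not a hard dependency
        match = _NAME.match(req)
        if match is None:
            continue
        name = match.group(1)
        lines.append("    " * depth + req.split(";")[0].strip())  # drop markers
        if name.lower() not in path:  # avoid infinite recursion on cycles
            lines.extend(dependency_tree(name, requires, depth + 1, path))
    return lines
```

In an environment with cellxgene installed, `print("\n".join(dependency_tree("cellxgene")))` prints a tree in the same spirit, with shared dependencies such as Flask repeated under each parent. The `requires` parameter exists so the walk can be exercised without installing anything.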
I was able to launch cellxgene in two parallel bash sessions in the same directory on the same data file. They started on different ports on localhost (127.0.0.1:5005 and 127.0.0.1:5006) and could be used and terminated independently.
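Rather than relying on the instances landing on different ports by themselves, each one can be given its own port explicitly. A Python sketch: `free_ports` asks the OS for distinct unused ports and `launch_commands` builds one command line per port. Both are hypothetical helpers, and the `cellxgene launch <file> --port <n>` form is an assumption to check against `cellxgene launch --help` for the installed version.

```python
from __future__ import annotations

import socket


def free_ports(count: int, host: str = "127.0.0.1") -> list[int]:
    """Ask the OS for `count` distinct TCP ports that are currently unused on `host`."""
    sockets = []
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.bind((host, 0))  # port 0: the OS picks an unused port
            sockets.append(s)  # keep it bound so the next bind gets a different port
        return [s.getsockname()[1] for s in sockets]
    finally:
        for s in sockets:
            s.close()


def launch_commands(datafile: str, ports: list[int]) -> list[list[str]]:
    """One cellxgene command line per port, all serving the same data file.

    ASSUMPTION: `cellxgene launch` accepts `--port`; verify with `--help`.
    """
    return [["cellxgene", "launch", datafile, "--port", str(p)] for p in ports]
```

Each command could then be started with `subprocess.Popen(cmd)` and stopped on its own with `proc.terminate()`, mirroring the two bash sessions. There is a small race between closing the probe sockets and cellxgene binding the port, which is acceptable for a manual test.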
New questions:
From standup: measure memory usage
non-backed:
  peak usage while loading data: 13.8G
  peak usage during differential expression: 13.5G
  resting usage: 2917M
backed:
  peak usage during startup: 538M
  peak usage during differential expression: 13.2G
  resting usage: 203M
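The thread doesn't say how these numbers were collected. On Linux or macOS, one way to get a peak figure for a child process is `resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss` after the child has exited. A minimal sketch, assuming a Unix host (the `resource` module is not available on Windows); `peak_rss_mib` is a hypothetical helper. It only captures the peak, not resting usage, and for a long-running server you would start it with `subprocess.Popen`, exercise it, then call `terminate()` and `wait()` before reading the counter.

```python
from __future__ import annotations

import resource
import subprocess
import sys


def peak_rss_mib(cmd: list[str]) -> float:
    """Run `cmd` to completion and return the peak resident set size in MiB.

    RUSAGE_CHILDREN reports the largest peak among all children this process
    has waited for, so measure one command per fresh Python process.
    """
    subprocess.run(cmd, check=True)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor
```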
We might be asked to host CZI's cellxgene for the HCA on the HCA AWS accounts.
The following questions should be answered by installing and running cellxgene locally:
1) How do we get an input file to run it on?
2) How long does it take to initialize the server process, potentially as a function of the input size?
3) What dependencies does it have?
4) Does one instance support multiple input datasets?
5) Can one run multiple instances on one input dataset? The documentation seems to indicate that one cannot.