We recommend creating a venv or conda environment with Python 3.12:

```
conda create -n multimodal_hackaton python=3.12
conda activate multimodal_hackaton
```

And then:

```
pip3 install -r requirements.txt
```
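If you prefer venv over conda, an equivalent setup is sketched below (this assumes a Python 3.12 interpreter is already available on your PATH as `python3.12`; it is not part of the original instructions):

```
python3.12 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt
```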
Papers:

- Multimodal single cell data integration challenge: Results and lessons learned, https://proceedings.mlr.press/v176/lance22a/lance22a.pdf

Documentation:

- https://openproblems.bio/events/2021-09_neurips/documentation/data/about_multimodal

When you want to run any experiment, run:
```
cd src
python3 train_and_validate.py [ARGUMENTS]
```
with possible options:

- `-h, --help`: show this help message and exit
- `--method {"omivae", "babel", "advae"}`
- `--config` (default="standard"): Name of a configuration in the `src/config/{method}` directory.
- `--cv-seed` (default=42): Seed used to make k folds for cross-validation.
- `--n-folds` (default=5): Number of folds in cross-validation.
- `--subsample-frac` (default=None): Fraction of the data to use for cross-validation.
- `--retrain` (default=True): Retrain a model using the whole dataset.

An example invocation is shown below.
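For example, a quick cross-validation run could look like this (the argument values here are illustrative; any of the listed methods works):

```
python3 train_and_validate.py --method omivae --config standard --n-folds 3 --subsample-frac 0.1
```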
We use the dataset from the NeurIPS competition, GSE194122_openproblems_neurips2021_cite_BMMC_processed.h5ad.gz, available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122.
We assume the data is in `path_to_repo\repo_name\data`.
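As a quick sanity check that the data is in place, you can inspect the file with anndata (a minimal sketch, assuming the `.h5ad.gz` archive has been decompressed into the data directory and that anndata is installed; this is not part of the repo's pipeline):

```python
# Sketch: load and inspect the CITE-seq BMMC dataset.
# Assumes the .h5ad.gz archive has been gunzipped into data/.
import anndata as ad

adata = ad.read_h5ad("data/GSE194122_openproblems_neurips2021_cite_BMMC_processed.h5ad")
print(adata)  # prints the matrix shape plus obs/var/layers metadata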
Predicting one modality from another: As metrics, we consider root mean squared error (RMSE) and Pearson correlation on log-scaled counts, as well as Spearman correlation.
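A minimal sketch of how these metrics can be computed with numpy/scipy (illustrative only, not the repo's own implementation; `prediction_metrics` is a hypothetical helper):

```python
# Illustrative only: RMSE and Pearson correlation on log-scaled counts,
# plus Spearman correlation, for flattened true/predicted vectors.
import numpy as np
from scipy.stats import pearsonr, spearmanr

def prediction_metrics(y_true, y_pred):
    rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    pearson = float(pearsonr(y_true, y_pred)[0])
    spearman = float(spearmanr(y_true, y_pred)[0])
    return {"rmse": rmse, "pearson": pearson, "spearman": spearman}
```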
Matching cells between modalities: As metrics, we consider area under the precision recall curve (AUPR) and the average probability assigned to the correct matching. The latter is a relative measure per dataset that accounts for non-identifiability among cells with the same identity.
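A hypothetical sketch of the matching metrics follows, under the assumption that the model outputs an (n_cells x n_cells) matrix of match probabilities with the true matching on the diagonal (`matching_metrics` is not part of the repo):

```python
# Illustrative only: AUPR over all pairwise match probabilities, and the
# average probability assigned to the correct (diagonal) matching.
import numpy as np
from sklearn.metrics import average_precision_score

def matching_metrics(match_probs):
    n = match_probs.shape[0]
    labels = np.eye(n, dtype=int).ravel()            # 1 for true pairs, 0 otherwise
    aupr = float(average_precision_score(labels, match_probs.ravel()))
    avg_prob = float(np.mean(np.diag(match_probs)))  # relative measure per dataset
    return {"aupr": aupr, "avg_match_prob": avg_prob}
```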
To see the logs from different runs, run the following from the `src` directory:

```
tensorboard --logdir=logs
```