cistrome / MIRA

Python package for analysis of multiomic single cell RNA-seq and ATAC-seq.

BayesianTuner Assertion Error #35

Closed: dfdeascanis closed this issue 11 months ago

dfdeascanis commented 11 months ago

Hi Team,

Thanks for developing such a useful tool!

I am currently working with a 10X GEX dataset comprising ~40K cells derived from 4 donors, with substantial batch effects (confirmed via scvi-tools data integration). We used an earlier version of MIRA (v1.0.4) for topic analysis in previous studies, but because the current dataset contains batch effects, we are attempting to leverage the newest version of MIRA + CODAL for integration prior to topic modeling, so that technical covariation is removed while biological variation is preserved.

I am also doing analysis on an AWS instance (without GPU support unfortunately).

I first set up a fresh conda environment (Python 3.9.18), followed your comprehensive CODAL integration tutorial, and initially attempted the gradient-based tuning method (given that this is a larger dataset), but experienced gradient overflows.

I tried further pre-processing (removing cells with UMI counts more than 3 MADs above the median), changing the seed, and reducing the maximum learning rate, all to no avail.
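For context, the UMI filter I applied looks roughly like the sketch below. It is written against a generic AnnData object; the file path and the use of `adata.X.sum(axis=1)` for total counts are illustrative rather than the exact code I ran.

```python
import numpy as np
import scanpy as sc

# Illustrative MAD-based UMI filter: drop cells whose total UMI count is more
# than 3 MADs above the median. The file path is a placeholder.
adata = sc.read_h5ad("gex_40k_cells.h5ad")
total_umis = np.asarray(adata.X.sum(axis=1)).ravel()

median = np.median(total_umis)
mad = np.median(np.abs(total_umis - median))
keep = total_umis <= median + 3 * mad

adata = adata[keep].copy()
print(f"Kept {keep.sum()} of {keep.size} cells")
```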

Secondly, I attempted the Bayesian optimization workflow to reduce the likelihood of running into gradient overflow errors. However, when doing so, I ran into the following AssertionError:

{ "name": "AssertionError", "message": "", "stack": "--------------------------------------------------------------------------- AssertionError Traceback (most recent call last) /mnt/data/projects/amitabh_sc/mira.ipynb Cell 33 line 1 ----> 1 tuner = mira.topics.BayesianTuner( 2 model = model, 3 n_jobs=1, 4 save_name = 'mira_tuning/test', 5 min_topics = 3, max_topics = 20, 6 )

File /mnt/data/miniconda3/envs/mira-env/lib/python3.9/site-packages/mira/topic_model/hyperparameter_optim/trainer.py:493, in BayesianTuner.init(self, model, save_name, min_topics, max_topics, storage, n_jobs, max_trials, min_trials, stop_condition, seed, tensorboard_logdir, model_dir, pruner, sampler, log_steps, log_every, train_size) 491 self.model_dir = model_dir 492 self.train_size = train_size --> 493 self.study = self.create_study()

File /mnt/data/miniconda3/envs/mira-env/lib/python3.9/site-packages/mira/topic_model/hyperparameter_optim/trainer.py:500, in BayesianTuner.create_study(self) 498 def create_study(self): --> 500 return optuna.create_study( 501 directions = self.objective, 502 pruner = self.get_pruner(), 503 study_name = self.study_name, 504 storage = self.storage, 505 load_if_exists= True, 506 )

File /mnt/data/miniconda3/envs/mira-env/lib/python3.9/site-packages/optuna/study/study.py:1136, in create_study(storage, sampler, pruner, study_name, direction, load_if_exists, directions) 1127 raise ValueError( 1128 \"Please set either 'minimize' or 'maximize' to direction. You can also set the \" 1129 \"corresponding StudyDirection member.\" 1130 ) 1132 direction_objects = [ 1133 d if isinstance(d, StudyDirection) else StudyDirection[d.upper()] for d in directions 1134 ] -> 1136 storage = storages.get_storage(storage) 1137 try: 1138 study_id = storage.create_new_study(study_name)

File /mnt/data/miniconda3/envs/mira-env/lib/python3.9/site-packages/optuna/storages/init.py:31, in get_storage(storage) 29 return RedisStorage(storage) 30 else: ---> 31 return _CachedStorage(RDBStorage(storage)) 32 elif isinstance(storage, RDBStorage): 33 return _CachedStorage(storage)

File /mnt/data/miniconda3/envs/mira-env/lib/python3.9/site-packages/optuna/storages/_rdb/storage.py:187, in RDBStorage.init(self, url, engine_kwargs, skip_compatibility_check, heartbeat_interval, grace_period, failed_trial_callback) 185 self._version_manager = _VersionManager(self.url, self.engine, self.scoped_session) 186 if not skip_compatibility_check: --> 187 self._version_manager.check_table_schema_compatibility()

File /mnt/data/miniconda3/envs/mira-env/lib/python3.9/site-packages/optuna/storages/_rdb/storage.py:1310, in _VersionManager.check_table_schema_compatibility(self) 1306 version_info = models.VersionInfoModel.find(session) 1308 assert version_info is not None -> 1310 current_version = self.get_current_version() 1311 head_version = self.get_head_version() 1312 if current_version == head_version:

File /mnt/data/miniconda3/envs/mira-env/lib/python3.9/site-packages/optuna/storages/_rdb/storage.py:1337, in _VersionManager.get_current_version(self) 1335 context = alembic.migration.MigrationContext.configure(self.engine.connect()) 1336 version = context.get_current_revision() -> 1337 assert version is not None 1339 return version

AssertionError: " }

Based on the output, it seems to be an issue with optuna rather than MIRA itself: the failure happens while optuna creates the RDB storage for the study and checks the database's alembic schema version, which comes back as None.
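A minimal sketch of the equivalent plain-optuna call is below. The study name and sqlite path are illustrative, and it only trips the same assertion on an affected optuna/SQLAlchemy combination, so treat it as an approximation of what BayesianTuner does internally rather than an exact reproduction.

```python
import optuna

# Roughly what BayesianTuner does internally: create (or load) a study backed
# by an RDB storage. On an affected optuna/SQLAlchemy combination, the schema
# version check on the sqlite file fails with `assert version is not None`.
study = optuna.create_study(
    study_name="mira_tuning/test",            # mirrors save_name above
    storage="sqlite:///mira_tuning_test.db",  # illustrative sqlite path
    direction="minimize",
    load_if_exists=True,
)
```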

Any advice on reducing the gradient overflows in the gradient-based workflow or advice on the above assertion error would be greatly appreciated!

christophechu commented 11 months ago

You might want to try the solution in https://github.com/cistrome/MIRA/issues/30; I solved the same problem by downgrading the version.
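To confirm what you have installed before and after downgrading, something like the snippet below works. The package names assume a pip install of `mira-multiome`, `optuna`, and `SQLAlchemy`; the compatible pins themselves are discussed in issue #30, not here.

```python
# Purely diagnostic: print the versions of the packages involved in this error.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("mira-multiome", "optuna", "SQLAlchemy"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```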

AllenWLynch commented 11 months ago

Yes, this is an optuna version problem. I am currently testing an update with tweaked dependency version requirements that should resolve this issue.