cistrome / MIRA

Python package for analysis of multiomic single-cell RNA-seq and ATAC-seq data.

Question about the ATAC topic model #29

Closed: MoLuLuMo closed this issue 10 months ago

MoLuLuMo commented 10 months ago
import numpy as np
import mira

# Mark ~100,000 randomly chosen peaks as "endogenous" (each peak is kept
# with probability min(1e5 / n_peaks, 1))
np.random.seed(0)
atac_data.var['endogenous_peaks'] = np.random.rand(atac_data.shape[1]) <= min(1e5/atac_data.shape[1], 1)

atac_model = mira.topics.make_model(
    atac_data.n_obs, atac_data.n_vars,
    feature_type = 'accessibility',
    endogenous_key = 'endogenous_peaks' # which peaks are used by the encoder network
)

atac_model.get_learning_rate_bounds(atac_data) # pass cache file

AssertionError                            Traceback (most recent call last)
Cell In[12], line 1
----> 1 atac_model.get_learning_rate_bounds(atac_data) # pass cache file

File ~/.conda/envs/mira/lib/python3.9/site-packages/mira/adata_interface/core.py:179, in wraps_modelfunc.<locals>.run.<locals>._run(self, adata, *args, **kwargs)
    174 if not any(
    175     [kwarg in subfunction_kwargs.keys() for subfunction_kwargs in [getter_kwargs, adder_kwargs, function_kwargs]]
    176 ):
    177     raise TypeError('{} is not a valid keyword arg for this function.'.format(kwarg))
--> 179 output = func(self, **fetch(self, adata, **getter_kwargs), **function_kwargs)
    181 return add(adata, output, **adder_kwargs)

File ~/.conda/envs/mira/lib/python3.9/site-packages/mira/topic_model/base.py:894, in BaseModel.get_learning_rate_bounds(self, num_steps, eval_every, num_epochs, lower_bound_lr, upper_bound_lr, features, highly_variable, dataset)
    849 @adi.wraps_modelfunc(fetch = tmi.fit,
    850     fill_kwargs=['features','highly_variable','dataset'], requires_adata = False)
    851 def get_learning_rate_bounds(self, num_steps = 100, eval_every = 3, num_epochs = 3,
    852     lower_bound_lr = 1e-6, upper_bound_lr = 5,*,
    853     features, highly_variable, dataset):
    854     '''
    855     Use the learning rate range test (LRRT) to determine minimum and maximum learning
    856     rates that enable the model to traverse the gradient of the loss.
    (...)
    892
    893     '''
--> 894 self._instantiate_model(
    895     features = features,
    896     highly_variable = highly_variable,
    897     dataset = dataset
    898 )
    900 data_loader = dataset.get_dataloader(self,
    901     training=True, batch_size=self.batch_size)
    903 #n_batches = len(data_loader)

File ~/.conda/envs/mira/lib/python3.9/site-packages/mira/topic_model/base.py:828, in BaseModel._instantiate_model(self, training_bar, features, highly_variable, dataset)
    825 self.num_extra_features = batch['extra_features'].shape[-1]
    826 self.covariate_compensation = self.num_covariates > 0
--> 828 self._get_weights(
    829     on_gpu=True, inference_mode=False,
    830     num_covariates = self.num_covariates,
    831     num_exog_features = self.num_exog_features,
    832     num_endog_features = self.num_endog_features,
    833     num_extra_features = self.num_extra_features,
    834 )

File ~/.conda/envs/mira/lib/python3.9/site-packages/mira/topic_model/base.py:706, in BaseModel._get_weights(self, on_gpu, inference_mode, num_exog_features, num_endog_features, num_covariates, num_extra_features)
    692 decoder_kwargs.update({
    693     'covariates_hidden' : self.covariates_hidden,
    694     'covariates_dropout' : self.covariates_dropout,
    695     'mask_dropout' : self.mask_dropout,
    696 })
    698 self.decoder = self._decoder_model(
    699     num_exog_features = num_exog_features,
    700     num_topics = self.num_topics,
    (...)
    703     **decoder_kwargs,
    704 )
--> 706 self.encoder = self.encoder_model(
    707     embedding_size = self.embedding_size,
    708     num_endog_features = num_endog_features,
    709     num_exog_features = num_exog_features,
    710     num_topics = self.num_topics,
    711     num_covariates = num_covariates,
    712     num_extra_features = num_extra_features,
    713     embedding_dropout = self.embedding_dropout,
    714     hidden = self.hidden,
    715     dropout = self.encoder_dropout,
    716     num_layers = self.num_layers
    717 )
    719 self.K = torch.tensor(self.num_topics, requires_grad = False)
    720 self.to(self.device)

File ~/.conda/envs/mira/lib/python3.9/site-packages/mira/topic_model/modality_mixins/accessibility_encoders.py:81, in DANSkipEncoder.__init__(self, embedding_size, num_endog_features, num_topics, embedding_dropout, hidden, dropout, num_layers, num_exog_features, num_covariates, num_extra_features)
     77 def __init__(self, embedding_size = None,*, num_endog_features, num_topics, embedding_dropout,
     78     hidden, dropout, num_layers, num_exog_features, num_covariates, num_extra_features):
     79     super().__init__()
---> 81 assert num_layers > 2, 'Cannot use SkipEncoder with less than three layers.'
     83     if embedding_size is None:
     84         embedding_size = hidden

AssertionError: Cannot use SkipEncoder with less than three layers.

AllenWLynch commented 10 months ago

Hi, thank you for reporting this issue.

MIRA's ATAC model has multiple options for the input encoder, the most performant of which is the "SkipEncoder". The SkipEncoder must have at least three layers in total (so that there is a hidden layer). However, MIRA also chooses hyperparameters based on the size of your input dataset, and it looks like that logic is overwriting the num_layers = 3 default here.

Is your input dataset small? Like, less than 2400 samples?
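
In the meantime, one possible workaround is to pin the encoder depth yourself. This is an untested sketch, and it assumes make_model forwards num_layers through to the underlying topic model:

atac_model = mira.topics.make_model(
    atac_data.n_obs, atac_data.n_vars,
    feature_type = 'accessibility',
    endogenous_key = 'endogenous_peaks',
    num_layers = 3, # assumed pass-through; keeps the SkipEncoder at its three-layer minimum
)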

MoLuLuMo commented 10 months ago

I only have 1000 cells. Would it still be possible to use the MIRA topic model?

AllenWLynch commented 10 months ago

Yes, I've tested performance on that few cells before.

I would recommend changing this parameter in the make_model function:

atac_encoder = 'DAN',
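
For example, keeping everything else from your snippet the same, the call would look like this (a sketch, not tested on your data):

atac_model = mira.topics.make_model(
    atac_data.n_obs, atac_data.n_vars,
    feature_type = 'accessibility',
    endogenous_key = 'endogenous_peaks',
    atac_encoder = 'DAN', # plain DAN encoder: no skip connections, so no three-layer minimum
)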

Let me know if that works for you.

MoLuLuMo commented 10 months ago

It works. Thank you!