cistrome / MIRA

Python package for analysis of multiomic single cell RNA-seq and ATAC-seq.

Error running get_learning_rate_bounds() for an Accessibility Topic Model #21

Closed: jmmuncie closed this issue 1 year ago

jmmuncie commented 1 year ago

Hi,

Thanks for creating and maintaining such an interesting tool for analyzing multiome data. I am having some trouble running get_learning_rate_bounds() on my accessibility topic model.

First, I instantiated the model with default parameters by running:

atac_model = mira.topics.AccessibilityTopicModel(
    seed = 0
)

Next, I ran get_learning_rate_bounds() with default parameters by running:

atac_model.get_learning_rate_bounds(data_Peaks)

And the following error was returned:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/scratch/jmuncie/ipykernel_24324/1471803083.py in <module>
      2 #get_learning_rate_bounds runs an array of learning rates to find the values for which the model is most responsive
      3 #Running with default parameters
----> 4 atac_model.get_learning_rate_bounds(data_Peaks)

~/.conda/envs/mira-env/lib/python3.7/site-packages/mira/adata_interface/core.py in _run(self, adata, *args, **kwargs)
    175                     raise TypeError('{} is not a valid keyword arg for this function.'.format(kwarg))
    176 
--> 177             output = func(self, **fetch(self, adata, **getter_kwargs), **function_kwargs)
    178 
    179             return add(adata, output, **adder_kwargs)

~/.conda/envs/mira-env/lib/python3.7/site-packages/mira/topic_model/base.py in get_learning_rate_bounds(self, num_epochs, eval_every, lower_bound_lr, upper_bound_lr, features, highly_variable, dataset)
    648 
    649                 self.train()
--> 650                 for batch in self.transform_batch(data_loader, bar = False):
    651 
    652                     step_loss += self._step(batch, 1.)['loss']

~/.conda/envs/mira-env/lib/python3.7/site-packages/mira/topic_model/base.py in transform_batch(self, data_loader, bar, desc)
    486     def transform_batch(self, data_loader, bar = True, desc = ''):
    487 
--> 488         for batch in tqdm(data_loader, desc = desc) if bar else data_loader:
    489             yield {k : torch.tensor(v, requires_grad = False).to(self.device)
    490                 for k, v in batch.items()}

~/.conda/envs/mira-env/lib/python3.7/site-packages/torch/utils/data/dataloader.py in __next__(self)
    679                 # TODO(https://github.com/pytorch/pytorch/issues/76750)
    680                 self._reset()  # type: ignore[call-arg]
--> 681             data = self._next_data()
    682             self._num_yielded += 1
    683             if self._dataset_kind == _DatasetKind.Iterable and \

~/.conda/envs/mira-env/lib/python3.7/site-packages/torch/utils/data/dataloader.py in _next_data(self)
    719     def _next_data(self):
    720         index = self._next_index()  # may raise StopIteration
--> 721         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    722         if self._pin_memory:
    723             data = _utils.pin_memory.pin_memory(data, self._pin_memory_device)

~/.conda/envs/mira-env/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     50         else:
     51             data = self.dataset[possibly_batched_index]
---> 52         return self.collate_fn(data)

~/.conda/envs/mira-env/lib/python3.7/site-packages/mira/adata_interface/topic_model.py in collate_batch(batch, preprocess_endog, preprocess_exog, preprocess_read_depth)
     16 
     17     return {
---> 18         'endog_features' : preprocess_endog(endog),
     19         'exog_features' : preprocess_exog(exog),
     20         'read_depth' : preprocess_read_depth(exog)

~/.conda/envs/mira-env/lib/python3.7/site-packages/mira/topic_model/accessibility_model.py in preprocess_endog(X)
    168 
    169             return self._get_padded_idx_matrix(
--> 170                     self._binarize_matrix(X, self.num_endog_features)).astype(np.int32)
    171 
    172         return preprocess_endog

~/.conda/envs/mira-env/lib/python3.7/site-packages/mira/topic_model/accessibility_model.py in _get_padded_idx_matrix(self, accessibility_matrix)
    152         dense_matrix = []
    153         for i in range(accessibility_matrix.shape[0]):
--> 154             row = accessibility_matrix[i,:].indices + 1
    155             if len(row) == width:
    156                 dense_matrix.append(np.array(row)[np.newaxis, :])

TypeError: 'coo_matrix' object is not subscriptable
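The failing line in `_get_padded_idx_matrix` slices one row at a time and reads `.indices`. That only works on a row-compressed (CSR) matrix. The sketch below imitates that step using only SciPy and NumPy, to show why the storage format matters. It assumes, from what the traceback shows, that each row's nonzero column indices are shifted by 1 and padded to a common width. The function name `padded_idx_matrix` and the zero padding are illustrative assumptions, not MIRA's actual implementation.

```python
import numpy as np
import scipy.sparse as sp

def padded_idx_matrix(X):
    """Hypothetical re-creation of the step that fails in the traceback:
    one row per cell holding the 1-based column indices of its nonzero
    peaks, right-padded with 0 so every row has the same width."""
    X = sp.csr_matrix(X)  # row slicing plus .indices needs CSR layout
    n_rows = X.shape[0]
    # widest row = largest number of stored entries in any row
    width = int(np.diff(X.indptr).max()) if n_rows else 0
    out = np.zeros((n_rows, width), dtype=np.int32)
    for i in range(n_rows):
        row = np.sort(X[i, :].indices) + 1  # shift so 0 can mean "padding"
        out[i, :len(row)] = row
    return out

# A small CSC input like the one that triggered the error
X_csc = sp.csc_matrix(np.array([[0, 3, 0, 1],
                                [2, 0, 0, 0]]))
print(padded_idx_matrix(X_csc))  # row 0 -> [2 4], row 1 -> [1 0]
```

Converting to CSR first, as the sketch does, is the same remedy as the fix in the reply below.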

I suspected a problem with how my data are formatted, but as far as I can tell my data have the same structure as the tutorial/example dataset:

[Screenshot (2023-04-20) showing the structure of the data]

Any help or suggestions for troubleshooting this would be greatly appreciated! Thank you!

jmmuncie commented 1 year ago

I was able to solve it! The matrix in my AnnData object (`data.X`) was stored in compressed sparse column (CSC) format instead of compressed sparse row (CSR) format. Converting it to CSR fixes the error:

# convert the CSC matrix in .X to CSR and store it back on the AnnData object
data.X = data.X.tocsr()

Or, step by step:

# extract your CSC-formatted sparse matrix
sparse = data.X

# convert CSC to CSR format
sparse_csr = sparse.tocsr()

# put the CSR-formatted matrix back into the AnnData object
data.X = sparse_csr
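If your data come from several sources, a small guard before training can catch this early. The helper below is a hypothetical convenience function, not part of MIRA. It uses only SciPy's `issparse`, the `.format` attribute of SciPy sparse matrices, and `tocsr()`. It converts any non-CSR sparse input and returns non-sparse input unchanged. Whether MIRA accepts dense input is not covered here.

```python
import numpy as np
import scipy.sparse as sp

def ensure_csr(X):
    """Return X as CSR if it is sparse in any other format (CSC, COO, ...).
    Already-CSR matrices and non-sparse inputs are returned as-is."""
    if sp.issparse(X) and X.format != "csr":
        return X.tocsr()
    return X

# Usage with an AnnData object named `data` (as in the thread):
# data.X = ensure_csr(data.X)
# print(data.X.format)  # should now report 'csr'
```

Checking `.format` instead of converting unconditionally makes the intent explicit. It also skips work when the matrix is already CSR.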