czbiohub-sf / tabula-muris

Code and annotations for the Tabula Muris single-cell transcriptomic dataset.
https://www.nature.com/articles/s41586-018-0590-4
BSD 3-Clause "New" or "Revised" License
187 stars 91 forks source link

Error: division by zero #247

Closed pocession closed 2 years ago

pocession commented 2 years ago

Dear Tabula muris team:

Thank you for creating this beautiful data set. I encounter some problems when using the Large intestine droplet data set.

My code:

# Packages
import scanpy as sc
import anndata
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path
%matplotlib inline

# Dir
dir = os.getcwd()
subset = 'subset'
droplet = 'tabula-muris-senis-droplet-processed-official-annotations.h5ad'
facs = 'tabula-muris-senis-facs-processed-official-annotations.h5ad'
fn_droplet = Path(dir, droplet)
fn_facs = Path(dir, facs)
subset_dir = Path(dir,subset)

# Set variables
method = 'droplet'
# method = 'facs'

# Read data and subsetting it
adata = sc.read_h5ad(fn_droplet)

# log transform the raw data
# and set the 'raw' attribute to the natural logarithmized data
sc.pp.log1p(adata)
adata.raw = adata

# Subset data from large intestine
Tissue = 'Large_Intestine'
adata_subset = adata[adata.obs['tissue'].isin([Tissue])]

sc.tl.rank_genes_groups(adata_subset, groupby='age', use_raw=True, 
                        method='t-test_overestim_var', n_genes=10) # compute differential expression
sc.pl.rank_genes_groups_tracksplot(adata_subset, groupby='age') # plot the result

I keep having this error:

---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
/var/folders/1h/sldmgln943gbjzr5k6586xw00000gp/T/ipykernel_3810/3639964835.py in <module>
      1 # Perform the differential analysis
      2 
----> 3 sc.tl.rank_genes_groups(adata_subset, groupby='age', use_raw=True, 
      4                         method='t-test_overestim_var', n_genes=10) # compute differential expression
      5 sc.pl.rank_genes_groups_tracksplot(adata_subset, groupby='age') # plot the result

~/Library/Python/3.10/lib/python/site-packages/scanpy/tools/_rank_genes_groups.py in rank_genes_groups(adata, groupby, use_raw, groups, reference, n_genes, rankby_abs, pts, key_added, copy, method, corr_method, tie_correct, layer, **kwds)
    606     logg.debug(f'with sizes: {np.count_nonzero(test_obj.groups_masks, axis=1)}')
    607 
--> 608     test_obj.compute_statistics(
    609         method, corr_method, n_genes_user, rankby_abs, tie_correct, **kwds
    610     )

~/Library/Python/3.10/lib/python/site-packages/scanpy/tools/_rank_genes_groups.py in compute_statistics(self, method, corr_method, n_genes_user, rankby_abs, tie_correct, **kwds)
    376         n_genes = self.X.shape[1]
    377 
--> 378         for group_index, scores, pvals in generate_test_results:
    379             group_name = str(self.groups_order[group_index])
    380 

~/Library/Python/3.10/lib/python/site-packages/scanpy/tools/_rank_genes_groups.py in t_test(self, method)
    198         from scipy import stats
    199 
--> 200         self._basic_stats()
    201 
    202         for group_index, mask in enumerate(self.groups_masks):

~/Library/Python/3.10/lib/python/site-packages/scanpy/tools/_rank_genes_groups.py in _basic_stats(self)
    188                 mask_rest = ~mask
    189                 X_rest = self.X[mask_rest]
--> 190                 self.means_rest[imask], self.vars_rest[imask] = _get_mean_var(X_rest)
    191                 # this can be costly for sparse data
    192                 if self.comp_pts:

~/Library/Python/3.10/lib/python/site-packages/scanpy/preprocessing/_utils.py in _get_mean_var(X, axis)
      6 def _get_mean_var(X, *, axis=0):
      7     if sparse.issparse(X):
----> 8         mean, var = sparse_mean_variance_axis(X, axis=axis)
      9     else:
     10         mean = np.mean(X, axis=axis, dtype=np.float64)

~/Library/Python/3.10/lib/python/site-packages/scanpy/preprocessing/_utils.py in sparse_mean_variance_axis(mtx, axis)
     40         )
     41     else:
---> 42         return sparse_mean_var_minor_axis(mtx.data, mtx.indices, *shape, np.float64)
     43 
     44 

ZeroDivisionError: division by zero

I think the problem comes from the Large_Intestine data. If I use data comes from more than two tissues, there is no error messages. Looks like the Large_Intestine data contains some missing values. Could you help me to filter out this kind of data in scanpy?

Thank you for your reply in advance.

Tsunghan Hsieh, post doc in Riken tsung-han.hsieh@riken.jp

pocession commented 2 years ago

I found the problem. The large intestine droplet data set only contains one age group: 30m. That’s why the differential analysis could not be performed.