Thank you for creating this beautiful data set. I encounter some problems when using the Large intestine droplet data set.
My code:
# Packages
import scanpy as sc
import anndata
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path
%matplotlib inline
# Dir
dir = os.getcwd()
subset = 'subset'
droplet = 'tabula-muris-senis-droplet-processed-official-annotations.h5ad'
facs = 'tabula-muris-senis-facs-processed-official-annotations.h5ad'
fn_droplet = Path(dir, droplet)
fn_facs = Path(dir, facs)
subset_dir = Path(dir,subset)
# Set variables
method = 'droplet'
# method = 'facs'
# Read data and subsetting it
adata = sc.read_h5ad(fn_droplet)
# log transform the raw data
# and set the 'raw' attribute to the natural logarithmized data
sc.pp.log1p(adata)
adata.raw = adata
# Subset data from large intestine
Tissue = 'Large_Intestine'
adata_subset = adata[adata.obs['tissue'].isin([Tissue])]
sc.tl.rank_genes_groups(adata_subset, groupby='age', use_raw=True,
method='t-test_overestim_var', n_genes=10) # compute differential expression
sc.pl.rank_genes_groups_tracksplot(adata_subset, groupby='age') # plot the result
I keep having this error:
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
/var/folders/1h/sldmgln943gbjzr5k6586xw00000gp/T/ipykernel_3810/3639964835.py in <module>
1 # Perform the differential analysis
2
----> 3 sc.tl.rank_genes_groups(adata_subset, groupby='age', use_raw=True,
4 method='t-test_overestim_var', n_genes=10) # compute differential expression
5 sc.pl.rank_genes_groups_tracksplot(adata_subset, groupby='age') # plot the result
~/Library/Python/3.10/lib/python/site-packages/scanpy/tools/_rank_genes_groups.py in rank_genes_groups(adata, groupby, use_raw, groups, reference, n_genes, rankby_abs, pts, key_added, copy, method, corr_method, tie_correct, layer, **kwds)
606 logg.debug(f'with sizes: {np.count_nonzero(test_obj.groups_masks, axis=1)}')
607
--> 608 test_obj.compute_statistics(
609 method, corr_method, n_genes_user, rankby_abs, tie_correct, **kwds
610 )
~/Library/Python/3.10/lib/python/site-packages/scanpy/tools/_rank_genes_groups.py in compute_statistics(self, method, corr_method, n_genes_user, rankby_abs, tie_correct, **kwds)
376 n_genes = self.X.shape[1]
377
--> 378 for group_index, scores, pvals in generate_test_results:
379 group_name = str(self.groups_order[group_index])
380
~/Library/Python/3.10/lib/python/site-packages/scanpy/tools/_rank_genes_groups.py in t_test(self, method)
198 from scipy import stats
199
--> 200 self._basic_stats()
201
202 for group_index, mask in enumerate(self.groups_masks):
~/Library/Python/3.10/lib/python/site-packages/scanpy/tools/_rank_genes_groups.py in _basic_stats(self)
188 mask_rest = ~mask
189 X_rest = self.X[mask_rest]
--> 190 self.means_rest[imask], self.vars_rest[imask] = _get_mean_var(X_rest)
191 # this can be costly for sparse data
192 if self.comp_pts:
~/Library/Python/3.10/lib/python/site-packages/scanpy/preprocessing/_utils.py in _get_mean_var(X, axis)
6 def _get_mean_var(X, *, axis=0):
7 if sparse.issparse(X):
----> 8 mean, var = sparse_mean_variance_axis(X, axis=axis)
9 else:
10 mean = np.mean(X, axis=axis, dtype=np.float64)
~/Library/Python/3.10/lib/python/site-packages/scanpy/preprocessing/_utils.py in sparse_mean_variance_axis(mtx, axis)
40 )
41 else:
---> 42 return sparse_mean_var_minor_axis(mtx.data, mtx.indices, *shape, np.float64)
43
44
ZeroDivisionError: division by zero
I think the problem comes from the Large_Intestine data. If I use data comes from more than two tissues, there is no error messages. Looks like the Large_Intestine data contains some missing values. Could you help me to filter out this kind of data in scanpy?
Thank you for your reply in advance.
Tsunghan Hsieh, post doc in Riken
tsung-han.hsieh@riken.jp
I found the problem. The large intestine droplet data set only contains one age group: 30m. That’s why the differential analysis could not be performed.
Dear Tabula muris team:
Thank you for creating this beautiful data set. I encounter some problems when using the Large intestine droplet data set.
My code:
I keep having this error:
I think the problem comes from the Large_Intestine data. If I use data comes from more than two tissues, there is no error messages. Looks like the Large_Intestine data contains some missing values. Could you help me to filter out this kind of data in scanpy?
Thank you for your reply in advance.
Tsunghan Hsieh, post doc in Riken tsung-han.hsieh@riken.jp