STOmics / Stereopy

A toolkit of spatial transcriptomic analysis.
MIT License

Is there a way to subset a StereoExpData by cell (or bin) ID? #155

Closed bmill3r closed 1 year ago

bmill3r commented 1 year ago

Hello,

I would like to subset the DNBs that are outside of the tissue cut area.

For example:

import stereo as st

## all the DNBs
bin1 = st.io.read_gef(bin_path, bin_size=1)

## just the DNBs within the tissue region
tissue_bin1 = st.io.read_gef(tissuebin_path, bin_size=1)

## indices of DNBs for each object
idx1 = bin1.cells.to_df().index
idx2 = tissue_bin1.cells.to_df().index

## indices of DNBs not in tissue region
dnbs_outside_tissue = idx1[~idx1.isin(idx2)]

I tried to use bin1.tl.filter_cells() but this does not seem to work:

out_tissue_bin1 = bin1.tl.filter_cells(bin1,
                                             cell_list = dnbs_outside_tissue.values.tolist(),
                                             inplace = False)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[58], line 1
----> 1 out_tissue_bin1 = bin1.tl.filter_cells(bin1,
      2                                              cell_list = dnbs_outside_tissue.values.tolist(),
      3                                              inplace = False)

File ~/mambaforge/envs/py3.8/lib/python3.8/site-packages/stereo/core/st_pipeline.py:39, in logit.<locals>.wrapped(*args, **kwargs)
     37 logger.info('start to run {}...'.format(func.__name__))
     38 tk = tc.start()
---> 39 res = func(*args, **kwargs)
     40 logger.info('{} end, consume time {:.4f}s.'.format(func.__name__, tc.get_time_consumed(key=tk, restart=False)))
     41 return res

File ~/mambaforge/envs/py3.8/lib/python3.8/site-packages/stereo/core/st_pipeline.py:190, in StPipeline.filter_cells(self, min_gene, max_gene, min_n_genes_by_counts, max_n_genes_by_counts, pct_counts_mt, cell_list, inplace)
    164 """
    165 Filter cells based on counts or the numbers of genes expressed.
    166 
   (...)
    187 Depending on `inplace`, if `True`, the data will be replaced by those filtered.
    188 """
    189 from ..preprocess.filter import filter_cells
--> 190 data = filter_cells(self.data, min_gene, max_gene, min_n_genes_by_counts, max_n_genes_by_counts, pct_counts_mt,
    191                     cell_list, inplace)
    192 return data

File ~/mambaforge/envs/py3.8/lib/python3.8/site-packages/stereo/preprocess/filter.py:60, in filter_cells(data, min_gene, max_gene, min_n_genes_by_counts, max_n_genes_by_counts, pct_counts_mt, cell_list, inplace)
     58 cal_cells_indicators(data)
     59 if min_gene:
---> 60     cell_subset = data.cells.total_counts >= min_gene
     61     data.sub_by_index(cell_index=cell_subset)
     62 if max_gene:

TypeError: '>=' not supported between instances of 'int' and 'StereoExpData'

I have also tried to subset similar to an Anndata object, for example:

bin1[dnbs_outside_tissue.values.tolist(),:]

but this also does not work.

Any other suggestions?

Thanks, Brendan

tanliwei-coder commented 1 year ago

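The variable bin1 doesn't need to be passed to filter_cells. It is called as a method of bin1.tl, which already holds the data, so the extra positional bin1 is taken as min_gene, and that causes the TypeError in the traceback. Call it as bin1.tl.filter_cells(cell_list=dnbs_outside_tissue.values.tolist(), inplace=False) instead.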
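
To illustrate why the extra argument fails, here is a minimal, self-contained sketch that reproduces the same kind of TypeError without Stereopy. The `Data` and `Pipeline` classes are hypothetical stand-ins for `StereoExpData` and `StPipeline`, not the library's actual code. Treating `cell_list` as the cells to keep, and returning a new object when `inplace=False`, are assumptions made for this sketch only.

```python
class Data:
    """Hypothetical stand-in for StereoExpData: cell names plus total counts."""

    def __init__(self, cell_names, total_counts):
        self.cell_names = list(cell_names)
        self.total_counts = list(total_counts)
        # Like bin1.tl, the pipeline object keeps a reference to its own data.
        self.tl = Pipeline(self)


class Pipeline:
    """Hypothetical stand-in for StPipeline."""

    def __init__(self, data):
        self.data = data

    def filter_cells(self, min_gene=None, cell_list=None, inplace=True):
        names = self.data.cell_names
        counts = self.data.total_counts
        keep = [True] * len(names)
        if min_gene:
            # If a Data object was passed positionally, it lands here as
            # min_gene and the comparison below raises TypeError.
            keep = [k and c >= min_gene for k, c in zip(keep, counts)]
        if cell_list is not None:
            # Assumption for this sketch: cell_list names the cells to keep.
            wanted = set(cell_list)
            keep = [k and n in wanted for k, n in zip(keep, names)]
        kept_names = [n for n, k in zip(names, keep) if k]
        kept_counts = [c for c, k in zip(counts, keep) if k]
        if inplace:
            self.data.cell_names = kept_names
            self.data.total_counts = kept_counts
            return self.data
        return Data(kept_names, kept_counts)


bin1 = Data(["a", "b", "c"], [5, 0, 2])

# Wrong: bin1 is bound to min_gene, so an int is compared with a Data object.
try:
    bin1.tl.filter_cells(bin1, cell_list=["b"], inplace=False)
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# Right: the pipeline already knows its data, so pass only keyword arguments.
subset = bin1.tl.filter_cells(cell_list=["b"], inplace=False)
print(subset.cell_names)
```

The first call fails for the same reason as in the traceback: the first positional argument after `self` is `min_gene`. The second call passes only keyword arguments and leaves `bin1` unchanged because `inplace=False`.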