STOmics / Stereopy

A toolkit of spatial transcriptomic analysis.
MIT License

How to obtain raw bin1 x gene count matrix from GEM file? #135

Closed: bmill3r closed this issue 1 year ago

bmill3r commented 1 year ago

I would like to obtain the "cell x gene" raw expression count matrix, where the cells in this case are the individual nanoballs (bin1).

For reference, I am working with the Mouse_brain_Adult_GEM_bin1.tsv.gz file located here: https://db.cngb.org/stomics/datasets/STDS0000058
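Conceptually, a bin1 x gene matrix has one row per distinct (x, y) coordinate and one column per gene. The sketch below builds that matrix in coordinate (COO) form directly from GEM lines using only the Python standard library. It assumes the common GEM layout of tab-separated columns geneID, x, y and MIDCount, with optional '#'-prefixed metadata lines. Column names vary between GEM versions (some files use MIDCounts, for example), so check your file's header. The helpers gem_to_coo and read_gem_file are hypothetical and not part of Stereopy.

```python
import csv
import gzip
from typing import Dict, Iterable, List, Tuple


def gem_to_coo(
    lines: Iterable[str], count_col: str = "MIDCount"
) -> Tuple[List[Tuple[int, int]], List[str], Dict[Tuple[int, int], int]]:
    """Hypothetical helper: aggregate GEM rows into bin1 x gene COO counts.

    Returns (cell_coords, gene_names, counts), where counts maps
    (cell_index, gene_index) -> summed count.
    """
    # Skip metadata lines that start with '#'; the first remaining line is the header.
    rows = csv.DictReader(
        (ln for ln in lines if not ln.startswith("#")), delimiter="\t"
    )
    cell_index: Dict[Tuple[int, int], int] = {}
    gene_index: Dict[str, int] = {}
    counts: Dict[Tuple[int, int], int] = {}
    for row in rows:
        # Each distinct (x, y) coordinate is one bin1 "cell".
        xy = (int(row["x"]), int(row["y"]))
        ci = cell_index.setdefault(xy, len(cell_index))
        gi = gene_index.setdefault(row["geneID"], len(gene_index))
        # Sum duplicates in case a (cell, gene) pair appears more than once.
        counts[(ci, gi)] = counts.get((ci, gi), 0) + int(row[count_col])
    # Dict keys keep insertion order, so list position == assigned index.
    return list(cell_index), list(gene_index), counts


def read_gem_file(path: str, count_col: str = "MIDCount"):
    # Stream the gzipped file as text instead of loading it into one string.
    with gzip.open(path, "rt") as fh:
        return gem_to_coo(fh, count_col)
```

This is for illustration only: with tens of millions of bins, plain Python dicts become very memory-hungry, so for the full file st.io.read_gem is the practical route.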

So far I have been able to read in the file using:

import stereo as st
data_path = 'Mouse_brain_Adult_GEM_bin1.tsv.gz'
data = st.io.read_gem(
        file_path=data_path,
        sep='\t',
        bin_type='bins',
        bin_size=1,
        is_sparse=True,
        )
data

StereoExpData object with n_cells X n_genes = 42147676 X 26177
bin_type: bins
bin_size: 1
offset_x = 3225
offset_y = 6175
cells: ['cell_name']
genes: ['gene_name']
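The reported shape gives a sense of why the bin1 matrix is kept sparse (is_sparse=True). A quick back-of-the-envelope calculation shows how large a dense copy would be, assuming 4-byte float32 or 8-byte float64 values (the dtype Stereopy actually uses may differ):

```python
n_cells, n_genes = 42_147_676, 26_177  # shape reported by the StereoExpData repr

n_entries = n_cells * n_genes
for name, itemsize in (("float32", 4), ("float64", 8)):
    tib = n_entries * itemsize / 2**40  # bytes -> TiB
    print(f"dense {name}: {n_entries:,} entries, about {tib:.1f} TiB")
```

This prints 1,103,299,714,652 entries, about 4.0 TiB for float32 and about 8.0 TiB for float64. A sparse matrix stores only the nonzero counts, which at bin1 resolution are a tiny fraction of that.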

From here, are there any commands or workflows to extract the raw count matrix? I tried:

data.tl.raw and data.raw, but both are None.

Any help would be greatly appreciated. Thanks! Brendan

UglyRay7 commented 1 year ago

I see you loaded the data from the *.tsv.gz file using io.read_gem. You can now obtain the expression matrix as follows:

data.exp_matrix            # get the matrix object
data.exp_matrix.toarray()  # get the matrix in array format
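If the goal is a raw bin1 x gene table on disk rather than an in-memory dense array, one option is to walk the sparse matrix and write only its nonzero (cell, gene, count) entries. The sketch below assumes data.exp_matrix is a SciPy CSR matrix (convert with .tocsr() otherwise) and that the row and column labels are available as data.cell_names and data.gene_names; treat those attribute names as assumptions to verify against your Stereopy version. The helpers csr_to_triplets and write_triplets_tsv are hypothetical. They use only the CSR arrays indptr, indices and data, so the code itself does not need SciPy.

```python
import csv
from typing import Iterator, Sequence, TextIO, Tuple


def csr_to_triplets(
    indptr: Sequence[int],
    indices: Sequence[int],
    data: Sequence[float],
    cell_names: Sequence[str],
    gene_names: Sequence[str],
) -> Iterator[Tuple[str, str, float]]:
    """Hypothetical helper: yield (cell, gene, count) for each stored nonzero."""
    for row in range(len(indptr) - 1):
        # In CSR, row `row` owns the slice indptr[row]:indptr[row + 1].
        for k in range(indptr[row], indptr[row + 1]):
            if data[k] != 0:  # skip explicitly stored zeros
                yield cell_names[row], gene_names[indices[k]], data[k]


def write_triplets_tsv(triplets, out: TextIO) -> None:
    # Long ("tidy") format: one line per nonzero count, never densified.
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["cell", "gene", "count"])
    writer.writerows(triplets)
```

With Stereopy, usage might look like m = data.exp_matrix.tocsr() followed by write_triplets_tsv(csr_to_triplets(m.indptr, m.indices, m.data, data.cell_names, data.gene_names), fh) for an open text file handle fh. Streaming the triplets keeps memory proportional to one row at a time rather than to the full dense matrix.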

data.tl.raw is only populated after you call data.tl.raw_checkpoint() to save a copy of the raw matrix, which is usually done before filtering cells or genes so that the unfiltered counts are preserved.
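Why the raw slot stays None until a checkpoint is taken can be illustrated with a toy object. The class below is hypothetical and only mimics the pattern described above (snapshot the matrix, then filter the working copy). It is not Stereopy's implementation, and Stereopy's raw_checkpoint may store more than just the matrix.

```python
import copy
from typing import List, Optional


class ToyExpData:
    """Hypothetical stand-in for an expression object with a raw slot."""

    def __init__(self, exp_matrix: List[List[int]], gene_names: List[str]):
        self.exp_matrix = exp_matrix
        self.gene_names = gene_names
        self.raw: Optional["ToyExpData"] = None  # stays None until a checkpoint

    def raw_checkpoint(self) -> None:
        # Snapshot deep copies so later edits to the working data cannot touch them.
        self.raw = ToyExpData(copy.deepcopy(self.exp_matrix), list(self.gene_names))

    def filter_genes(self, min_total: int) -> None:
        # Keep only genes whose total count across all cells reaches min_total.
        keep = [
            j
            for j in range(len(self.gene_names))
            if sum(row[j] for row in self.exp_matrix) >= min_total
        ]
        self.exp_matrix = [[row[j] for j in keep] for row in self.exp_matrix]
        self.gene_names = [self.gene_names[j] for j in keep]
```

Calling raw_checkpoint() before filter_genes() leaves raw holding the unfiltered counts, while exp_matrix reflects the filtering; without the checkpoint, raw is simply never assigned.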

Hope this helps! Ray

bmill3r commented 1 year ago

Hi @UglyRay7,

Thanks so much for the quick reply. Yes, the raw count matrix is stored in data.exp_matrix and I can access it that way. I tried data.tl.raw_checkpoint, but data.tl.raw was still empty afterwards. Nonetheless, I can still get to the raw count data. Thanks!

Brendan