STOmics / Stereopy

A toolkit of spatial transcriptomic analysis.
MIT License

Discrepancy between adata.X and data.exp_matrix when reading loom files using scanpy #206

Closed WeipengMO closed 4 months ago

WeipengMO commented 8 months ago

I ran into an issue after converting a GEF file to a loom file with the generate_loom function. When I read the loom file into an AnnData object with scanpy, adata.X does not match data.exp_matrix. However, the 'matrix' entry in adata.layers is consistent with the original data.

WeipengMO commented 8 months ago
import scanpy as sc
import stereo as st
from stereo.tools import generate_loom

bgef_file = '/data/user/mowp/data/stereo-seq/Demo_MouseBrain/SS200000135TL_D1.tissue.gef'
gtf_file = '/data/user/mowp/data/stereo-seq/Demo_RNAVelocity/genes.gtf'
out_dir = '/home/mowp/test/SS200000135TL_D1_bgef/'

# generate the loom file from the GEF file
loom_data = generate_loom(
    gef_path=bgef_file,
    gtf_path=gtf_file,
    bin_type='bins',
    bin_size=100,
    out_dir=out_dir,
)

# read the loom file with scanpy and the original GEF file with Stereopy
adata_loom = sc.read_loom(loom_data)
data = st.io.read_gef(file_path=bgef_file, bin_size=100)

Comparing the matrix:

(data.exp_matrix == adata_loom.X).A.all()

False

(data.exp_matrix == adata_loom.layers['matrix']).A.all()

True

Zhenbin24 commented 5 months ago

Sorry for the late reply. I recommend using np.array_equal to check whether the two matrices are identical.

import numpy as np
np.array_equal(data.exp_matrix.todense(), adata_loom.layers['matrix'].todense())
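
Calling todense() on a full expression matrix can use a lot of memory. As a rough alternative, the sketch below compares two SciPy sparse matrices without densifying them. The helper name sparse_equal and the toy matrices are assumptions made for illustration; they are not part of Stereopy or scanpy.

```python
import numpy as np
from scipy import sparse


def sparse_equal(a, b):
    """Return True when two sparse matrices have the same shape and values."""
    if a.shape != b.shape:
        return False
    # (a != b) is itself sparse and is True only where the values differ,
    # so counting its nonzero entries counts the mismatches.
    return (a != b).count_nonzero() == 0


# Toy data standing in for data.exp_matrix and a loom layer.
counts = sparse.csr_matrix(np.array([[0, 2, 0], [1, 0, 3]]))
same_values = counts.astype(np.float32)  # same values, different dtype
different = sparse.csr_matrix(np.array([[0, 2, 0], [1, 0, 4]]))

equal_to_copy = sparse_equal(counts, same_values)
equal_to_other = sparse_equal(counts, different)
```

The comparison is by value, so a dtype difference (for example integer counts versus the float32 matrix that a reader may produce) does not count as a mismatch.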

WeipengMO commented 5 months ago

Hi @Zhenbin24 , thanks for your reply.

I found out what the problem is: adata_loom.X is not equal to adata_loom.layers['matrix']. What is the difference between these two matrices in the loom file that generate_loom produces?
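
One way to narrow this down is to check which layer, if any, holds the same values as X. The sketch below does that for a plain dict of SciPy sparse matrices. The helper layers_matching and the toy values are hypothetical and only stand in for adata_loom.X and adata_loom.layers.

```python
import numpy as np
from scipy import sparse


def layers_matching(x, layers):
    """Return the names of the layers whose values equal x."""
    matches = []
    for name, layer in layers.items():
        # Same shape and no differing entries means the layer equals x.
        if layer.shape == x.shape and (layer != x).count_nonzero() == 0:
            matches.append(name)
    return matches


# Toy stand-ins (assumed values, not taken from the real file).
total = sparse.csr_matrix(np.array([[5, 0], [2, 3]]))
spliced = sparse.csr_matrix(np.array([[4, 0], [1, 3]]))
toy_layers = {"matrix": total, "spliced": spliced}

found = layers_matching(spliced.copy(), toy_layers)
```

With the real objects, the equivalent call would presumably be layers_matching(adata_loom.X, adata_loom.layers), assuming the layers are sparse matrices of matching shape.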

Zhenbin24 commented 5 months ago

scanpy and Stereopy may store the data they read from the same file differently. For details, please refer to the scanpy source code.
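
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the read_loom function in anndata (which scanpy re-exports as sc.read_loom) takes an X_name parameter that defaults to 'spliced'. If the loom file contains a 'spliced' layer, as files prepared for RNA velocity typically do, that layer becomes adata.X and the file's main matrix is kept under layers['matrix']. The sketch below mimics that selection rule with plain NumPy arrays. The function choose_x and the toy data are hypothetical and are not part of scanpy, anndata, or Stereopy.

```python
import numpy as np


def choose_x(loom_layers, x_name="spliced"):
    """Mimic how a loom reader might pick X; "" names the main matrix."""
    if x_name not in loom_layers:
        x_name = ""  # fall back to the main matrix
    x = loom_layers[x_name]
    layers = {}
    if x_name != "":
        # The main matrix is not used as X, so keep it under 'matrix'.
        layers["matrix"] = loom_layers[""]
    for name, values in loom_layers.items():
        if name != "":
            layers[name] = values
    return x, layers


# Toy loom content (assumed values): a main matrix plus velocity layers.
total = np.array([[5, 0], [2, 3]])
spliced = np.array([[4, 0], [1, 3]])
unspliced = total - spliced
velocity_file = {"": total, "spliced": spliced, "unspliced": unspliced}

x, layers = choose_x(velocity_file)
x_is_total = np.array_equal(x, total)                       # False
matrix_is_total = np.array_equal(layers["matrix"], total)   # True
```

If this is what happens with the file written by generate_loom, assigning adata_loom.X = adata_loom.layers['matrix'] after reading should make X match data.exp_matrix.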