Open Zjianglin opened 6 months ago
Hello, this is a problem with the python library, we will optimize it in the next version. We have not encountered this problem here. You can consider restarting the notebook and then running the program to see.
Hi, I reinstalled steropy
seems bypass the No such device or address
problem. However, as I specified in the initial problem description, the When I reload a exorted ms_data object, it lost some attributes, how should I treat these warnnings?
problem still existed.
For example, 'batch', 'mean_umi' and 'hvgs' information were lost after reloading the exported ms_data
? How Should I save & load processed StereoExpData
and ms_data
?
And When I try to run the cluster_scatter
on the reloaded ms_data
, it crushed again?
rd_msd = st.io.read_h5ms(msh5ms_fp)
print(rd_msd)
rd_msd.plt.cluster_scatter(
res_key='leiden',
scope=slice_generator[:],
mode='integrate',
plotting_scale_width=10, # the width of scale
reorganize_coordinate=3, # the number of plots in each row
horizontal_offset_additional=20, # adjustment for horizontal distance
vertical_offset_additional=20 )
Here is the output:
ms_data: {'Low_MM1496': (69337, 28519), 'Mid_LL1084': (32392, 28127), 'High_CF142': (54044, 32377)}
num_slice: 3
names: ['Low_MM1496' 'Mid_LL1084' 'High_CF142']
obs: ['total_counts', 'pct_counts_mt', 'n_genes_by_counts', 'leiden']
var: ['n_cells', 'n_counts']
relationship: time_series
var_type: intersect to 24402
mss: ["scope_[0,1,2]:['highly_variable_genes', 'pca', 'neighbors', 'umap', 'leiden', 'gene_exp_leiden']"]
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[28], line 5
2 rd_msd = st.io.read_h5ms(msh5ms_fp)
3 print(rd_msd)
----> 5 rd_msd.plt.cluster_scatter(
6 res_key='leiden',
7 scope=slice_generator[:],
8 mode='integrate',
9 plotting_scale_width=10, # the width of scale
10 reorganize_coordinate=3, # the number of plots in each row
11 horizontal_offset_additional=20, # adjustment for horizontal distance
12 vertical_offset_additional=20 )
File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/core/ms_pipeline.py:229, in MSDataPipeLine.__getattr__.<locals>.temp(*args, **kwargs)
227 if kwargs["mode"] == "integrate":
228 if self.ms_data.merged_data:
--> 229 return self._use_integrate_method(item, *args, **kwargs)
230 else:
231 raise Exception(
232 "`mode` integrate should merge first, using `ms_data.integrate`"
233 )
File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/core/ms_pipeline.py:133, in MSDataPipeLine._use_integrate_method(self, item, *args, **kwargs)
130 return new_attr(*args, **kwargs)
132 logger.info(f'data_obj(idx=0) in ms_data start to run {item}')
--> 133 return new_attr(
134 ms_data_view.merged_data.__getattribute__(self.__class__.ATTR_NAME),
135 *args,
136 **kwargs
137 )
File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/plots/decorator.py:47, in download.<locals>.wrapped(*args, **kwargs)
45 dpi = kwargs['out_dpi']
46 del kwargs['out_dpi']
---> 47 fig: Figure = func(*args, **kwargs)
48 if type(fig) is not Figure:
49 return fig
File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/plots/decorator.py:31, in plot_scale.<locals>.wrapped(*args, **kwargs)
29 kwargs.setdefault('data_resolution', data_resolution)
30 kwargs.setdefault('data_bin_offset', data_bin_offset)
---> 31 return func(*args, **kwargs)
File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/plots/decorator.py:139, in reorganize_coordinate.<locals>.wrapped(*args, **kwargs)
133 if reorganize_coordinate:
134 data.position, data.position_offset, data.position_min = \
135 reorganize_data_coordinates(
136 data.cells.batch, data.position, data.position_offset, data.position_min,
137 reorganize_coordinate, horizontal_offset_additional, vertical_offset_additional
138 )
--> 139 res = func(*args, **kwargs)
140 data.reset_position()
141 return res
File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/plots/plot_collection.py:843, in PlotCollection.cluster_scatter(self, res_key, groups, show_others, title, x_label, y_label, dot_size, colors, invert_y, hue_order, width, height, base_image, base_cmap, **kwargs)
791 @download
792 @plot_scale
793 @reorganize_coordinate
(...)
810 **kwargs
811 ):
812 """
813 Spatial distribution ofter scatter.
814
(...)
841 :return: Spatial scatter distribution of clusters.
842 """ # noqa
--> 843 res = self.check_res_key(res_key)
844 group_list = res['group'].to_numpy()
845 n = np.unique(group_list).size
File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/plots/plot_collection.py:1092, in PlotCollection.check_res_key(self, res_key)
1084 def check_res_key(self, res_key):
1085 """
1086 Check if result exist
1087
(...)
1090 :return: tool result
1091 """
-> 1092 if res_key in self.data.tl.result:
1093 res = self.data.tl.result[res_key]
1094 return res
File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/core/result.py:58, in Result.__contains__(self, item)
56 def __contains__(self, item):
57 if self.contain_method:
---> 58 if self.contain_method(item):
59 return True
60 # TODO: when get item in ms_data[some_idx].tl.result, if name match the ms_data rule, it is very confused
File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/core/ms_pipeline.py:95, in MSDataPipeLine._use_integrate_method.<locals>.contain_method(item)
93 def contain_method(item):
94 key_name = "scope_[" + ",".join(
---> 95 [str(self.ms_data._names.index(name)) for name in ms_data_view._names]) + "]"
96 scope_result = self.ms_data.tl.result.get(key_name, None)
97 if scope_result is None:
File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/core/ms_pipeline.py:95, in <listcomp>(.0)
93 def contain_method(item):
94 key_name = "scope_[" + ",".join(
---> 95 [str(self.ms_data._names.index(name)) for name in ms_data_view._names]) + "]"
96 scope_result = self.ms_data.tl.result.get(key_name, None)
97 if scope_result is None:
AttributeError: 'numpy.ndarray' object has no attribute 'index'
Hello, please send me the execution process of the script so that I can check the context.
Hi , Here is my execution process codes:
import pandas as pd
import numpy as np
import os,sys
from natsort import natsorted
import stereo as st
import torch
from stereo.core.ms_data import MSData
from stereo.core.ms_pipeline import slice_generator
import warnings
warnings.filterwarnings('ignore')
get_ipython().run_line_magic('config', "InlineBackend.figure_format = 'retina'")
print("stereo version is {}".format(st.__version__))
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print('device is {}'.format(device))
wkdir = '/path/to/StereoSeqs'
srcdata_dir = '//data/final/spatialTemp'
binsize = 50
projectx = 'LowMM1496_MidLL1084_HighCF142_240108'
out_dir = os.path.join(wkdir, '0stats')
processed_data_odir = os.path.join(wkdir, '1processed_data')
low_mm1496_geffp = os.path.join(wkdir, 'sawouts/MM1479/04.tissuecut/SS200000936BL_B5.tissue.gef')
mid_ll1084_geffp = os.path.join(wkdir, 'sawouts/LL_1084/04.tissuecut/SS200000153BR_B6.tissue.gef')
high_cf142_geffp = os.path.join(wkdir, "sawouts/CF142/04.tissuecut/SS200000685TL_F3.tissue.gef")
data_dt = {"Low_MM1496": st.io.read_gef(low_mm1496_geffp, bin_size=binsize),
"Mid_LL1084": st.io.read_gef(mid_ll1084_geffp, bin_size=binsize),
"High_CF142": st.io.read_gef(high_cf142_geffp, bin_size=binsize) }
for kx, vx in data_dt.items():
print("{} (Bin{}) : {}\n".format(kx, binsize, vx))
## construct MSData object
ms_data = MSData(_data_list=list(data_dt.values()), _names=list(data_dt.keys()), _relationship='other', _var_type='intersect')
print(ms_data)
print('\n')
print(ms_data[0])
del data_dt
import gc; gc.collect();
#不同样本之间的关系可以重新设置: new relationship must be in {'time_series', 'continuous', 'other'}
ms_data.relationship='time_series'
ms_data.integrate()
ms_data
#cal_qc有两个参数: scope指定运行分析哪些样本, mode指定分析方式是单个样本分别分析(`isolated`)或多样本分析()"integrate")
ms_data.tl.cal_qc(scope=slice_generator[:],mode='isolated')
print(ms_data)
print(ms_data[0].plt.violin())
print(ms_data[1].plt.violin())
print(ms_data[2].plt.violin())
##Preprocessing
#cal_qc有两个参数: scope指定运行分析哪些样本, mode指定分析方式是单个样本分别分析(`isolated`)或多样本分析()"integrate")
ms_data.tl.cal_qc(scope=slice_generator[:],mode='integrate')
print(ms_data)
print(ms_data.plt.violin())
ms_data.plt.spatial_scatter(
scope=slice_generator[:],
mode='integrate',
plotting_scale_width=10, # the width of scale
reorganize_coordinate=3, # the number of plots in each row
horizontal_offset_additional=20, # adjustment for horizontal distance
vertical_offset_additional=20 # adjustment for vertical distance
)
# you could filter data on three optional levels: cell, gene and coordinate.
ms_data.tl.filter_cells(min_gene=100, min_n_genes_by_counts=200, pct_counts_mt=15, scope=slice_generator[:], mode='integrate', inplace=True)
ms_data.tl.filter_genes(min_cell=3, scope=slice_generator[:], mode='integrate', inplace=True)
print(ms_data)
# strongly suggest to use self.raw to record the raw gene expression matrix which has been gone through basic processing, as an essential data set for subsequent differential testing and multiple analysis. When you want to get raw data, just run ms_data.tl.reset_raw_data().
ms_data.tl.raw_checkpoint()
print(ms_data.tl.raw)
print(ms_data.tl.raw)
print(ms_data)
#Normalization
ms_data.tl.normalize_total(target_sum=10000)
ms_data.tl.log1p()
#Highly variable genes
ms_data.tl.highly_variable_genes(
min_mean=0.0125,
max_mean=3,
min_disp=0.5,
n_top_genes=2000,
res_key='highly_variable_genes',
scope=slice_generator[:],
mode='integrate'
)
# remember to choose a res_key when plot
ms_data.plt.highly_variable_genes(res_key='highly_variable_genes')
#Scale each gene to unit variance. Clip values exceeding standard deviation 10. If data.tl.scale(zero_center=False) is used, sparse matrix will be used for calculation, which can greatly reduce the memory required for running.
ms_data.tl.scale(max_value=10, zero_center=True)
ms_data.tl.pca(
use_highly_genes=False,
n_pcs=30,
res_key='pca',
scope=slice_generator[:],
mode='integrate'
)
ms_data.tl.neighbors(scope=slice_generator[:],mode='integrate', pca_res_key='pca', n_pcs=30, res_key='neighbors', method='rapids')
ms_data.tl.umap(scope=slice_generator[:],mode='integrate', pca_res_key='pca', neighbors_res_key='neighbors', res_key='umap', method='rapids')
#Clustering, currently three common clustering methods, including Leiden, Louvain and Phenograph
ms_data.tl.leiden(
neighbors_res_key='neighbors',
res_key='leiden', resolution=1.0,
scope=slice_generator[:],
mode='integrate', method='rapids'
)
#Show the spatial distribution of UMAP and Leiden clustering.
ofp = os.path.join(out_dir, '001_{}_UMAP_resolution1.pdf'.format(projectx))
ms_data.plt.umap(
cluster_key='leiden',
res_key='umap',
scope=slice_generator[:],
mode='integrate',
out_path=ofp, out_dpi=300
)
ms_data.plt.cluster_scatter(res_key='leiden')
ofp = os.path.join(out_dir, '001_{}_cluster_scatter_resolution1.pdf'.format(projectx))
ms_data.plt.cluster_scatter(
res_key='leiden',
scope=slice_generator[:],
mode='integrate',
plotting_scale_width=10, # the width of scale
reorganize_coordinate=3, # the number of plots in each row
horizontal_offset_additional=20, # adjustment for horizontal distance
vertical_offset_additional=20 # adjustment for vertical distance
,out_path=ofp, out_dpi=300
)
#Output file
msh5ms_fp = os.path.join(processed_data_odir, '{}_240108_3integrate_msdata_beforeBatchQC.h5ms'.format(projectx))
st.io.write_h5ms(ms_data,output=msh5ms_fp)
print("Export {} to {}".format(ms_data, msh5ms_fp))
msh5ms_fp = os.path.join(processed_data_odir, '{}_240108_3integrate_msdata_beforeBatchQC.h5ms'.format(projectx))
rd_msd = st.io.read_h5ms(msh5ms_fp)
print(rd_msd)
Hi, another problem occurs here too! After I run the laden
and louvain
clustering, write_h5ms
. I encountered another Error
in read_h5ms
procedure. here is my running codes.
ofp = os.path.join(out_dir, "{}_002_QCed_msdata_{}_laiden_cluster_scatter_res{}.pdf".format(projectx, normafun, resolution))
print(ofp)
ms_data.plt.cluster_scatter(
res_key='leiden',
scope=slice_generator[:],
mode='integrate',
plotting_scale_width=10, # the width of scale
reorganize_coordinate=3, # the number of plots in each row
horizontal_offset_additional=20, # adjustment for horizontal distance
vertical_offset_additional=20 # adjustment for vertical distance
,out_path=ofp, out_dpi=300
)
ofp = os.path.join(out_dir, "{}_002_QCed_msdata_{}_louvain_cluster_scatter_res{}.pdf".format(projectx, normafun, resolution))
print(ofp)
ms_data.plt.cluster_scatter(
res_key='louvain',
scope=slice_generator[:],
mode='integrate',
plotting_scale_width=10, # the width of scale
reorganize_coordinate=3, # the number of plots in each row
horizontal_offset_additional=20, # adjustment for horizontal distance
vertical_offset_additional=20 # adjustment for vertical distance
,out_path=ofp, out_dpi=300
)
ofp = os.path.join(out_dir, "{}_002_QCed_msdata_{}_laiden_cluster_res{}_markers.csv".format(projectx, normafun, resolution))
print(ofp)
ms_data.tl.find_marker_genes(cluster_res_key='leiden', use_highly_genes=False, use_raw =False,
res_key='leiden_cluster_marker_genes', output=ofp, n_jobs=24)
print('Save leiden_cluster_marker_genes to {}\n'.format(ofp))
ofp = os.path.join(out_dir, "{}_002_QCed_msdata_{}_laiden_cluster_res{}_marker_genes_heatmap.pdf".format(projectx, normafun, resolution))
print(ofp)
ms_data.plt.marker_genes_heatmap(res_key='leiden_cluster_marker_genes', markers_num=6,out_path=ofp, out_dpi=300)
ofp = os.path.join(out_dir, "{}_002_QCed_msdata_{}_louvain_cluster_res{}_markers.csv".format(projectx, normafun, resolution))
print(ofp)
ms_data.tl.find_marker_genes(cluster_res_key='louvain', use_highly_genes=False, use_raw =False,
res_key='louvain_cluster_marker_genes', output=ofp, n_jobs=24)
print('Save leiden_cluster_marker_genes to {}\n'.format(ofp))
ofp = os.path.join(out_dir, "{}_002_QCed_msdata_{}_louvain_cluster_res{}_marker_genes_heatmap.pdf".format(projectx, normafun, resolution))
print(ofp)
ms_data.plt.marker_genes_heatmap(res_key='louvain_cluster_marker_genes', markers_num=6,out_path=ofp, out_dpi=300)
#Output file
msh5ms_fp = os.path.join(processed_data_odir, '{}_240108_3integrate_msdata_beforeBatchQC.h5ms'.format(projectx))
st.io.write_h5ms(ms_data,output=msh5ms_fp)
print("Export {} to {}".format(ms_data, msh5ms_fp))
##
xport ms_data: {'Low_MM1496': (69337, 28519), 'Mid_LL1084': (32392, 28127), 'High_CF142': (54044, 32377)}
num_slice: 3
names: ['Low_MM1496', 'Mid_LL1084', 'High_CF142']
obs: ['batch', 'total_counts', 'n_genes_by_counts', 'pct_counts_mt']
var: ['n_cells', 'n_counts', 'mean_umi', 'hvgs']
relationship: other
var_type: intersect to 24403
mss: ["scope_[0,1,2]:['highly_variable_genes', 'pca', 'pca_variance_ratio', 'neighbors', 'umap', 'leiden', 'gene_exp_leiden', 'louvain', 'gene_exp_louvain', 'leiden_cluster_marker_genes', 'louvain_cluster_marker_genes']"]
to XXX
above codes run normally.
However, when i try to reload the exported ms_data
, it crushed!
msh5ms_fp = os.path.join(processed_data_odir, '{}_240108_3integrate_msdata_beforeBatchQC.h5ms'.format(projectx))
rd_msd = st.io.read_h5ms(msh5ms_fp)
print(rd_msd)
## output
[2024-01-16 19:14:17][Stereo][88627][MainThread][23319675127616][reader][434][WARNING]: obs not in rules, did not read from h5ms
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[30], line 2
1 msh5ms_fp = os.path.join(processed_data_odir, '{}_240108_3integrate_msdata_beforeBatchQC.h5ms'.format(projectx))
----> 2 rd_msd = st.io.read_h5ms(msh5ms_fp)
3 print(rd_msd)
File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/utils/read_write_utils.py:37, in ReadWriteUtils.check_file_exists.<locals>.wrapped(*args, **kwargs)
35 else:
36 raise FileNotFoundError("Please ensure there is a file")
---> 37 return func(*args, **kwargs)
File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/io/reader.py:419, in read_h5ms(file_path, use_raw, use_result)
417 elif k == 'sample_merged':
418 merged_data = StereoExpData()
--> 419 merged_data = _read_stereo_h5ad_from_group(f[k], merged_data, use_raw, use_result) # noqa
420 elif k == 'names':
421 names = h5ad.read_dataset(f[k])
File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/io/reader.py:293, in _read_stereo_h5ad_from_group(f, data, use_raw, use_result, bin_type, bin_size)
291 if use_result is True and 'key_record' in f.keys():
292 h5ad.read_key_record(f['key_record'], data.tl.key_record)
--> 293 _read_stereo_h5_result(data.tl.key_record, data, f)
294 return data
File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/io/reader.py:349, in _read_stereo_h5_result(key_record, data, f)
347 if analysis_key == 'marker_genes':
348 clusters = h5ad.read_dataset(f[f'clusters_record@{res_key}@marker_genes'])
--> 349 data.tl.result[res_key] = {}
350 for cluster in clusters:
351 cluster_key = f'{cluster}@{res_key}@marker_genes'
File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/core/result.py:165, in Result.__setitem__(self, key, value)
163 for name_type, name_dict in Result.TYPE_NAMES_DICT.items():
164 for like_name in name_dict:
--> 165 if not key.startswith('gene_exp_') and like_name in key and self._real_set_item(name_type, key, value):
166 return
167 if type(value) is pd.DataFrame:
File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/core/result.py:143, in Result._real_set_item(self, type, key, value)
141 def _real_set_item(self, type, key, value):
142 if type == Result.CLUSTER:
--> 143 self._set_cluster_res(key, value)
144 elif type == Result.CONNECTIVITY:
145 self._set_connectivities_res(key, value)
File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/core/result.py:188, in Result._set_cluster_res(self, key, value)
187 def _set_cluster_res(self, key, value):
--> 188 assert type(value) is pd.DataFrame and 'group' in value.columns.values, "this is not cluster res"
189 warn(
190 f'FutureWarning: {key} will be moved from `StereoExpData.tl.result` to `StereoExpData.cells` in the '
191 'future, make sure your code set the property correctly. ',
192 category=FutureWarning
193 )
194 self.__stereo_exp_data.cells._obs[key] = value['group'].values
AssertionError: this is not cluster res
Did I do something wrong? Or how could I figure it out? Thanks
Hello, I have no problem executing your script. You can try to install the latest stereopy version (V1.1.0)
` low_mm1496_geffp = os.path.join(wkdir, 'Embyro_E9.5.h5ad') mid_ll1084_geffp = os.path.join(wkdir, 'Embyro_E10.5.h5ad') high_cf142_geffp = os.path.join(wkdir, "Embyro_E11.5.h5ad")
data_dt = {"Low_MM1496": st.io.read_h5ad(low_mm1496_geffp), "Mid_LL1084": st.io.read_h5ad(mid_ll1084_geffp), "High_CF142": st.io.read_h5ad(high_cf142_geffp) } `
test data url:http://116.6.21.110:8090/share/dd965cba-7c1f-40b2-a275-0150890e005f
Hi @Zhenbin24 , After adding some other samples, I encountered this problem again.
Here is my write_h5ms
function output logs:
...
obs: ['batch', 'total_counts', 'n_genes_by_counts', 'pct_counts_mt']
var: ['n_cells', 'n_counts', 'mean_umi', 'hvgs']
relationship: other
var_type: intersect to 22646
mss: ["scope_[0,1,2,3,4,5,6]:['highly_variable_genes', 'pca', 'pca_variance_ratio', 'neighbors', 'umap', 'leiden', 'gene_exp_leiden', 'louvain', 'gene_exp_louvain', 'leiden_cluster_marker_genes', 'louvain_cluster_marker_genes']"]
to XXX
Did you simutaneously run the louvain
as well as leiden
clustering? I did this, and It can not be imported again.
hello, @Zjianglin .
Did you solve the st.io.read_h5ms issue? Thank you.
Hi, I tried to using
stereopy
to analysis mystereo-seq
samples. I create ams_data
object from 3 slices according to the tutorial. When I tried toEvaluate batch effection
usingbatch_qc
function, it raised an error:Here is the reproduce steps:
Here is the output and error information:
Could you please help me figure it out? Thanks.
Btw, When I reload a exorted
ms_data
object, it lost some attributes, how should I treat these warnnings? Do I need rerun the normalization, scale, embeddings, clustering procedures after reload an exportedms_data
object?