STOmics / Stereopy

A toolkit of spatial transcriptomic analysis.
MIT License
180 stars 59 forks source link

Run error of "batch_qc": OSError: [Errno 6] No such device or address #234

Open Zjianglin opened 6 months ago

Zjianglin commented 6 months ago

Hi, I tried to using stereopy to analysis my stereo-seq samples. I create a ms_data object from 3 slices according to the tutorial. When I tried to Evaluate batch effection using batch_qc function, it raised an error:

Here is the reproduce steps:

# stereo version is 1.0.0
#device is cuda:0

ms_data = MSData(_data_list=list(data_dt.values()), _names=list(data_dt.keys()), _relationship='other', _var_type='intersect')
ms_data.relationship='time_series'
ms_data.integrate()
ms_data

ms_data.tl.cal_qc(scope=slice_generator[:],mode='isolated')
print(ms_data)
ms_data.tl.cal_qc(scope=slice_generator[:],mode='integrate')

ms_data.tl.filter_cells(min_gene=100, min_n_genes_by_counts=200, pct_counts_mt=15, scope=slice_generator[:], mode='integrate', inplace=True)
ms_data.tl.filter_genes(min_cell=3, scope=slice_generator[:], mode='integrate', inplace=True)

ms_data.tl.raw_checkpoint()
print(ms_data.tl.raw)

ms_data.tl.normalize_total(target_sum=10000)
ms_data.tl.log1p()

#Highly variable genes
ms_data.tl.highly_variable_genes(
        min_mean=0.0125,
        max_mean=3,
        min_disp=0.5,
        n_top_genes=2000,
        res_key='highly_variable_genes',
        scope=slice_generator[:],
        mode='integrate'
        )
ms_data.tl.scale(max_value=10, zero_center=True)

ms_data.tl.pca(
        use_highly_genes=False,
        n_pcs=30,
        res_key='pca',
        scope=slice_generator[:],
        mode='integrate'
        )
ms_data.tl.neighbors(scope=slice_generator[:],mode='integrate', pca_res_key='pca', n_pcs=30, res_key='neighbors')
ms_data.tl.umap(scope=slice_generator[:],mode='integrate', pca_res_key='pca', neighbors_res_key='neighbors', res_key='umap')
ms_data.tl.leiden(
            neighbors_res_key='neighbors',
            res_key='leiden', resolution=1.0,
            scope=slice_generator[:],
            mode='integrate'
            )

#Batch evaluation
bqc_dir = os.path.join(out_dir, '{}_batch_qc'.format(projectx))
print(bqc_dir)
ms_data.tl.batch_qc(scope=slice_generator[:],mode='integrate', cluster_res_key='leiden', 
                    report_path=bqc_dir, res_key='batch_qc', gpu=device, num_threads=64)

Here is the output and error information:

[2024-01-08 14:20:15][Stereo][45053][MainThread][22605523550848][ms_pipeline][122][INFO]: register algorithm batch_qc to <class 'stereo.core.stereo_exp_data.StereoExpData'>-22597924543264
/home/ncba04/datasets/projectX/0stats/LowMM1496_MidLL1084_HighCF142_240108_batch_qc
[2024-01-08 14:48:26][Stereo][45053][MainThread][22605523550848][classifier][144][INFO]: Model Training Finished!
[2024-01-08 14:48:26][Stereo][45053][MainThread][22605523550848][classifier][145][INFO]: Trained checkpoint file has been saved to /home/ncba04/datasets/projectX/0stats/LowMM1496_MidLL1084_HighCF142_240108_batch_qc
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[72], line 4
      2 bqc_dir = os.path.join(out_dir, '{}_batch_qc'.format(projectx))
      3 print(bqc_dir)
----> 4 ms_data.tl.batch_qc(scope=slice_generator[:],mode='integrate', cluster_res_key='leiden', 
      5                     report_path=bqc_dir, res_key='batch_qc', gpu=device, num_threads=64)

File ~/miniforge3/envs/stomics/lib/python3.8/site-packages/stereo/core/ms_pipeline.py:229, in MSDataPipeLine.__getattr__.<locals>.temp(*args, **kwargs)
    227 if kwargs["mode"] == "integrate":
    228     if self.ms_data.merged_data:
--> 229         return self._use_integrate_method(item, *args, **kwargs)
    230     else:
    231         raise Exception(
    232             "`mode` integrate should merge first, using `ms_data.integrate`"
    233         )

File ~/miniforge3/envs/stomics/lib/python3.8/site-packages/stereo/core/ms_pipeline.py:123, in MSDataPipeLine._use_integrate_method(self, item, *args, **kwargs)
    121     if new_attr:
    122         logger.info(f'register algorithm {item} to {type(merged_data)}-{id(merged_data)}')
--> 123         return new_attr(*args, **kwargs)
    124 else:
    125     from stereo.plots.plot_base import PlotBase

File ~/miniforge3/envs/stomics/lib/python3.8/site-packages/stereo/algorithm/batch_qc/main.py:39, in BatchQc.main(self, n_neighbors, condition, count_key, cluster_res_key, report_path, gpu, data_loader_num_workers, num_threads, res_key)
     36 if data_loader_num_workers <= 0 or data_loader_num_workers > cpu_count():
     37     data_loader_num_workers = cpu_count()
---> 39 self.pipeline_res[res_key] = batchqc_raw(
     40     self.stereo_exp_data,
     41     n_neighbors=n_neighbors,
     42     condition=condition,
     43     count_key=count_key,
     44     celltype_key=cluster_res_key,
     45     report_path=report_path,
     46     gpu=gpu,
     47     data_loader_num_workers=data_loader_num_workers,
     48     num_threads=num_threads
     49 )

File ~/miniforge3/envs/stomics/lib/python3.8/site-packages/stereo/algorithm/batch_qc/batchqc_raw.py:187, in batchqc_raw(data, n_neighbors, batch_key, condition, count_key, celltype_key, report_path, gpu, data_loader_num_workers, num_threads)
    181 summary_df = pd.DataFrame(
    182     data={"score": f"{batch_score:.4f}", "adaptive threshold": threshold, "conclusion": conclusion},
    183     index=["result"]
    184 )
    186 output_dict["table"]["summary"] = summary_df
--> 187 output_dict["report_path"] = generate_report(data_dict=output_dict, save_path=report_path)
    189 return output_dict

File ~/miniforge3/envs/stomics/lib/python3.8/site-packages/stereo/algorithm/batch_qc/batchqc_raw.py:207, in generate_report(data_dict, save_path, type)
    204 html = etree.HTML(html_data)
    206 # -------- set username & run time --------
--> 207 embed_text(html, pos="h4", name="username", text=f"Report By: {os.getlogin()}")
    208 embed_text(html, pos="h5", name="runtime",
    209            text=f"Report Time: {time.strftime('%Y-%m-%d %H:%M:%S', time.localtime())}")
    211 # -------- insert table --------

OSError: [Errno 6] No such device or address
$ ll /home/ncba04/datasets/projectX/0stats/LowMM1496_MidLL1084_HighCF142_240108_batch_qc
total 332
drwxrwxr-x 2 ncba04 ncba04   4096 Jan  8 14:14 .
drwxr-xr-x 3 ncba04 ncba04   4096 Jan  8 14:14 ..
-rw-rw-r-- 1 ncba04 ncba04 330039 Jan  8 14:36 batch_model.bgi

Could you please help me figure it out? Thanks.

Btw, When I reload a exorted ms_data object, it lost some attributes, how should I treat these warnnings? Do I need rerun the normalization, scale, embeddings, clustering procedures after reload an exported ms_data object?


msh5ms_fp = os.path.join(processed_data_odir,'{}_240108_3integrate_msdata_beforeBatchQC.h5ms'.format(projectx))
st.io.write_h5ms(ms_data,output=msh5ms_fp)
print("Export {} to {}".format(ms_data, msh5ms_fp))
#run Output
Export ms_data: {'Low_MM1496': (69337, 28519), 'Mid_LL1084': (32392, 28127), 'High_CF142': (54044, 32377)}
num_slice: 3
names: ['Low_MM1496', 'Mid_LL1084', 'High_CF142']
obs: ['batch', 'total_counts', 'n_genes_by_counts', 'pct_counts_mt']
var: ['n_cells', 'n_counts', 'mean_umi', 'hvgs']
relationship: time_series
var_type: intersect to 24402
mss: ["scope_[0,1,2]:['highly_variable_genes', 'pca', 'pca_variance_ratio', 'neighbors', 'umap', 'leiden', 'gene_exp_leiden']"]
 to /home/ncba04/datasets/projectX/outs/LowMM1496_MidLL1084_HighCF142_240108_240108_3integrate_msdata_beforeBatchQC.h5ms

#reload
msh5ms_fp = os.path.join(processed_data_odir, '{}_240108_3integrate_msdata_beforeBatchQC.h5ms'.format(projectx))
rd_msd = st.io.read_h5ms(msh5ms_fp)
print(rd_msd)
## run output

[2024-01-08 16:48:07][Stereo][45053][MainThread][22605523550848][reader][434][WARNING]: obs not in rules, did not read from h5ms
[2024-01-08 16:48:24][Stereo][45053][MainThread][22605523550848][reader][434][WARNING]: var not in rules, did not read from h5ms
ms_data: {'Low_MM1496': (69337, 28519), 'Mid_LL1084': (32392, 28127), 'High_CF142': (54044, 32377)}
num_slice: 3
names: ['Low_MM1496' 'Mid_LL1084' 'High_CF142']
obs: ['total_counts', 'pct_counts_mt', 'n_genes_by_counts', 'leiden']
var: ['n_cells', 'n_counts']
relationship: time_series
var_type: intersect to 24402
mss: ["scope_[0,1,2]:['highly_variable_genes', 'pca', 'neighbors', 'umap', 'leiden', 'gene_exp_leiden']"]
Zhenbin24 commented 6 months ago

Hello, this is a problem with the python library, we will optimize it in the next version. We have not encountered this problem here. You can consider restarting the notebook and then running the program to see.

Zjianglin commented 6 months ago

Hi, I reinstalled steropy seems bypass the No such device or address problem. However, as I specified in the initial problem description, the When I reload a exorted ms_data object, it lost some attributes, how should I treat these warnnings? problem still existed.

For example, 'batch', 'mean_umi' and 'hvgs' information were lost after reloading the exported ms_data? How Should I save & load processed StereoExpData and ms_data?

And When I try to run the cluster_scatter on the reloaded ms_data, it crushed again?

rd_msd = st.io.read_h5ms(msh5ms_fp)
print(rd_msd)

rd_msd.plt.cluster_scatter(
            res_key='leiden',
            scope=slice_generator[:],
            mode='integrate',
            plotting_scale_width=10,          # the width of scale
            reorganize_coordinate=3,          # the number of plots in each row
            horizontal_offset_additional=20,  # adjustment for horizontal distance
            vertical_offset_additional=20 )

Here is the output:

ms_data: {'Low_MM1496': (69337, 28519), 'Mid_LL1084': (32392, 28127), 'High_CF142': (54044, 32377)}
num_slice: 3
names: ['Low_MM1496' 'Mid_LL1084' 'High_CF142']
obs: ['total_counts', 'pct_counts_mt', 'n_genes_by_counts', 'leiden']
var: ['n_cells', 'n_counts']
relationship: time_series
var_type: intersect to 24402
mss: ["scope_[0,1,2]:['highly_variable_genes', 'pca', 'neighbors', 'umap', 'leiden', 'gene_exp_leiden']"]

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[28], line 5
      2 rd_msd = st.io.read_h5ms(msh5ms_fp)
      3 print(rd_msd)
----> 5 rd_msd.plt.cluster_scatter(
      6             res_key='leiden',
      7             scope=slice_generator[:],
      8             mode='integrate',
      9             plotting_scale_width=10,          # the width of scale
     10             reorganize_coordinate=3,          # the number of plots in each row
     11             horizontal_offset_additional=20,  # adjustment for horizontal distance
     12             vertical_offset_additional=20 )

File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/core/ms_pipeline.py:229, in MSDataPipeLine.__getattr__.<locals>.temp(*args, **kwargs)
    227 if kwargs["mode"] == "integrate":
    228     if self.ms_data.merged_data:
--> 229         return self._use_integrate_method(item, *args, **kwargs)
    230     else:
    231         raise Exception(
    232             "`mode` integrate should merge first, using `ms_data.integrate`"
    233         )

File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/core/ms_pipeline.py:133, in MSDataPipeLine._use_integrate_method(self, item, *args, **kwargs)
    130             return new_attr(*args, **kwargs)
    132 logger.info(f'data_obj(idx=0) in ms_data start to run {item}')
--> 133 return new_attr(
    134     ms_data_view.merged_data.__getattribute__(self.__class__.ATTR_NAME),
    135     *args,
    136     **kwargs
    137 )

File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/plots/decorator.py:47, in download.<locals>.wrapped(*args, **kwargs)
     45     dpi = kwargs['out_dpi']
     46     del kwargs['out_dpi']
---> 47 fig: Figure = func(*args, **kwargs)
     48 if type(fig) is not Figure:
     49     return fig

File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/plots/decorator.py:31, in plot_scale.<locals>.wrapped(*args, **kwargs)
     29         kwargs.setdefault('data_resolution', data_resolution)
     30         kwargs.setdefault('data_bin_offset', data_bin_offset)
---> 31 return func(*args, **kwargs)

File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/plots/decorator.py:139, in reorganize_coordinate.<locals>.wrapped(*args, **kwargs)
    133     if reorganize_coordinate:
    134         data.position, data.position_offset, data.position_min = \
    135             reorganize_data_coordinates(
    136                 data.cells.batch, data.position, data.position_offset, data.position_min,
    137                 reorganize_coordinate, horizontal_offset_additional, vertical_offset_additional
    138             )
--> 139 res = func(*args, **kwargs)
    140 data.reset_position()
    141 return res

File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/plots/plot_collection.py:843, in PlotCollection.cluster_scatter(self, res_key, groups, show_others, title, x_label, y_label, dot_size, colors, invert_y, hue_order, width, height, base_image, base_cmap, **kwargs)
    791 @download
    792 @plot_scale
    793 @reorganize_coordinate
   (...)
    810         **kwargs
    811 ):
    812     """
    813     Spatial distribution ofter scatter.
    814 
   (...)
    841     :return: Spatial scatter distribution of clusters.
    842     """  # noqa
--> 843     res = self.check_res_key(res_key)
    844     group_list = res['group'].to_numpy()
    845     n = np.unique(group_list).size

File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/plots/plot_collection.py:1092, in PlotCollection.check_res_key(self, res_key)
   1084 def check_res_key(self, res_key):
   1085     """
   1086     Check if result exist
   1087 
   (...)
   1090     :return: tool result
   1091     """
-> 1092     if res_key in self.data.tl.result:
   1093         res = self.data.tl.result[res_key]
   1094         return res

File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/core/result.py:58, in Result.__contains__(self, item)
     56 def __contains__(self, item):
     57     if self.contain_method:
---> 58         if self.contain_method(item):
     59             return True
     60         # TODO: when get item in ms_data[some_idx].tl.result, if name match the ms_data rule, it is very confused

File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/core/ms_pipeline.py:95, in MSDataPipeLine._use_integrate_method.<locals>.contain_method(item)
     93 def contain_method(item):
     94     key_name = "scope_[" + ",".join(
---> 95         [str(self.ms_data._names.index(name)) for name in ms_data_view._names]) + "]"
     96     scope_result = self.ms_data.tl.result.get(key_name, None)
     97     if scope_result is None:

File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/core/ms_pipeline.py:95, in <listcomp>(.0)
     93 def contain_method(item):
     94     key_name = "scope_[" + ",".join(
---> 95         [str(self.ms_data._names.index(name)) for name in ms_data_view._names]) + "]"
     96     scope_result = self.ms_data.tl.result.get(key_name, None)
     97     if scope_result is None:

AttributeError: 'numpy.ndarray' object has no attribute 'index'
Zhenbin24 commented 6 months ago

Hello, please send me the execution process of the script so that I can check the context.

Zjianglin commented 6 months ago

Hi , Here is my execution process codes:

import pandas as pd
import numpy as np
import os,sys
from natsort import natsorted
import stereo as st
import torch
from stereo.core.ms_data import MSData
from stereo.core.ms_pipeline import slice_generator
import warnings
warnings.filterwarnings('ignore')

get_ipython().run_line_magic('config', "InlineBackend.figure_format = 'retina'")
print("stereo version is {}".format(st.__version__))
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print('device is {}'.format(device))

wkdir = '/path/to/StereoSeqs'
srcdata_dir = '//data/final/spatialTemp'
binsize = 50
projectx = 'LowMM1496_MidLL1084_HighCF142_240108'

out_dir = os.path.join(wkdir, '0stats')
processed_data_odir = os.path.join(wkdir, '1processed_data')

low_mm1496_geffp = os.path.join(wkdir, 'sawouts/MM1479/04.tissuecut/SS200000936BL_B5.tissue.gef')
mid_ll1084_geffp = os.path.join(wkdir, 'sawouts/LL_1084/04.tissuecut/SS200000153BR_B6.tissue.gef')
high_cf142_geffp = os.path.join(wkdir, "sawouts/CF142/04.tissuecut/SS200000685TL_F3.tissue.gef")

data_dt = {"Low_MM1496": st.io.read_gef(low_mm1496_geffp, bin_size=binsize), 
               "Mid_LL1084": st.io.read_gef(mid_ll1084_geffp, bin_size=binsize), 
               "High_CF142": st.io.read_gef(high_cf142_geffp, bin_size=binsize) }
for kx, vx in data_dt.items():
    print("{} (Bin{}) : {}\n".format(kx, binsize, vx))

## construct MSData object
ms_data = MSData(_data_list=list(data_dt.values()), _names=list(data_dt.keys()), _relationship='other', _var_type='intersect')
print(ms_data)
print('\n')
print(ms_data[0])

del data_dt
import  gc;   gc.collect();

#不同样本之间的关系可以重新设置: new relationship must be in {'time_series', 'continuous', 'other'}
ms_data.relationship='time_series'

ms_data.integrate()
ms_data

#cal_qc有两个参数: scope指定运行分析哪些样本, mode指定分析方式是单个样本分别分析(`isolated`)或多样本分析()"integrate")
ms_data.tl.cal_qc(scope=slice_generator[:],mode='isolated')
print(ms_data)

print(ms_data[0].plt.violin())
print(ms_data[1].plt.violin())
print(ms_data[2].plt.violin())

##Preprocessing
#cal_qc有两个参数: scope指定运行分析哪些样本, mode指定分析方式是单个样本分别分析(`isolated`)或多样本分析()"integrate")
ms_data.tl.cal_qc(scope=slice_generator[:],mode='integrate')
print(ms_data)

print(ms_data.plt.violin())

ms_data.plt.spatial_scatter(
            scope=slice_generator[:],
            mode='integrate',
            plotting_scale_width=10,          # the width of scale
            reorganize_coordinate=3,          # the number of plots in each row
            horizontal_offset_additional=20,  # adjustment for horizontal distance
            vertical_offset_additional=20     # adjustment for vertical distance
            )

# you could filter data on three optional levels: cell, gene and coordinate.
ms_data.tl.filter_cells(min_gene=100, min_n_genes_by_counts=200, pct_counts_mt=15, scope=slice_generator[:], mode='integrate', inplace=True)
ms_data.tl.filter_genes(min_cell=3, scope=slice_generator[:], mode='integrate', inplace=True)
print(ms_data)

# strongly suggest to use self.raw to record the raw gene expression matrix which has been gone through basic processing, as an essential data set for subsequent differential testing and multiple analysis. When you want to get raw data, just run ms_data.tl.reset_raw_data().
ms_data.tl.raw_checkpoint()
print(ms_data.tl.raw)

print(ms_data.tl.raw)

print(ms_data)

#Normalization
ms_data.tl.normalize_total(target_sum=10000)
ms_data.tl.log1p()

#Highly variable genes
ms_data.tl.highly_variable_genes(
        min_mean=0.0125,
        max_mean=3,
        min_disp=0.5,
        n_top_genes=2000,
        res_key='highly_variable_genes',
        scope=slice_generator[:],
        mode='integrate'
        )

# remember to choose a res_key when plot
ms_data.plt.highly_variable_genes(res_key='highly_variable_genes')

#Scale each gene to unit variance. Clip values exceeding standard deviation 10. If data.tl.scale(zero_center=False) is used, sparse matrix will be used for calculation, which can greatly reduce the memory required for running.
ms_data.tl.scale(max_value=10, zero_center=True)

ms_data.tl.pca(
        use_highly_genes=False,
        n_pcs=30,
        res_key='pca',
        scope=slice_generator[:],
        mode='integrate'
        )
ms_data.tl.neighbors(scope=slice_generator[:],mode='integrate', pca_res_key='pca', n_pcs=30, res_key='neighbors', method='rapids')
ms_data.tl.umap(scope=slice_generator[:],mode='integrate', pca_res_key='pca', neighbors_res_key='neighbors', res_key='umap', method='rapids')

#Clustering, currently three common clustering methods, including Leiden, Louvain and Phenograph
ms_data.tl.leiden(
            neighbors_res_key='neighbors',
            res_key='leiden', resolution=1.0,
            scope=slice_generator[:],
            mode='integrate', method='rapids'
            )

#Show the spatial distribution of UMAP and Leiden clustering.
ofp = os.path.join(out_dir, '001_{}_UMAP_resolution1.pdf'.format(projectx))
ms_data.plt.umap(
            cluster_key='leiden',
            res_key='umap',
            scope=slice_generator[:],
            mode='integrate',
            out_path=ofp, out_dpi=300
            )

ms_data.plt.cluster_scatter(res_key='leiden')

ofp = os.path.join(out_dir, '001_{}_cluster_scatter_resolution1.pdf'.format(projectx))
ms_data.plt.cluster_scatter(
            res_key='leiden',
            scope=slice_generator[:],
            mode='integrate',
            plotting_scale_width=10,          # the width of scale
            reorganize_coordinate=3,          # the number of plots in each row
            horizontal_offset_additional=20,  # adjustment for horizontal distance
            vertical_offset_additional=20     # adjustment for vertical distance
            ,out_path=ofp, out_dpi=300
            )

#Output file
msh5ms_fp = os.path.join(processed_data_odir, '{}_240108_3integrate_msdata_beforeBatchQC.h5ms'.format(projectx))
st.io.write_h5ms(ms_data,output=msh5ms_fp)
print("Export {} to {}".format(ms_data, msh5ms_fp))

msh5ms_fp = os.path.join(processed_data_odir, '{}_240108_3integrate_msdata_beforeBatchQC.h5ms'.format(projectx))
rd_msd = st.io.read_h5ms(msh5ms_fp)
print(rd_msd)
Zjianglin commented 6 months ago

Hi, another problem occurs here too! After I run the laden and louvain clustering, write_h5ms. I encountered another Error in read_h5ms procedure. here is my running codes.


ofp = os.path.join(out_dir, "{}_002_QCed_msdata_{}_laiden_cluster_scatter_res{}.pdf".format(projectx, normafun, resolution))
print(ofp)
ms_data.plt.cluster_scatter(
            res_key='leiden',
            scope=slice_generator[:],
            mode='integrate',
            plotting_scale_width=10,          # the width of scale
            reorganize_coordinate=3,          # the number of plots in each row
            horizontal_offset_additional=20,  # adjustment for horizontal distance
            vertical_offset_additional=20     # adjustment for vertical distance
            ,out_path=ofp, out_dpi=300
            )

ofp = os.path.join(out_dir, "{}_002_QCed_msdata_{}_louvain_cluster_scatter_res{}.pdf".format(projectx, normafun, resolution))
print(ofp)
ms_data.plt.cluster_scatter(
            res_key='louvain',
            scope=slice_generator[:],
            mode='integrate',
            plotting_scale_width=10,          # the width of scale
            reorganize_coordinate=3,          # the number of plots in each row
            horizontal_offset_additional=20,  # adjustment for horizontal distance
            vertical_offset_additional=20     # adjustment for vertical distance
            ,out_path=ofp, out_dpi=300
            )
ofp = os.path.join(out_dir, "{}_002_QCed_msdata_{}_laiden_cluster_res{}_markers.csv".format(projectx, normafun, resolution))
print(ofp)
ms_data.tl.find_marker_genes(cluster_res_key='leiden', use_highly_genes=False, use_raw =False,
                            res_key='leiden_cluster_marker_genes', output=ofp, n_jobs=24)
print('Save leiden_cluster_marker_genes to {}\n'.format(ofp))

ofp = os.path.join(out_dir, "{}_002_QCed_msdata_{}_laiden_cluster_res{}_marker_genes_heatmap.pdf".format(projectx, normafun, resolution))
print(ofp)
ms_data.plt.marker_genes_heatmap(res_key='leiden_cluster_marker_genes', markers_num=6,out_path=ofp, out_dpi=300)

ofp = os.path.join(out_dir, "{}_002_QCed_msdata_{}_louvain_cluster_res{}_markers.csv".format(projectx, normafun, resolution))
print(ofp)
ms_data.tl.find_marker_genes(cluster_res_key='louvain', use_highly_genes=False, use_raw =False,
                            res_key='louvain_cluster_marker_genes', output=ofp, n_jobs=24)
print('Save leiden_cluster_marker_genes to {}\n'.format(ofp))

ofp = os.path.join(out_dir, "{}_002_QCed_msdata_{}_louvain_cluster_res{}_marker_genes_heatmap.pdf".format(projectx, normafun, resolution))
print(ofp)
ms_data.plt.marker_genes_heatmap(res_key='louvain_cluster_marker_genes', markers_num=6,out_path=ofp, out_dpi=300)

#Output file
msh5ms_fp = os.path.join(processed_data_odir, '{}_240108_3integrate_msdata_beforeBatchQC.h5ms'.format(projectx))
st.io.write_h5ms(ms_data,output=msh5ms_fp)
print("Export {} to {}".format(ms_data, msh5ms_fp))
##
xport ms_data: {'Low_MM1496': (69337, 28519), 'Mid_LL1084': (32392, 28127), 'High_CF142': (54044, 32377)}
num_slice: 3
names: ['Low_MM1496', 'Mid_LL1084', 'High_CF142']
obs: ['batch', 'total_counts', 'n_genes_by_counts', 'pct_counts_mt']
var: ['n_cells', 'n_counts', 'mean_umi', 'hvgs']
relationship: other
var_type: intersect to 24403
mss: ["scope_[0,1,2]:['highly_variable_genes', 'pca', 'pca_variance_ratio', 'neighbors', 'umap', 'leiden', 'gene_exp_leiden', 'louvain', 'gene_exp_louvain', 'leiden_cluster_marker_genes', 'louvain_cluster_marker_genes']"]
 to XXX

above codes run normally.

However, when i try to reload the exported ms_data, it crushed!

msh5ms_fp = os.path.join(processed_data_odir, '{}_240108_3integrate_msdata_beforeBatchQC.h5ms'.format(projectx))
rd_msd = st.io.read_h5ms(msh5ms_fp)
print(rd_msd)

## output
[2024-01-16 19:14:17][Stereo][88627][MainThread][23319675127616][reader][434][WARNING]: obs not in rules, did not read from h5ms
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[30], line 2
      1 msh5ms_fp = os.path.join(processed_data_odir, '{}_240108_3integrate_msdata_beforeBatchQC.h5ms'.format(projectx))
----> 2 rd_msd = st.io.read_h5ms(msh5ms_fp)
      3 print(rd_msd)

File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/utils/read_write_utils.py:37, in ReadWriteUtils.check_file_exists.<locals>.wrapped(*args, **kwargs)
     35     else:
     36         raise FileNotFoundError("Please ensure there is a file")
---> 37 return func(*args, **kwargs)

File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/io/reader.py:419, in read_h5ms(file_path, use_raw, use_result)
    417 elif k == 'sample_merged':
    418     merged_data = StereoExpData()
--> 419     merged_data = _read_stereo_h5ad_from_group(f[k], merged_data, use_raw, use_result)  # noqa
    420 elif k == 'names':
    421     names = h5ad.read_dataset(f[k])

File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/io/reader.py:293, in _read_stereo_h5ad_from_group(f, data, use_raw, use_result, bin_type, bin_size)
    291 if use_result is True and 'key_record' in f.keys():
    292     h5ad.read_key_record(f['key_record'], data.tl.key_record)
--> 293     _read_stereo_h5_result(data.tl.key_record, data, f)
    294 return data

File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/io/reader.py:349, in _read_stereo_h5_result(key_record, data, f)
    347 if analysis_key == 'marker_genes':
    348     clusters = h5ad.read_dataset(f[f'clusters_record@{res_key}@marker_genes'])
--> 349     data.tl.result[res_key] = {}
    350     for cluster in clusters:
    351         cluster_key = f'{cluster}@{res_key}@marker_genes'

File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/core/result.py:165, in Result.__setitem__(self, key, value)
    163 for name_type, name_dict in Result.TYPE_NAMES_DICT.items():
    164     for like_name in name_dict:
--> 165         if not key.startswith('gene_exp_') and like_name in key and self._real_set_item(name_type, key, value):
    166             return
    167 if type(value) is pd.DataFrame:

File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/core/result.py:143, in Result._real_set_item(self, type, key, value)
    141 def _real_set_item(self, type, key, value):
    142     if type == Result.CLUSTER:
--> 143         self._set_cluster_res(key, value)
    144     elif type == Result.CONNECTIVITY:
    145         self._set_connectivities_res(key, value)

File ~/miniforge3/envs/stomicsPy38/lib/python3.8/site-packages/stereo/core/result.py:188, in Result._set_cluster_res(self, key, value)
    187 def _set_cluster_res(self, key, value):
--> 188     assert type(value) is pd.DataFrame and 'group' in value.columns.values, "this is not cluster res"
    189     warn(
    190         f'FutureWarning: {key} will be moved from `StereoExpData.tl.result` to `StereoExpData.cells` in the '
    191         'future, make sure your code set the property correctly. ',
    192         category=FutureWarning
    193     )
    194     self.__stereo_exp_data.cells._obs[key] = value['group'].values

AssertionError: this is not cluster res

Did I do something wrong? Or how could I figure it out? Thanks

Zhenbin24 commented 4 months ago

Hello, I have no problem executing your script. You can try to install the latest stereopy version (V1.1.0)

image

image

` low_mm1496_geffp = os.path.join(wkdir, 'Embyro_E9.5.h5ad') mid_ll1084_geffp = os.path.join(wkdir, 'Embyro_E10.5.h5ad') high_cf142_geffp = os.path.join(wkdir, "Embyro_E11.5.h5ad")

data_dt = {"Low_MM1496": st.io.read_h5ad(low_mm1496_geffp), "Mid_LL1084": st.io.read_h5ad(mid_ll1084_geffp), "High_CF142": st.io.read_h5ad(high_cf142_geffp) } `

test data url:http://116.6.21.110:8090/share/dd965cba-7c1f-40b2-a275-0150890e005f

image

Zjianglin commented 4 months ago

Hi @Zhenbin24 , After adding some other samples, I encountered this problem again. Here is my write_h5ms function output logs:

...
obs: ['batch', 'total_counts', 'n_genes_by_counts', 'pct_counts_mt']
var: ['n_cells', 'n_counts', 'mean_umi', 'hvgs']
relationship: other
var_type: intersect to 22646
mss: ["scope_[0,1,2,3,4,5,6]:['highly_variable_genes', 'pca', 'pca_variance_ratio', 'neighbors', 'umap', 'leiden', 'gene_exp_leiden', 'louvain', 'gene_exp_louvain', 'leiden_cluster_marker_genes', 'louvain_cluster_marker_genes']"]
 to XXX

Did you simutaneously run the louvain as well as leiden clustering? I did this, and It can not be imported again.

LILI-0000-0002-8173-7367 commented 2 months ago

hello, @Zjianglin .

Did you solve the st.io.read_h5ms issue? Thank you.