STOmics / Stereopy

A toolkit of spatial transcriptomic analysis.
MIT License

find_marker_genes throws error using case_groups and control_groups #200

Closed ChiragNepal closed 10 months ago

ChiragNepal commented 10 months ago

I have merged two data objects data1 and data2 into dataM

dataM = st.utils.data_helper.merge(data1, data2)
dataM.tl.raw_checkpoint()
dataM.tl.normalize_total()
dataM.tl.log1p()
dataM.tl.pca(use_highly_genes=False, n_pcs=30, res_key='pca', svd_solver='arpack')
dataM.tl.batches_integrate(pca_res_key='pca', res_key='pca_integrated')

Integration of the two samples

dataM.tl.neighbors(pca_res_key='pca_integrated', n_pcs=30, res_key='neighbors_integrated', n_jobs=8)
dataM.tl.umap(pca_res_key='pca_integrated', neighbors_res_key='neighbors_integrated', res_key='umap_integrated')
dataM.plt.batches_umap(res_key='umap_integrated')
dataM.tl.leiden(neighbors_res_key='neighbors_integrated', res_key='leiden', resolution=0.75)
dataM.plt.cluster_scatter(res_key='leiden')

Update dataM dictionary such that dataM.cells['batch'] and dataM.cells['leiden'] are combined

dataM.cells['batch_leiden_combination'] = dataM.cells['batch'].astype(str) + ':' + dataM.cells['leiden'].astype(str)

dataM
StereoExpData object with n_cells X n_genes = 103076 X 12451
bin_type: bins
bin_size: 50
offset_x = 2841
offset_y = 4296
cells: ['cell_name', 'batch', 'total_counts', 'pct_counts_mt', 'n_genes_by_counts', 'leiden', 'batch_leiden_combination']
genes: ['gene_name']
cells_matrix = ['pca', 'pca_integrated', 'umap_integrated']
cells_pairwise = ['neighbors_integrated']
key_record: {'pca': ['pca', 'pca_integrated'], 'neighbors': ['neighbors_integrated'], 'umap': ['umap_integrated'], 'cluster': ['leiden'], 'marker_genes': ['marker_genes'], 'gene_exp_cluster': ['gene_exp_leiden']}

unique_batch_leiden = dataM.cells['batch_leiden_combination'].unique()
print(unique_batch_leiden)
['0:3' '0:1' '0:2' '0:8' '0:5' '0:7' '0:14' '0:19' '0:9' '0:6' '0:4' '0:12' '0:10'
 '0:13' '0:11' '0:15' '0:18' '0:16' '0:17' '1:2' '1:4' '1:12' '1:9' '1:10' '1:5' '1:3'
 '1:1' '1:7' '1:6' '1:11' '1:8' '1:13' '1:14' '1:15' '1:16' '1:18' '1:17' '1:19']

I want to run find_marker_genes between 0:0 and 1:0, i.e. compare the same cluster across the two batches.
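For readers who want to compare every cluster against its counterpart in the other batch, the candidate pairs can be listed from the combined labels first. This is only a sketch in plain Python: the helper name same_cluster_pairs and the shortened label list are hypothetical, and it assumes labels of the form 'batch:cluster' as built above.

```python
from collections import defaultdict


def same_cluster_pairs(labels, case_batch="0", control_batch="1", sep=":"):
    """Return (case_label, control_label) pairs for clusters found in both batches.

    Hypothetical helper, not part of Stereopy. Assumes 'batch<sep>cluster' labels.
    """
    clusters_by_batch = defaultdict(set)
    for label in labels:
        batch, cluster = label.split(sep, 1)
        clusters_by_batch[batch].add(cluster)

    # Only clusters that exist in both batches can be compared.
    shared = clusters_by_batch[case_batch] & clusters_by_batch[control_batch]

    # Numeric cluster ids are sorted numerically; anything else goes last.
    ordered = sorted(shared, key=lambda c: (not c.isdigit(), int(c) if c.isdigit() else 0, c))
    return [(f"{case_batch}{sep}{c}", f"{control_batch}{sep}{c}") for c in ordered]


# Shortened, illustrative subset of the labels printed above.
labels = ["0:3", "0:1", "0:2", "1:2", "1:4", "1:1"]
pairs = same_cluster_pairs(labels)
print(pairs)
```

Each resulting pair could then be passed as case_groups=[case] and control_groups=[control] in a separate find_marker_genes call. A label such as '0:0' never shows up as a pair if no cell carries it.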

Find DE marker genes across all clusters

dataM.tl.find_marker_genes(
    cluster_res_key='leiden',
    method='t_test',
    use_highly_genes=False,
    use_raw=True,
    res_key='marker_genes'
)

[2023-10-17 11:40:36][Stereo][4136646][MainThread][140077036418880][st_pipeline][37][INFO]: start to run find_marker_genes...
[2023-10-17 11:40:36][Stereo][4136646][MainThread][140077036418880][tool_base][119][INFO]: read group information, grouping by group column.
[2023-10-17 11:40:36][Stereo][4136646][MainThread][140077036418880][tool_base][157][INFO]: start to run...
[2023-10-17 11:40:36][Stereo][4136646][MainThread][140077036418880][time_consume][55][INFO]: start to run calc_pct_and_pct_rest...
[2023-10-17 11:41:55][Stereo][4136646][MainThread][140077036418880][tool_base][159][INFO]: end to run.
[2023-10-17 11:41:55][Stereo][4136646][MainThread][140077036418880][st_pipeline][40][INFO]: find_marker_genes end, consume time 79.5135s.

It gives the expected result.

Now, let's re-run find_marker_genes by using case_groups and control_groups .

I want to restrict the comparison to my groups of interest by using case_groups and control_groups. The documentation describes them as:

case_groups (Union[str, ndarray, list]) – case group, default all clusters.
control_groups (Union[str, ndarray, list]) – control group, default the rest of groups.

Now I changed cluster_res_key='batch_leiden_combination' since batch_leiden info is stored there

dataM.tl.find_marker_genes(
    cluster_res_key='batch_leiden_combination',
    method='t_test',
    use_highly_genes=False,
    use_raw=True,
    res_key='marker_genes',
    case_groups=['0:0'],
    control_groups=['0:1']
)

[2023-10-17 11:47:21][Stereo][4136646][MainThread][140077036418880][st_pipeline][37][INFO]: start to run find_marker_genes...
Traceback (most recent call last):
  File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'group'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "", line 1, in
  File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/stereo/core/st_pipeline.py", line 39, in wrapped
    res = func(*args, **kwargs)
  File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/stereo/core/st_pipeline.py", line 909, in find_marker_genes
    if self.result[cluster_res_key]['group'].unique().size <= 1:
  File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/pandas/core/series.py", line 981, in __getitem__
    return self._get_value(key)
  File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/pandas/core/series.py", line 1089, in _get_value
    loc = self.index.get_loc(label)
  File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
    raise KeyError(key) from err
KeyError: 'group'

Can anyone suggest how to use case_groups and control_groups for a specific batch_leiden combination of interest?

qiupinghust commented 10 months ago

I found a problem in your script: the '0:0' and '1:0' groups are not in unique_batch_leiden. Also, what version of stereopy are you using? I don't see the code "if self.result[cluster_res_key]['group'].unique().size <= 1" at line 909 of st_pipeline.py in the latest version.

limin321 commented 10 months ago

I think you passed the wrong case_groups. Passing case_groups=['0:0'], control_groups=['0:1'] means you want to compare cluster 0 and cluster 1 within the case batch (assuming case is 0 and control is 1). However, the unique values you printed from 'batch_leiden_combination' do not include cluster 0.

Use one of the following instead:

dataM.tl.find_marker_genes(
    cluster_res_key='batch_leiden_combination',
    method='t_test',
    use_highly_genes=False,
    use_raw=True,
    res_key='marker_genes',
    case_groups=['0:1'],
    control_groups=['1:1']
)

or

dataM.tl.find_marker_genes(
    cluster_res_key='batch_leiden_combination',
    method='t_test',
    use_highly_genes=False,
    use_raw=True,
    res_key='marker_genes',
    case_groups=['0:1'],
    control_groups=['0:2']
)

Hope this helps.
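The point made in both replies (the requested labels must actually exist in the grouping column) can be checked before calling find_marker_genes. The sketch below is plain Python. The helper missing_groups is hypothetical and not a Stereopy function, and the label list is a shortened stand-in for the output of dataM.cells['batch_leiden_combination'].unique(). It only rules out a label typo; it does not diagnose other causes of an error.

```python
def as_list(groups):
    """Wrap a single label in a list so str and list inputs are handled alike."""
    return [groups] if isinstance(groups, str) else list(groups)


def missing_groups(available_labels, case_groups, control_groups):
    """Return the requested labels that do not occur in available_labels.

    Hypothetical helper for illustration only.
    """
    available = set(available_labels)
    requested = as_list(case_groups) + as_list(control_groups)
    return [group for group in requested if group not in available]


# Shortened, illustrative subset of the labels printed earlier in the thread.
available = ["0:3", "0:1", "0:2", "1:2", "1:1"]

print(missing_groups(available, ["0:0"], ["0:1"]))  # ['0:0']
print(missing_groups(available, ["0:1"], ["1:1"]))  # []
```

An empty result means every requested case and control label is present, so a remaining failure has to come from somewhere else.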

ChiragNepal commented 10 months ago

I updated the command so that case_groups and control_groups match existing labels. Still the same error. I will update Stereopy to the latest version, v0.14.

dataM.tl.find_marker_genes(
    cluster_res_key='batch_leiden_combination',
    method='t_test',
    use_highly_genes=False,
    use_raw=True,
    res_key='marker_genes',
    case_groups=['0:1'],
    control_groups=['1:1']
)

[2023-10-17 21:20:45][Stereo][4136646][MainThread][140077036418880][st_pipeline][37][INFO]: start to run find_marker_genes...
Traceback (most recent call last):
  File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'group'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "", line 1, in
  File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/stereo/core/st_pipeline.py", line 39, in wrapped
    res = func(*args, **kwargs)
  File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/stereo/core/st_pipeline.py", line 909, in find_marker_genes
    if self.result[cluster_res_key]['group'].unique().size <= 1:
  File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/pandas/core/series.py", line 981, in __getitem__
    return self._get_value(key)
  File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/pandas/core/series.py", line 1089, in _get_value
    loc = self.index.get_loc(label)
  File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
    raise KeyError(key) from err
KeyError: 'group'

limin321 commented 10 months ago

Try stereopy 0.14.0b1. I tested both snippets I provided on that version and got no error.

ChiragNepal commented 10 months ago

Thank you for your comment. I checked that my version is stereopy 0.12.0

So I upgraded:

pip install stereopy
Collecting stereopy
Using cached stereopy-0.12.1.tar.gz (7.4 MB)
[Not the latest one]

Version 0.14 is only on GitHub:

git clone -b dev https://github.com/STOmics/stereopy.git
cd stereopy
python setup.py install

ERROR: Could not find a version that satisfies the requirement gefpy>=0.6.24 (from stereopy) (from versions: none)
ERROR: No matching distribution found for gefpy>=0.6.24
WARNING: There was an error checking the latest version of pip.

(base) pip install gefpy
ERROR: Could not find a version that satisfies the requirement gefpy (from versions: none)
ERROR: No matching distribution found for gefpy
WARNING: There was an error checking the latest version of pip.

(base) pip install gefpy==0.7.7
ERROR: Could not find a version that satisfies the requirement gefpy==0.7.7 (from versions: none)
ERROR: No matching distribution found for gefpy==0.7.7

Any suggestions on how to fix this error?

limin321 commented 10 months ago

Make sure you are on Linux. After creating a clean conda environment, install it with: pip install stereopy==0.14.0b1

ChiragNepal commented 10 months ago

Installing the latest stereopy version, 0.14.0b1, solved the case_groups and control_groups issue. It also solved the installation problem with gefpy. Thanks for the suggestion!

Here is how I installed the latest version, in case anybody needs it for reference:

conda create --name condav1 python=3.8
conda activate condav1
pip install stereopy==0.14.0b1
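Since the fix in this thread came down to the installed version, it may help to confirm which stereopy the active environment actually picks up after installing. The sketch below uses only the Python standard library (importlib.metadata, available from Python 3.8). The helper names are hypothetical, and the comparison looks only at the leading numeric release parts, so a pre-release suffix such as "b1" is deliberately ignored.

```python
import re
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version):
    """Turn '0.14.0b1' into (0, 14, 0); pre-release suffixes are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


current = installed_version("stereopy")
if current is None:
    print("stereopy is not installed in this environment")
elif release_tuple(current) < release_tuple("0.14.0b1"):
    print(f"stereopy {current} is older than the version that worked in this thread")
else:
    print(f"stereopy {current} is installed")
```

Running this inside the activated conda environment shows whether pip installed the intended release or fell back to an older cached one.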