CSOgroup / cellcharter

A Python package for the identification, characterization and comparison of spatial clusters from spatial -omics data.
https://cellcharter.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

cc.tl.ClusterAutoK always fails #38

Open Zjianglin opened 4 months ago

Zjianglin commented 4 months ago

Report

Hi developers, I'm trying to run CellCharter on my own Stereo-seq spatial transcriptomics data. I have seven samples and followed the "Spatial clustering of Nanostring CosMx spatial transcriptomics data" tutorial.

However, it always raises a "The kernel appears to have died. It will restart automatically." error. I have tried using a subset of the samples (reducing the cell number from 680k+ to 370k+) and using the CPU instead of the GPU, but the error still occurs. Could you please help me figure it out? Thanks.

Here is my code:

import os
import torch
import anndata as ad
import squidpy as sq
import cellcharter as cc
import pandas as pd
import scanpy as sc
import scvi
import numpy as np
import matplotlib.pyplot as plt

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

print('torch version is {}, device is {}'.format(torch.__version__, device))
print('scanpy version is {}, anndata version is {}'.format(sc.__version__, ad.__version__))
print('squidpy version is {}'.format(sq.__version__))
print('scvi version is {}'.format(scvi.__version__))
print('cellcharter version is {}'.format(cc.__version__))

#Output
#torch version is 1.12.1, device is cuda:0
#scanpy version is 1.10.1, anndata version is 0.10.7
#squidpy version is 1.4.1
#scvi version is 0.20.3
#cellcharter version is 0.2.0

adata_list = []
#samplex = 'MedianLL1084'
#fpx = sample_fps_li[samplex]

# for samplex in sample_vec: 
for samplex in ['WT',  'LowMM1496', 'MedianLL1084', 'Median1362']:
    fpx = sample_fps_li[samplex]
    adata = sc.read_h5ad(fpx)
    print("Read {} frm {}:\n{}\n".format(samplex, fpx, adata))

    adata.obs['sample'] = samplex
    adata_list.append(adata)

adata = ad.concat(adata_list, axis=0, merge='same', pairwise=True, index_unique='_')
adata.obs['sample'] = pd.Categorical(adata.obs['sample'])
adata

sc.pp.filter_genes(adata, min_counts=3)
sc.pp.filter_cells(adata, min_counts=150)
sc.pp.filter_cells(adata, max_counts=4000)
sc.pp.filter_cells(adata, min_genes=100)
sc.pp.filter_cells(adata, max_genes=1500)

adata.layers["counts"] = adata.X.copy()
sc.pp.normalize_total(adata, target_sum=1e6)
sc.pp.log1p(adata)
print(adata)

scvi.settings.seed = 12345
scvi.model.SCVI.setup_anndata(
    adata, 
    layer="counts", 
    batch_key='sample'
)

model = scvi.model.SCVI(adata)
model.train(early_stopping=True, enable_progress_bar=True)
adata.obsm['X_scVI'] = model.get_latent_representation(adata).astype(np.float32)

"""
Global seed set to 12345
/home/ncba04/miniforge3/envs/cellcharter-env/lib/python3.10/abc.py:119: FutureWarning: SparseDataset is deprecated and will be removed in late 2024. It has been replaced by the public classes CSRDataset and CSCDataset.

For instance checks, use `isinstance(X, (anndata.experimental.CSRDataset, anndata.experimental.CSCDataset))` instead.

For creation, use `anndata.experimental.sparse_dataset(X)` instead.

  return _abc_instancecheck(cls, instance)
Epoch 1/22:   0%|          | 0/22 [00:00<?, ?it/s]
/home/ncba04/miniforge3/envs/cellcharter-env/lib/python3.10/abc.py:119: FutureWarning: SparseDataset is deprecated and will be removed in late 2024. It has been replaced by the public classes CSRDataset and CSCDataset.

For instance checks, use `isinstance(X, (anndata.experimental.CSRDataset, anndata.experimental.CSCDataset))` instead.

For creation, use `anndata.experimental.sparse_dataset(X)` instead.

  return _abc_instancecheck(cls, instance)
Epoch 22/22: 100%|██████████| 22/22 [11:56<00:00, 32.57s/it, loss=1.74e+03, v_num=1]

"""

sq.gr.spatial_neighbors(adata, library_key='sample', coord_type='generic', delaunay=True, spatial_key='spatial')
cc.gr.remove_long_links(adata)
cc.gr.aggregate_neighbors(adata, n_layers=3, use_rep='X_scVI', out_key='X_cellcharter', sample_key='sample')

All the code above executed normally, but it always failed when running the code below:

autok = cc.tl.ClusterAutoK(
    n_clusters=(5,25), 
    max_runs=10, 
    model_params=dict(
        random_state=12345
        # If running on GPU
#         ,trainer_params=dict(accelerator='gpu', devices=1)
    )
)
autok.fit(adata, use_rep='X_cellcharter')

Here is my machine info (96 CPUs with 1.9 TB memory):

# GPU
nvidia-smi 
Wed May 15 12:39:59 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.86.01    Driver Version: 515.86.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100S-PCI...  Off  | 00000000:25:00.0 Off |                    0 |

Version information


session_info 1.0.0

asttokens NA colorama 0.4.6 comm 0.2.2 cython_runtime NA dateutil 2.9.0 debugpy 1.8.1 decorator 5.1.1 exceptiongroup 1.2.0 executing 2.0.1 ipykernel 6.29.3 jedi 0.19.1 packaging 24.0 parso 0.8.4 pickleshare 0.7.5 platformdirs 4.2.1 prompt_toolkit 3.0.42 psutil 5.9.8 pure_eval 0.2.2 pydev_ipython NA pydevconsole NA pydevd 2.9.5 pydevd_file_utils NA pydevd_plugins NA pydevd_tracing NA pygments 2.18.0 six 1.16.0 stack_data 0.6.2 tornado 6.4 traitlets 5.14.3 typing_extensions NA wcwidth 0.2.13 zmq 26.0.3

IPython 8.24.0 jupyter_client 8.6.1 jupyter_core 5.7.2

Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0] Linux-4.18.0-80.7.1.el8_0.x86_64-x86_64-with-glibc2.31

marcovarrone commented 4 months ago

Hi @Zjianglin, thank you for the issue! I will need some more information to debug the problem.

How big is the dataset and how much RAM do you have available? Would it be possible to run the code as a script rather than a Jupyter notebook? I think it will give much more informative error messages.

Zjianglin commented 4 months ago

Hi @marcovarrone , thanks for your reply.

The whole dataset is relatively big (around 710k cells/spots from seven slices/samples). However, the error still occurred even when I loaded only a single sample and filtered it down to an AnnData object with n_obs × n_vars = 42222 × 21845. I tried SquareBin (bin50) as well as CellBin (in-silico segmented cells) from my Stereo-seq spatial transcriptomics data, and the error always occurred.

My machine has 96 CPUs, 1.9 TB RAM and a Tesla V100S-PCIE-32GB GPU. The OS is Linux n1 4.18.0-80.7.1.el8_0.x86_64 ("Ubuntu 20.04.6 LTS").

$ free -h
              total        used        free      shared  buff/cache   available
Mem:          1.9Ti       864Gi       1.0Ti       4.6Gi       8.5Gi       1.1Ti
Swap:          31Gi       1.0Mi        31Gi

$ gpustat 
n1                        Thu May 16 17:28:05 2024  515.86.01
[0] Tesla V100S-PCIE-32GB | 35°C,   0 % |     0 / 32768 MB |

$ nvidia-smi 
Thu May 16 17:28:17 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.86.01    Driver Version: 515.86.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100S-PCI...  Off  | 00000000:25:00.0 Off |                    0 |
| N/A   35C    P0    25W / 250W |      0MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

I also ran the code as a script from the terminal, but it still crashed at the same point. Here is the last output:

Monitored metric elbo_validation did not improve in the last 45 records. Best score: 1892.535. Signaling Trainer to stop.
100%|██████████| 4/4 [00:00<00:00, 26.73it/s]

AnnData object with n_obs × n_vars = 42222 × 21845
    obs: 'dnbCount', 'area', 'nCount_Spatial', 'nFeature_Spatial', 'percent.mito', 'orig.ident', 'x', 'y', 'sample', 'n_counts', 'n_genes', '_scvi_batch', '_scvi_labels'
    var: 'n_cells', 'n_counts', 'mean_umi'
    uns: 'log1p', '_scvi_uuid', '_scvi_manager_uuid', 'spatial_neighbors'
    obsm: 'spatial', 'X_scVI', 'X_cellcharter'
    layers: 'counts'
    obsp: 'spatial_connectivities', 'spatial_distances'
Iteration 1/10
  0%|          | 0/23 [00:00<?, ?it/s]
/home/ncba04/miniforge3/envs/cellcharter-env/lib/python3.10/site-packages/pytorch_lightning/trainer/setup.py:176: PossibleUserWarning: GPU available but not used. Set `accelerator` and `devices` using `Trainer(accelerator='gpu', devices=1)`.
  rank_zero_warn(
  4%|▍         | 1/23 [00:07<02:52,  7.82s/it]
  9%|▊         | 2/23 [00:17<03:11,  9.14s/it]
marcovarrone commented 4 months ago

Mmmh that's weird. Never happened before.

Try running tl.Cluster instead of tl.ClusterAutoK.

That would be something like:

gmm = cc.tl.Cluster(
    n_clusters=10,
)
gmm.fit(adata, use_rep='X_cellcharter')
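For completeness, a hedged follow-up sketch (the fit/predict pattern used in the CellCharter tutorials) to store the resulting labels:

```python
# Assign each cell/spot to a cluster and store the labels in .obs
# (the column name here is just an example).
adata.obs['cluster_cellcharter'] = gmm.predict(adata, use_rep='X_cellcharter')
```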
Zjianglin commented 4 months ago

Hi @marcovarrone, tl.Cluster runs normally, as in your tutorial, when I input only a small subset of the samples (~80k cells).

However, when I input five samples (~510k cells), the process was killed as before.

Here is the last output:

AnnData object with n_obs × n_vars = 513226 × 24944
    obs: 'dnbCount', 'area', 'nCount_Spatial', 'nFeature_Spatial', 'percent.mito', 'orig.ident', 'x', 'y', 'sample', 'n_counts', 'n_genes', '_scvi_batch', '_scvi_labels'
    var: 'n_counts'
    uns: 'log1p', '_scvi_uuid', '_scvi_manager_uuid', 'spatial_neighbors'
    obsm: 'spatial', 'X_scVI', 'X_cellcharter'
    layers: 'counts'
    obsp: 'spatial_connectivities', 'spatial_distances'
/home/ncba04/miniforge3/envs/cellcharter-env/lib/python3.10/site-packages/pytorch_lightning/trainer/setup.py:176: PossibleUserWarning: GPU available but not used. Set `accelerator` and `devices` using `Trainer(accelerator='gpu', devices=1)`.
  rank_zero_warn(
Epoch 1: 100%|██████████| 1/1 [00:00<00:00,  8.53it/s]
Epoch 34:   0%|          | 0/1 [00:00<?, ?it/s, nll=16.80]       
marcovarrone commented 4 months ago

Ok @Zjianglin, then I am quite sure it's a memory problem. However, I have clustered more than 8 million cells with less than 100 GB, so something odd is happening here.

Some things that come to mind:

Sorry if this is taking time, but without access to the data and the environment it's not easy to debug.

Zjianglin commented 4 months ago

Hi @marcovarrone ,

adata.uns['spatial_neighbors']: {'connectivities_key': 'spatial_connectivities', 'distances_key': 'spatial_distances', 'params': {'n_neighbors': 6, 'coord_type': 'generic', 'radius': 99.02019995940222, 'transform': None}}

Size of adata.obsm['X_scVI']: (513226, 10)
Size of adata.obsm['X_cellcharter']: (513226, 40)

- The results of clustering with 10 clusters on a single sample seem reasonable, so I want to run it on all samples.
- I failed to plot the tissue sample showing the links between cells from the spatial neighbors, because of the error below:
```python
sq.pl.spatial_scatter(
    adata, 
    color='sample', 
    library_key='sample', 
    title=['WT', 'LowMM1479', 'LowMM1496', 'MedianLL1084', 'Median1362'],
    size=2500,
    img=None,
    connectivity_key='spatial_connectivities',
    edges_width=0.3,
    legend_loc=None,
#     crop_coord=(900000, 900000, 1700000, 1700000 ),
    library_id=['WT', 'LowMM1479', 'LowMM1496', 'MedianLL1084', 'Median1362']
    )

#==>
#...
File ~/miniforge3/envs/cellcharter-env/lib/python3.10/site-packages/squidpy/_constants/_pkg_constants.py:181, in Key.uns._sort_haystack(cls, adata, spatial_key, library_id, sub_key)
    172 @classmethod
    173 def _sort_haystack(
    174     cls,
   (...)
    178     sub_key: str | None = None,
    179 ) -> Sequence[str] | None:
    180     if spatial_key not in adata.uns:
--> 181         raise KeyError(f"Spatial key {spatial_key!r} not found in `adata.uns`.")
    182     haystack = list(adata.uns[spatial_key])
    183     if library_id is not None:

KeyError: "Spatial key 'spatial' not found in `adata.uns`."
Zjianglin commented 2 months ago

Hi @marcovarrone, I solved this problem by resetting the CPU-time limit in ulimit to unlimited on my server. It was probably due to the CPU time being limited to 1980 seconds by ulimit on that node. Thanks for your reply.
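For reference, a minimal sketch for checking this limit from inside Python (standard-library resource module; the 1980-second value is the one reported above):

```python
import resource

# Query the per-process CPU-time limit in seconds (what `ulimit -t` reports).
soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
print(soft, hard)  # e.g. 1980 on the affected node; -1 (RLIM_INFINITY) if unlimited

# The soft limit can be raised up to the hard limit from within the process;
# if the hard limit itself is capped, it must be lifted in the shell
# (`ulimit -t unlimited`) before starting Python/Jupyter.
resource.setrlimit(resource.RLIMIT_CPU, (hard, hard))
```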

Btw, as I analyzed seven slices/samples together, I wonder whether the CellCharter clustering process includes batch correction. I.e., once I have the CellCharter clusters, can I just find DEGs among the clusters on the normalized count matrix, or should I apply some other batch correction such as Harmony or BBKNN? Thanks.

marcovarrone commented 2 months ago

Hi @Zjianglin, amazing! I will add it to the documentation in case it happens to someone else :) Thank you for figuring it out.

Since you used

scvi.model.SCVI.setup_anndata(
    adata, 
    layer="counts", 
    batch_key='sample'
)

there was a batch correction on the samples.

Even though scVI is quite good at not removing biological variation, as a general approach I would suggest running scVI without batch_key='sample', i.e., without batch correction, and checking whether there are batch effects. If there are, then run scVI with batch-effect correction.

I know it's more work, but it's important to use batch-effect correction only when necessary, because every batch-correction method carries some risk of removing true biological variability.
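For example, a rough sketch of that check, assuming the same adata, layer and keys as in the code at the top of this issue (X_scVI_nobatch is just an illustrative key name):

```python
# Train scVI without batch correction (no batch_key) and inspect the latent space.
scvi.model.SCVI.setup_anndata(adata, layer="counts")
model_nobatch = scvi.model.SCVI(adata)
model_nobatch.train(early_stopping=True)
adata.obsm['X_scVI_nobatch'] = model_nobatch.get_latent_representation(adata)

# If cells separate by 'sample' in a UMAP of the uncorrected latent space,
# there are batch effects and the batch-corrected run is warranted.
sc.pp.neighbors(adata, use_rep='X_scVI_nobatch')
sc.tl.umap(adata)
sc.pl.umap(adata, color='sample')
```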

Zjianglin commented 2 months ago

@marcovarrone Thanks for your suggestion. I will try it later.

After learning the tutorial of CosMx SRT, I have another few questions.

  1. According to the cc.pl.autok_stability(autok) output, you said that "5 and 11 are good candidates". I wonder why the second good candidate is 11 rather than 8; 8 seems to have the second-highest stability.
  2. I noticed that you used n_layers=3 in aggregate_neighbors and the default settings of SCVI.setup_anndata. Are those your recommended settings, or should we try more or fewer n_layers? What is the tendency of these parameters?
  3. After clustering the spatial spots (bin50 in my Stereo-seq data), I want to identify cluster DEGs and annotate cell types. Can I just use scanpy.tl.rank_genes_groups with groupby="cluster_cellcharter" following the tutorial code, or should I do some additional preprocessing?

For Q3, I tried to find DEGs using sc.tl.rank_genes_groups(adata, groupby='cluster_cellcharter', method="wilcoxon", key_added=degkey), but it printed a warning: WARNING: It seems you use rank_genes_groups on the raw count data. Please logarithmize your data before calling rank_genes_groups. However, the adata had been logarithmized before:

adata.layers["counts"] = adata.X.copy()
sc.pp.normalize_total(adata, target_sum=1e6)
sc.pp.log1p(adata)

The pipeline did not set the raw attribute of adata, but the raw attribute does exist and its values are not logarithmized. Does it come from the adata.layers["counts"] = adata.X.copy() command?

print('AnnData raw is {}'.format(adata.raw))
print('AnnData raw is {}'.format(adata.raw.shape))
# check .raw whether is log1p data
print(f"{adata.X.min()}\n{adata.X.max()}")
print('Raw X min: {}, max {}'.format(adata.raw.X.min(), adata.raw.X.max()))
#==>
AnnData raw is Raw AnnData with n_obs × n_vars = 326394 × 22646
AnnData raw is (326394, 22646)
0.0
12.908868679138859
Raw X min: 0.0, max 965.0

Did I do anything wrong? The code is the same as at the beginning of this issue, following your CosMx tutorial, applied to my Stereo-seq bin50 data.

Thanks again. I'm looking forward to your reply.

marcovarrone commented 2 months ago

@Zjianglin

  1. Yes, you are right. I have been changing the tutorials multiple times to make them more runnable for everyone, and I clearly forgot to update the text accordingly.
  2. scVI parameters: I have seen that the default settings for scVI tend to work well in all cases. In more in-depth analyses I tend to fine-tune scVI a bit to get a better reconstruction error without compromising the KL loss too much. It's not common to do this, and it's only worthwhile if you want to squeeze the most out of it; if you don't have experience with scVI or don't feel comfortable changing its parameters, the default values should be fine.
     n_layers: it's a very important parameter, because it determines how big a neighborhood is taken into account when computing the cell niches. n_layers=0 means that the clustering is performed on the cell features only (i.e., clustering based on cell type/cell state). n_layers=1 means that the clustering is performed on the cell features plus the features of the first layer of neighbors around each cell, and so on. The higher the value of n_layers, the broader the scope of the niches, with an extremely high value leading to a single niche across the whole sample.
  3. Differential gene expression analysis between cell niches is a tricky concept, because it is not as straightforward as DEG analysis between cell types. If you find differentially expressed genes between two niches, is it because the same cell types behave differently between the two niches, or because the two niches are simply composed of different cell types? To avoid this ambiguity, you should first identify and annotate the cell types in the dataset (with the standard single-cell workflow, or with CellCharter using n_layers=0), for example using DEGs. Then you can do DEG analysis between cells of the same cell type that belong to different niches. The reason it's giving you that warning is (I think) that the function uses the raw data by default if raw data are present. Given that you have raw data in the AnnData object, you just need to set the parameter use_raw=False, as in the sketch below. I didn't test it, so I am not 100% sure.
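A minimal sketch of that call (use_raw=False makes scanpy use the logarithmized adata.X instead of the unlogged adata.raw):

```python
# Run the Wilcoxon test on adata.X (log-normalized) rather than adata.raw.
sc.tl.rank_genes_groups(
    adata,
    groupby='cluster_cellcharter',
    method='wilcoxon',
    use_raw=False,
)
```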

It should be clear in the paper, but I repeat it here because I am starting to see this misunderstanding often: CellCharter was not originally designed to find cell types but to find cell niches, which are areas with the same combination of cell types and cell states. You can look for cell types by running it with n_layers=0, which can be convenient because it's very scalable, but this is not its original purpose.
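As a hedged illustration of that last point: with n_layers=0 the aggregated features reduce to the cell's own representation, so clustering X_scVI directly is equivalent (fit/predict as in the CellCharter tutorials; the cluster count here is arbitrary):

```python
# Cluster on cell features only (the n_layers=0 case): this targets
# cell types/states rather than spatial niches.
gmm_cells = cc.tl.Cluster(n_clusters=10)
gmm_cells.fit(adata, use_rep='X_scVI')
adata.obs['celltype_cluster'] = gmm_cells.predict(adata, use_rep='X_scVI')
```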

Zjianglin commented 2 months ago

Hi @marcovarrone, thanks for your reply. After clustering, I tried to perform proximity analysis using cc.gr.diff_nhood_enrichment. However, it always gets stuck after running for some time. I have tried different n_jobs values from 15 to 48, but the problem always occurs.

Here is the running output:

AnnData object with n_obs × n_vars = 326394 × 19517
    obs: 'total_counts', 'n_genes_by_counts', 'pct_counts_mt', 'orig.ident', 'x', 'y', 'spatial1', 'spatial2', 'n_counts', 'n_genes', 'sample', '_scvi_batch', '_scvi_labels', 'cluster_CC28', 'cluster_CC9'
    uns: '_scvi_manager_uuid', '_scvi_uuid', 'log1p', 'sample_colors', 'spatial', 'spatial_neighbors'
    obsm: 'X_cellcharter', 'X_scVI', 'spatial'
    layers: 'counts'
    obsp: 'spatial_connectivities', 'spatial_distances'

cc.gr.diff_nhood_enrichment(
    adata,
    cluster_key=grp_colx,
    condition_key='condition',
    library_key='sample',
    pvalues=True,
    n_jobs=36,
    n_perms=100
)

#==>
/home/ncba04/miniforge3/envs/cellcharter-env/lib/python3.10/site-packages/cellcharter/gr/_build.py:123: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  target_clusters = np.array(adata.obs[cluster_key][matrix.indices])
/home/ncba04/miniforge3/envs/cellcharter-env/lib/python3.10/site-packages/cellcharter/gr/_build.py:125: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  adata.obs[cluster_key][np.repeat(np.arange(matrix.indptr.shape[0] - 1), np.diff(matrix.indptr))]
  2%|▏         | 2/100 [01:03<48:39, 29.79s/it]
#  ... (the two FutureWarnings above repeat for every permutation)
 72%|███████▏  | 72/100 [28:22:32<11:02:06, 1418.79s/it]
---------------------------------------------------------------------------
BrokenProcessPool                         Traceback (most recent call last)
File ~/miniforge3/envs/cellcharter-env/lib/python3.10/site-packages/cellcharter/gr/_nhood.py:342, in diff_nhood_enrichment(adata, cluster_key, condition_key, condition_groups, library_key, pvalues, n_perms, n_jobs, copy, **nhood_kwargs)
    341 for future in as_completed(futures):
--> 342     expected_permuted = future.result()
    343     expected.append(expected_permuted)

File ~/miniforge3/envs/cellcharter-env/lib/python3.10/concurrent/futures/_base.py:451, in Future.result(self, timeout)
    450 elif self._state == FINISHED:
--> 451     return self.__get_result()
    453 self._condition.wait(timeout)

File ~/miniforge3/envs/cellcharter-env/lib/python3.10/concurrent/futures/_base.py:403, in Future.__get_result(self)
    402 try:
--> 403     raise self._exception
    404 finally:
    405     # Break a reference cycle with the exception in self._exception

BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

During handling of the above exception, another exception occurred:

KeyboardInterrupt                         Traceback (most recent call last)
Cell In[8], line 4
      1 #Proximity analysis
      2 # The next step is to describe the relative arrangement of spatial clusters in the samples and 
      3 # observe whether the relative proximity between spatial clusters changes between the different condition
----> 4 cc.gr.diff_nhood_enrichment(
      5     adata,
      6     cluster_key=grp_colx,
      7     condition_key='condition',
      8     library_key='sample',
      9     pvalues=True,
     10     n_jobs=36,
     11     n_perms=100
     12 )

File ~/miniforge3/envs/cellcharter-env/lib/python3.10/site-packages/cellcharter/gr/_nhood.py:327, in diff_nhood_enrichment(adata, cluster_key, condition_key, condition_groups, library_key, pvalues, n_perms, n_jobs, copy, **nhood_kwargs)
    324 adata_perms = adata.copy()
    326 with tqdm(total=n_perms) as pbar:
--> 327     with ProcessPoolExecutor(max_workers=n_jobs) as executor:
    328         futures = []
    330         for samples_perm in samples_perms:

File ~/miniforge3/envs/cellcharter-env/lib/python3.10/concurrent/futures/_base.py:649, in Executor.__exit__(self, exc_type, exc_val, exc_tb)
    648 def __exit__(self, exc_type, exc_val, exc_tb):
--> 649     self.shutdown(wait=True)
    650     return False

File ~/miniforge3/envs/cellcharter-env/lib/python3.10/concurrent/futures/process.py:780, in ProcessPoolExecutor.shutdown(self, wait, cancel_futures)
    777         self._executor_manager_thread_wakeup.wakeup()
    779 if self._executor_manager_thread is not None and wait:
--> 780     self._executor_manager_thread.join()
    781 # To reduce the risk of opening too many files, remove references to
    782 # objects that use file descriptors.
    783 self._executor_manager_thread = None

File ~/miniforge3/envs/cellcharter-env/lib/python3.10/threading.py:1096, in Thread.join(self, timeout)
   1093     raise RuntimeError("cannot join current thread")
   1095 if timeout is None:
-> 1096     self._wait_for_tstate_lock()
   1097 else:
   1098     # the behavior of a negative timeout isn't documented, but
   1099     # historically .join(timeout=x) for x<0 has acted as if timeout=0
   1100     self._wait_for_tstate_lock(timeout=max(timeout, 0))

File ~/miniforge3/envs/cellcharter-env/lib/python3.10/threading.py:1116, in Thread._wait_for_tstate_lock(self, block, timeout)
   1113     return
   1115 try:
-> 1116     if lock.acquire(block, timeout):
   1117         lock.release()
   1118         self._stop()

KeyboardInterrupt: 

The KeyboardInterrupt was issued after about 24 hours of running, with no change in the log for more than 10 hours.

marcovarrone commented 2 months ago

That's also unexpected, @Zjianglin. How many clusters, conditions and samples do you have? I am running gr.diff_nhood_enrichment on a dataset of 240k cells with 6 clusters and 2 conditions (one condition with 90k cells and the other with 150k). With n_jobs=1 it takes 2.4 seconds per iteration, and with n_jobs=15 it takes 1.6 seconds per iteration (there is some overhead from the parallelization, unfortunately). Try with n_jobs=1 and let's see what the speed is.

Moreover, which version of CellCharter are you using?

Zjianglin commented 2 months ago

Hi @marcovarrone, I have four conditions and 7 samples/slices (one of the slices holds two samples, so the total number of samples is 8 in the Sample column of obs). A total of 326k bin50 spots/cells were included in this analysis. The number of CellCharter clusters was 28, chosen from the stability analysis (it had the second-highest stability).

Sample
LowMM1479       68533
HighCF142       52070
LowMM1496       51378
Median1362      37066
WT8m            30891
HighPT2203      29979
MedianLL1084    29468
WT4m            27009
Name: count, dtype: int64

sample
LowMM1479       68533
WT              57900
HighCF142       52070
LowMM1496       51378
Median1362      37066
HighPT2203      29979
MedianLL1084    29468
Name: count, dtype: int64

condition
Low       119911
High       82049
Median     66534
WT         57900
Name: count, dtype: int64

My CellCharter version is 0.2.2.

After setting n_jobs=1, the function raised an error after 100%|██████████| 100/100 [48:38<00:00, 29.19s/it]. Here is the final output:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[45], line 5
      1 print(adata)
      2 #Proximity analysis
      3 # The next step is to describe the relative arrangement of spatial clusters in the samples and 
      4 # observe whether the relative proximity between spatial clusters changes between the different condition
----> 5 cc.gr.diff_nhood_enrichment(
      6     adata,
      7     cluster_key=grp_colx,
      8     condition_key='condition',
      9     library_key='Sample',
     10     pvalues=True,
     11     n_jobs=1, #36
     12     n_perms=100
     13 )

File ~/miniforge3/envs/cellcharter-env/lib/python3.10/site-packages/cellcharter/gr/_nhood.py:346, in diff_nhood_enrichment(adata, cluster_key, condition_key, condition_groups, library_key, pvalues, n_perms, n_jobs, copy, **nhood_kwargs)
    343     expected.append(expected_permuted)
    344     pbar.update(1)
--> 346 expected = np.stack(expected, axis=0)
    347 tmp_pvalues = _empirical_pvalues(observed, expected)
    348 result[result_key]["pvalue"] = tmp_pvalues

File ~/miniforge3/envs/cellcharter-env/lib/python3.10/site-packages/numpy/core/shape_base.py:449, in stack(arrays, axis, out, dtype, casting)
    447 shapes = {arr.shape for arr in arrays}
    448 if len(shapes) != 1:
--> 449     raise ValueError('all input arrays must have the same shape')
    451 result_ndim = arrays[0].ndim + 1
    452 axis = normalize_axis_index(axis, result_ndim)

ValueError: all input arrays must have the same shape

I tried library_key='Sample' (8 samples, two samples per condition) as well as library_key='sample' (7 samples; WT is one sample consisting of two slices on one chip), but the error still occurs. Could you please help me?

marcovarrone commented 2 months ago

Thanks @Zjianglin, you actually caught a bug in CellCharter!

The problem arose whenever certain clusters were not shared between two conditions. Updating to 0.2.3 should (hopefully) resolve it. Let me know, and thank you very much again for helping test CellCharter :)