WangPeng-Lab / scGCO

Single-cell Graph Cuts Optimization
MIT License

identify_spatial_genes seems to be stuck #7

Open rockdeme opened 1 year ago

rockdeme commented 1 year ago

I'm trying to run the scGCO pipeline, and execution appears to hang as soon as I call identify_spatial_genes.

I'm running Ubuntu 20.04 on WSL2. I also had to patch a few functions because some of the calls they rely on are deprecated.

Here is my script:

import time

import matplotlib
import networkx as nx
import numpy as np
import pandas as pd
import scanpy as sc
import matplotlib.pyplot as plt
from scGCO import *

# to_scipy_sparse_matrix is deprecated
from networkx.convert_matrix import to_scipy_sparse_array
nx.to_scipy_sparse_matrix = to_scipy_sparse_array

# override the default scGCO function as multi-dimensional indexing in pandas is not supported anymore
def normalize_count_cellranger(data, Log=True):
    '''
    normalize count as in cellranger

    :param data: A dataframe of shape (m, n), one row per spot;
    :rtype: data shape (m, n);
    '''
    # per-row scaling factor: total counts of each row over the median row total
    normalizing_factor = np.sum(data, axis=1) / np.median(np.sum(data, axis=1))
    # divide every row by its own factor; normalizing_factor[0] would
    # scale all rows by the first row's factor only
    data = data.div(normalizing_factor, axis=0)
    if Log:
        data = np.log1p(data)
    return data

data_path = '/my/folder/'
adata = sc.datasets.visium_sge(sample_id='V1_Human_Lymph_Node')

adata.var_names_make_unique()

sc.pp.calculate_qc_metrics(adata, inplace=True)
sc.pp.filter_cells(adata, min_counts=6000)
sc.pp.filter_genes(adata, min_cells=10)

j=11
unary_scale_factor=100
label_cost=10
algorithm='expansion'

data = adata.to_df().astype(int)
locs = adata.obsm['spatial']
data_norm = normalize_count_cellranger(data)

fig, ax = plt.subplots(1, 1, figsize=(5, 5))
ax.set_aspect('equal')
exp = data_norm.iloc[:, 0].values
cellGraph = create_graph_with_weight(locs, exp)
ax.scatter(locs[:, 0], locs[:, 1], s=1, color='black')
for i in np.arange(cellGraph.shape[0]):
    x = (locs[int(cellGraph[i, 0]), 0], locs[int(cellGraph[i, 1]), 0])
    y = (locs[int(cellGraph[i, 0]), 1], locs[int(cellGraph[i, 1]), 1])
    ax.plot(x, y, color='black', linewidth=0.5)
plt.title('CellGraph')
plt.show()

t0 = time.time()
gmmDict = gmm_model(data_norm)
print('GMM time(s): ', time.time() - t0)

t0 = time.time()
result_df = identify_spatial_genes(locs, data_norm, cellGraph, gmmDict)
print('Running time: {} seconds'.format(time.time() - t0))

According to the tutorial, the last step should take about as long as gmm_model, but it has been sitting idle for hours.

Output:

> GMM time(s):  183.92914414405823
> scGCO used 8 out of 16 cores
>   0%|          | 0/8 [00:00<?, ?it/s]

gokulsrin commented 5 months ago

I am also getting exactly this issue! Not sure what's going on. Thank you for opening this issue in any case.
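
When a call appears to hang like this, a traceback of where the Python process is waiting can help narrow things down. Below is a minimal sketch using the standard-library faulthandler module. The helper name hang_watchdog is hypothetical and not part of scGCO, and it only reports on the process it runs in, so worker processes started by a multiprocessing pool would not be covered by it.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def hang_watchdog(seconds, file=None):
    """Dump the tracebacks of all threads every `seconds` seconds
    for as long as the wrapped block is still running."""
    # faulthandler needs a real file object with a file descriptor
    target = file if file is not None else sys.stderr
    faulthandler.dump_traceback_later(seconds, repeat=True, file=target)
    try:
        yield
    finally:
        # stop the periodic dumps once the block finishes (or raises)
        faulthandler.cancel_dump_traceback_later()


# Intended use around the slow call from the script above
# (left as a comment because it needs scGCO and the prepared inputs):
#
# with hang_watchdog(600):
#     result_df = identify_spatial_genes(locs, data_norm, cellGraph, gmmDict)
```

If the dumped traceback shows the main process waiting on a pool result, the problem is more likely in the worker processes than in the call itself.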

fengwanwan commented 2 months ago

Thanks for opening this issue, and sorry for the trouble. Please reinstall scGCO with pip install -U scGCO.
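
To check whether an upgrade actually changed the installed release, one option is to query the package metadata before and after running pip. This is a minimal sketch, assuming the distribution is registered under the name scGCO (the name used in the pip command in this thread); installed_version is a hypothetical helper, not part of scGCO.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution,
    or None if no distribution with that name is installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "scGCO" is assumed to be the distribution name, as in `pip install -U scGCO`
print("scGCO version:", installed_version("scGCO"))
```

Running it once before and once after the reinstall shows whether pip picked up a newer release.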