Aggregation of expression data ends up in matrix filled with NaN's

zvittorio commented 9 months ago

Dear SEACell team,

First of all, thank you for such an interesting and versatile tool. I have recently been using it to create metacells from a scRNA-seq dataset whose cells come from different studies and, in turn, from different patients. I wanted to repeat the workflow shown for the COVID dataset integration, but I am still at the first round of metacells. I am running the basic pipeline shown in notebooks/SEACell_computation.ipynb iteratively across the samples, and I am using the soft assignment for binning the cells. Everything seems to run smoothly, except for some samples that show no apparent difference (in the data) from the others. In those cases, the expression matrix of the metacell object (the X slot) is completely filled with NaNs, even though the X slot of the starting AnnData object, the layer used for aggregation, and X_pca are not. This is an example:

adata.X.toarray() :
adata.layers['norm_counts'].toarray() :
adata.obsm['X_pca'] :
whereas this is the output of metacell.X.toarray() :

I have also inspected the figures produced in the workflow, but none of them looks abnormal based on my understanding (should I pay attention to one of them specifically in this case? If so, what should I look at?)
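The comparison described above (inputs free of NaN, output full of NaN) can be made explicit with a small helper. This is a minimal sketch, not part of SEACells or scanpy: `nonfinite_report` is a hypothetical name, and the only assumption is that sparse matrices expose `.toarray()`, as the ones in this post do.

```python
import numpy as np


def nonfinite_report(named_arrays):
    """Return {name: (n_nan, n_inf, n_total)} for each array-like given.

    Hypothetical helper for illustration; not part of SEACells or scanpy.
    """
    report = {}
    for name, arr in named_arrays.items():
        # Sparse matrices (e.g. adata.X) are densified the same way as in the post.
        dense = arr.toarray() if hasattr(arr, "toarray") else arr
        dense = np.asarray(dense, dtype=float)
        report[name] = (
            int(np.isnan(dense).sum()),
            int(np.isinf(dense).sum()),
            int(dense.size),
        )
    return report


# Toy stand-ins for the real arrays, so the sketch runs on its own.
toy_input = np.array([[1.0, 0.0], [2.0, 3.0]])
toy_output = np.full((2, 2), np.nan)
report = nonfinite_report({"input": toy_input, "output": toy_output})
```

With the real objects, one would pass something like `{"X": adata.X, "norm_counts": adata.layers["norm_counts"], "X_pca": adata.obsm["X_pca"], "metacell_X": metacell.X}`. Densifying large matrices costs memory, so this is best run on one sample at a time.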

Finally, this is the code I have used for producing the metacell object:

from math import ceil

import scanpy as sc
import SEACells

# adata_big, rerun_these, rerun_dict, build_kernel_on and n_waypoint_eigs
# are defined elsewhere (not shown).
for sample in rerun_these:
    print("Analyzing", sample)
    # subset to the cells of the current sample
    ad_tmp = adata_big[adata_big.obs['Sample'] == sample].copy()

    # aim for roughly one metacell per 75 cells
    n_SEACells = ceil(ad_tmp.n_obs / 75)

    # renormalize from the raw counts layer
    ad_tmp.X = ad_tmp.layers['counts'].copy()
    sc.pp.normalize_total(ad_tmp, target_sum=1e4)
    ad_tmp.layers['norm_counts'] = ad_tmp.X.copy()

    # rerun pca on the log-transformed data
    sc.pp.log1p(ad_tmp)
    sc.pp.pca(ad_tmp, n_comps=50)

    model = SEACells.core.SEACells(ad_tmp,
                  build_kernel_on=build_kernel_on,
                  n_SEACells=n_SEACells,
                  n_waypoint_eigs=n_waypoint_eigs,
                  convergence_epsilon=1e-5)

    model.construct_kernel_matrix()
    M = model.kernel_matrix

    model.initialize_archetypes()

    model.fit(min_iter=10, max_iter=1000)

    # aggregate the normalized counts using the soft assignment weights
    SEACell_soft_ad = SEACells.core.summarize_by_soft_SEACell(ad_tmp, model.A_, celltype_label='celltype_col', summarize_layer='norm_counts', minimum_weight=0.05)

    rerun_dict[sample] = SEACell_soft_ad
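
One way to narrow down where the NaNs enter is to inspect the assignment matrix before summarizing. The sketch below is not taken from the SEACells source. It rests on two assumptions that this thread does not confirm: that `model.A_` has shape (metacells x cells), and that soft summarization discards weights below `minimum_weight` and then renormalizes what is left. Under those assumptions, non-finite entries in the matrix, or a metacell left with no cell above the threshold (a 0/0 during renormalization), would both produce NaN. `diagnose_assignment` is a hypothetical helper name.

```python
import numpy as np


def diagnose_assignment(A, minimum_weight=0.05):
    """Summarize properties of an assignment matrix that could yield NaN.

    Assumes A has shape (n_metacells, n_cells). Hypothetical helper,
    not part of SEACells.
    """
    A = np.asarray(A.toarray() if hasattr(A, "toarray") else A, dtype=float)
    kept = np.where(A >= minimum_weight, A, 0.0)  # drop small weights
    row_sums = kept.sum(axis=1)                   # retained weight per metacell
    return {
        "n_nonfinite": int((~np.isfinite(A)).sum()),
        "empty_metacells": np.flatnonzero(row_sums == 0).tolist(),
    }


# Toy matrix: 3 metacells x 4 cells; the last metacell has only tiny weights.
A_toy = np.array([
    [0.90, 0.10, 0.48, 0.02],
    [0.08, 0.88, 0.50, 0.96],
    [0.02, 0.02, 0.02, 0.02],
])
result = diagnose_assignment(A_toy)

# Renormalizing the thresholded weights shows the 0/0 case for the last metacell.
kept = np.where(A_toy >= 0.05, A_toy, 0.0)
with np.errstate(invalid="ignore"):
    normalized = kept / kept.sum(axis=1, keepdims=True)
```

With the real model this would be called as `diagnose_assignment(model.A_, minimum_weight=0.05)` for each affected sample. If the orientation assumption is wrong, transpose the matrix first. A non-zero `n_nonfinite` would point at the fit itself, while a non-empty `empty_metacells` list would point at the thresholding step.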

Thank you for any help or suggestions you can provide!

Vittorio

anndata     0.7.6
scanpy      1.8.1
sinfo       0.3.4
-----
PIL                 8.2.0
SEACells            NA
backcall            0.2.0
bottleneck          1.3.2
cairo               1.20.1
cffi                1.14.5
colorama            0.4.4
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.1
debugpy             1.3.0
decorator           5.0.7
fcsparser           0.2.3
h5py                3.2.1
igraph              0.9.6
ipykernel           6.0.0
ipython_genutils    0.2.0
ipywidgets          7.6.3
jedi                0.18.0
joblib              1.0.1
kiwisolver          1.3.1
leidenalg           0.8.7
llvmlite            0.36.0
loompy              3.0.7
matplotlib          3.4.2
matplotlib_inline   NA
mpl_toolkits        NA
natsort             7.1.1
ncls                0.0.67
netifaces           0.10.9
networkx            2.5.1
numba               0.53.1
numexpr             2.7.3
numpy               1.20.3
numpy_groupies      0.9.14
packaging           20.9
palantir            1.2
pandas              1.2.4
parso               0.8.2
pexpect             4.8.0
phenograph          1.5.7
pickleshare         0.7.5
pkg_resources       NA
progressbar         4.2.0
prompt_toolkit      3.0.19
psutil              5.8.0
ptyprocess          0.7.0
pycparser           2.20
pyexpat             NA
pygam               0.8.0
pygments            2.9.0
pynndescent         0.5.4
pyparsing           2.4.7
pyranges            0.0.110
pyrle               0.0.33
python_utils        NA
pytoml              NA
pytz                2021.1
scipy               1.6.3
seaborn             0.11.2
setuptools_scm      NA
simplejson          3.17.2
sitecustomize       NA
six                 1.16.0
sklearn             0.24.2
sorted_nearest      0.0.32
sphinxcontrib       NA
statsmodels         0.12.2
storemagic          NA
tables              3.6.1
tabulate            0.8.9
texttable           1.6.4
tornado             6.1
tqdm                4.61.2
traitlets           5.0.5
typing_extensions   NA
umap                0.5.1
wcwidth             0.2.5
zmq                 22.1.0
-----
IPython             7.25.0
jupyter_client      6.1.12
jupyter_core        4.7.1
notebook            6.4.0
-----
Python 3.9.5 (default, Dec 21 2022, 10:33:37)

sitarapersad commented 7 months ago

Can you double check for me what the output of SEACell_ad = SEACells.core.summarize_by_SEACell(ad, SEACells_label='SEACell', summarize_layer='raw') gives you? Thanks!

Gwennerd commented 4 months ago

Hi, I am running into the same problem with SEACells. I want to use the soft assignment, but I also get the NaN output even though the data looks normal to me.

Gwennerd commented 4 months ago

> Can you double check for me what the output of SEACell_ad = SEACells.core.summarize_by_SEACell(ad, SEACells_label='SEACell', summarize_layer='raw') gives you? Thanks!

With my code, the result you requested with: SEACell_ad = SEACells.core.summarize_by_SEACell(ad, SEACells_label='SEACell', summarize_layer='raw')

looks like this:

[Image: error_Nan]

Hopefully this gives you the information you need to help me out.

Kind regards, Gwen

GLking123 commented 2 months ago

> Hi, I am running into the same problem with SEACells. I want to use the soft assignment, but I also get the NaN output even though the data looks normal to me.

Hello, I ran into the same problem recently. Did you solve it? Is there a good workaround? Thanks for any reply.

Gwennerd commented 2 months ago

> > Hi, I am running into the same problem with SEACells. I want to use the soft assignment, but I also get the NaN output even though the data looks normal to me.
>
> Hello, I ran into the same problem recently. Did you solve it? Is there a good workaround? Thanks for any reply.

Sadly not

dpeerlab / SEACells

Aggregation of expression data ends up in matrix filled with NaN's #57