atarashansky / SAMap

SAMap: Mapping single-cell RNA sequencing datasets from evolutionarily distant organisms.
MIT License

Error when using sm.run(pairwise=True) #85

Closed Smilenone closed 2 years ago

Smilenone commented 2 years ago

When I call sm.run(pairwise=True) on datasets from different species, I get the error below. Do you know why? The datasets I used here are of very low quality, and I think this is the cause, since your code runs well on my other data. But I would like to know the technical reason for this error, e.g. whether the low quality of the data leads to a limited number of neighbors.

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp\2\ipykernel_57464\3361404870.py in <module>
     33                     sams, f_maps=f_maps,
     34                 )
---> 35                 sm.run(pairwise=True)
     36 
     37                 # Calculating cell type mapping scores

D:\Users\qlshen\Tools\anacoda\envs\SAMap\lib\site-packages\samap\mapping.py in run(self, NUMITERS, NHS, crossK, N_GENE_CHUNKS, umap, ncpus, hom_edge_thr, hom_edge_mode, scale_edges_by_corr, neigh_from_keys, pairwise)
    303             scale_edges_by_corr = scale_edges_by_corr,
    304             neigh_from_keys=neigh_from_keys,
--> 305             pairwise=pairwise
    306         )
    307         samap = smap.final_sam

D:\Users\qlshen\Tools\anacoda\envs\SAMap\lib\site-packages\samap\mapping.py in run(self, NUMITERS, NHS, K, corr_mode, NCLUSTERS, scale_edges_by_corr, THR, neigh_from_keys, pairwise, ncpus)
    722                 labels.extend(q(sams[sid].adata.obs[keys[sid]]))
    723             sam4.adata.obs['tempv1.0.0.0'] = labels
--> 724             CSIMth, _ = _compute_csim(sam4, "tempv1.0.0.0")
    725             del sam4.adata.obs['tempv1.0.0.0']
    726 

D:\Users\qlshen\Tools\anacoda\envs\SAMap\lib\site-packages\samap\analysis.py in _compute_csim(sam3, key, X, prepend, n_top)
   1474     cell_scores = [valdict[k].sum() for k in valdict.keys()]
   1475     ixer = pd.Series(data=np.arange(clu.size),index=clu)
-> 1476     xc,yc = substr(list(valdict.keys()),';')
   1477     xc = xc.astype('int')
   1478     yc=ixer[yc].values

D:\Users\qlshen\Tools\anacoda\envs\SAMap\lib\site-packages\samap\utils.py in substr(x, s, ix, obj)
    112             ms.append(m)
    113             ls.append(len(m))
--> 114         ml = max(ls)
    115         for i in range(len(ms)):
    116             ms[i].extend([""] * (ml - len(ms[i])))

ValueError: max() arg is an empty sequence

atarashansky commented 2 years ago

Oh damn - that's crazy, I've never seen this before. It looks like it's failing because there are no cross-species edges linking your datasets (i.e. zero mapping), so the list of keys passed to substr is empty and max() has nothing to work with. Apologies for not catching this scenario and printing a human-readable error message. Adding that to my list of things to do.
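For anyone who hits the same traceback, here is a minimal sketch of the failure mode and of the kind of guard described above. It is not SAMap's actual code: `split_pairs` is a hypothetical stand-in for the key-splitting step, and the error message wording is an assumption.

```python
def split_pairs(keys, sep=";"):
    """Split 'left;right' keys into two parallel lists.

    Hypothetical helper for illustration only; it is not part of SAMap.
    """
    if len(keys) == 0:
        # Guard: fail early with a readable message instead of letting
        # a later max() call raise on an empty list.
        raise ValueError(
            "No cross-species edges were found between the datasets, "
            "so there is nothing to map."
        )
    left, right = [], []
    for key in keys:
        a, _, b = key.partition(sep)
        left.append(a)
        right.append(b)
    return left, right


# The bare failure from the traceback: max() of an empty list raises.
try:
    max([])
except ValueError as err:
    print("unguarded:", err)

# With the guard, the same empty input yields an explanatory message.
try:
    split_pairs([])
except ValueError as err:
    print("guarded:", err)
```

With non-empty input, e.g. `split_pairs(["0;clusterA", "1;clusterB"])`, the helper returns the two halves of each key as separate lists.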

atarashansky commented 2 years ago

Should be fixed in v1.0.5 (pip install samap==1.0.5)