atarashansky / SAMap

SAMap: Mapping single-cell RNA sequencing datasets from evolutionarily distant organisms.
MIT License

How can sankey_plot show more than four species? #130

Open xmChen090 opened 11 months ago

xmChen090 commented 11 months ago

Hi Alec,

How can I display more than four species using a sankey_plot?

Thank you for your help!

atarashansky commented 11 months ago

If you have a SAMAP object with four species, you should be able to just pass in a list of the four species IDs into the sankey function. Alternatively, you could try making a chord plot. Could you give me more context about what you're trying to do?

xmChen090 commented 11 months ago

sankey_plot(MappingTable, align_thr=0.12, species_order=["gas", "tro", "dst", "kdc"])

I passed in a list of the four species IDs, but only "gas", "tro", and "dst" are displayed in the Sankey plot. What I actually want is a Sankey plot of seven species, but sm.run(pairwise=True) failed, probably because the computer ran out of memory. The run with four species succeeded, but sankey_plot does not display all of them. Is there a parameter in sankey_plot that I can change? sankey_plot seems to show at most three species.

atarashansky commented 11 months ago

Can you take a screenshot of MappingTable.head() and paste it here?
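A text summary of the species prefixes in the table can also help with this kind of check. The following is a minimal sketch, assuming the index labels follow the `species_celltype` pattern that the plotting code splits on; the `species_counts` helper and the toy table are made up for illustration and are not part of SAMap.

```python
import numpy as np
import pandas as pd


def species_counts(M):
    """Count how many cell types each species prefix contributes to the index."""
    prefixes = pd.Series([str(label).split("_")[0] for label in M.index])
    return prefixes.value_counts().sort_index()


# Toy stand-in for MappingTable (hypothetical labels, all-zero scores).
labels = ["gas_neuron", "gas_muscle", "tro_neuron", "dst_neuron", "kdc_neuron"]
toy = pd.DataFrame(np.zeros((5, 5)), index=labels, columns=labels)

counts = species_counts(toy)
print(counts)
```

If a species ID passed to `species_order` is missing from this summary, the problem is in the table rather than in the plot.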

dnjst commented 8 months ago

mapping_scores_example.csv

I've had this issue before. Here is a minimal set of mapping table scores to reproduce it; hope it helps.

DiracZhu1998 commented 3 months ago

I had the same issue

DiracZhu1998 commented 3 months ago

Hi, did you solve this?

DiracZhu1998 commented 3 months ago

@dnjst @atarashansky I modified the sankey_plot function and it works, but with more than three species the columns no longer each represent a single species: cell types from some species get mixed into another species' column.
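One possible cause of the mixed columns, offered as an assumption rather than a confirmed diagnosis: Sankey layouts typically place a node by its depth in the link graph, not by a species label, so a link that skips a species (first to third, say) can pull a third-species cell type into the second column. The sketch below keeps only links between neighbouring species in `species_order`. `adjacent_edges` is a hypothetical helper, not part of SAMap; it assumes `species_celltype` labels and a square table, and takes the larger of the two scores for each pair.

```python
import numpy as np
import pandas as pd


def adjacent_edges(M, species_order, align_thr=0.1):
    """Build source/target/Value rows for neighbouring species only."""
    rank = {sp: i for i, sp in enumerate(species_order)}
    labels = M.index.to_numpy()
    species = [str(label).split("_")[0] for label in labels]
    scores = M.values
    rows = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            a, b = species[i], species[j]
            if a not in rank or b not in rank:
                continue  # species not requested in species_order
            if abs(rank[a] - rank[b]) != 1:
                continue  # same species, or a link that skips a column
            value = max(scores[i, j], scores[j, i])
            if value < align_thr:
                continue
            # Orient every link left to right according to species_order.
            if rank[a] < rank[b]:
                rows.append((labels[i], labels[j], value))
            else:
                rows.append((labels[j], labels[i], value))
    return pd.DataFrame(rows, columns=["source", "target", "Value"])


# Toy table with one cell type per species (hypothetical scores).
labels = ["gas_n", "tro_n", "dst_n", "kdc_n"]
S = np.zeros((4, 4))
S[0, 1] = S[1, 0] = 0.5   # gas-tro, neighbours
S[1, 2] = S[2, 1] = 0.4   # tro-dst, neighbours
S[0, 2] = S[2, 0] = 0.9   # gas-dst, skips tro and is dropped
S[2, 3] = S[3, 2] = 0.3   # dst-kdc, neighbours
S[0, 3] = S[3, 0] = 0.05  # below threshold
toy = pd.DataFrame(S, index=labels, columns=labels)

edges = adjacent_edges(toy, ["gas", "tro", "dst", "kdc"])
print(edges)
```

This may reduce the mixing but not necessarily remove it: a cell type with no link to the species on its left can still be laid out as a source node in the first column.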

As for chord plots, with several species and many cell types the graph is hard to read when the nodes are grouped by species. It would be better to group them by mapping, that is, to place homologous cell types together.
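Grouping by mapping could be approximated by treating every score at or above the threshold as a link and collecting linked cell types into one group. Below is a minimal sketch using a small union-find. `homology_groups` is a hypothetical helper, and connected components are only one possible definition of a homologous group.

```python
import numpy as np
import pandas as pd


def homology_groups(M, align_thr=0.1):
    """Return lists of cell types connected by scores >= align_thr."""
    labels = list(M.index)
    parent = list(range(len(labels)))

    def find(i):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    rows, cols = np.nonzero(M.values >= align_thr)
    for i, j in zip(rows, cols):
        ri, rj = find(int(i)), find(int(j))
        if ri != rj:
            parent[rj] = ri  # merge the two groups

    groups = {}
    for i, label in enumerate(labels):
        groups.setdefault(find(i), []).append(label)
    # Largest groups first; ties broken by the first label in the group.
    return sorted(groups.values(), key=lambda g: (-len(g), g[0]))


# Toy table (hypothetical labels and scores).
labels = ["gas_neuron", "tro_neuron", "dst_neuron", "gas_muscle", "tro_muscle"]
S = np.zeros((5, 5))
S[0, 1] = S[1, 0] = 0.6   # gas_neuron - tro_neuron
S[1, 2] = S[2, 1] = 0.5   # tro_neuron - dst_neuron
S[3, 4] = S[4, 3] = 0.7   # gas_muscle - tro_muscle
S[0, 4] = S[4, 0] = 0.05  # below threshold, ignored
toy = pd.DataFrame(S, index=labels, columns=labels)

groups = homology_groups(toy)
print(groups)
```

The flattened group order could then serve as the node order of a chord plot, so that homologous cell types sit next to each other regardless of species.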

Another way is to draw a heatmap.
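For the heatmap route, the data preparation could look like the sketch below: one block of the mapping table per species pair, with scores below the threshold zeroed out. `species_block` is a hypothetical helper; it assumes a square table whose columns carry the same `species_celltype` labels as the index.

```python
import numpy as np
import pandas as pd


def species_block(M, row_species, col_species, align_thr=0.1):
    """Extract the row_species x col_species block of the mapping table."""
    rows = [l for l in M.index if str(l).split("_")[0] == row_species]
    cols = [l for l in M.columns if str(l).split("_")[0] == col_species]
    block = M.loc[rows, cols].copy()
    # Keep scores at or above the threshold, set everything else to 0.
    return block.where(block >= align_thr, 0.0)


# Toy table (hypothetical labels and scores).
labels = ["gas_a", "gas_b", "tro_a", "tro_b"]
S = np.zeros((4, 4))
S[0, 2] = S[2, 0] = 0.8
S[0, 3] = S[3, 0] = 0.3
S[1, 3] = S[3, 1] = 0.05  # below threshold
toy = pd.DataFrame(S, index=labels, columns=labels)

block = species_block(toy, "gas", "tro")
print(block)
```

Each block is an ordinary DataFrame, so it can be handed to whichever heatmap function is at hand, one panel per species pair.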

import numpy as np
import pandas as pd
import holoviews as hv

hv.extension('bokeh', logo=False)
hv.output(size=100)


def sankey_plot2(M, species_order=None, align_thr=0.1, **params):
    """Generate a sankey plot

    Parameters
    ----------
    M: pandas.DataFrame
        Mapping table output from `get_mapping_scores` (second output).

    align_thr: float, optional, default 0.1
        The alignment score threshold below which to remove cell type mappings.

    species_order: list, optional, default None
        Specify the order of species (left-to-right) in the sankey plot.
        For example, `species_order=['hu','le','ms']`.

    Keyword arguments
    -----------------
    Keyword arguments will be passed to `sankey.opts`.
    """
    if species_order is not None:
        ids = np.array(species_order)
    else:
        ids = np.unique([x.split('_')[0] for x in M.index])

    # Zero out scores below the threshold and collect each unordered pair once.
    d = M.values.copy()
    d[d < align_thr] = 0
    x, y = d.nonzero()
    x, y = np.unique(np.sort(np.vstack((x, y)).T, axis=1), axis=0).T
    values = d[x, y]
    nodes = M.index.to_numpy()

    # Species prefix of each endpoint of every link.
    node_pairs = nodes[np.vstack((x, y)).T]
    sn1 = np.array([xi.split('_')[0] for xi in node_pairs[:, 0]])
    sn2 = np.array([xi.split('_')[0] for xi in node_pairs[:, 1]])

    # Keep links between every pair of distinct species in `ids`.
    filt = np.zeros_like(sn1, dtype=bool)
    for i in range(len(ids) - 1):
        for j in range(i + 1, len(ids)):
            filt = np.logical_or(filt, np.logical_or(
                np.logical_and(sn1 == ids[i], sn2 == ids[j]),
                np.logical_and(sn1 == ids[j], sn2 == ids[i])
            ))

    x, y, values = x[filt], y[filt], values[filt]

    # Orient each link so the source species comes first in `ids`.
    d = dict(zip(ids, list(np.arange(len(ids)))))
    depth_map = dict(zip(nodes, [d[xi.split('_')[0]] for xi in nodes]))
    data = nodes[np.vstack((x, y))].T
    for i in range(data.shape[0]):
        if d[data[i, 0].split('_')[0]] > d[data[i, 1].split('_')[0]]:
            data[i, :] = data[i, ::-1]
    R = pd.DataFrame(data=data, columns=['source', 'target'])
    R['Value'] = values

    # Adjust the order of nodes to ensure that they are placed in columns
    node_sort_key = {species: i for i, species in enumerate(ids)}
    R['source_order'] = R['source'].apply(lambda x: node_sort_key[x.split('_')[0]])
    R['target_order'] = R['target'].apply(lambda x: node_sort_key[x.split('_')[0]])
    R = R.sort_values(by=['source_order', 'target_order'])

    def f(plot, element):
        # Bokeh hook: widen the x-range so the node labels fit in the frame.
        plot.handles['plot'].sizing_mode = 'scale_width'
        plot.handles['plot'].x_range.start = -600
        plot.handles['plot'].x_range.end = 1500

    sankey1 = hv.Sankey(R, kdims=["source", "target"], vdims="Value")

    # Plot options, with defaults that callers can override via **params.
    cmap = params.get('cmap', 'Colorblind')
    label_position = params.get('label_position', 'right')
    edge_line_width = params.get('edge_line_width', 0)
    show_values = params.get('show_values', False)
    node_padding = params.get('node_padding', 4)
    node_alpha = params.get('node_alpha', 1)
    node_width = params.get('node_width', 30)
    node_sort = params.get('node_sort', True)
    frame_height = params.get('frame_height', 1000)
    frame_width = params.get('frame_width', 800)
    bgcolor = params.get('bgcolor', 'snow')
    apply_ranges = params.get('apply_ranges', True)

    sankey1.opts(cmap=cmap, label_position=label_position, edge_line_width=edge_line_width, show_values=show_values,
                 node_padding=node_padding, node_cmap=depth_map, node_alpha=node_alpha, node_width=node_width,
                 node_sort=node_sort, frame_height=frame_height, frame_width=frame_width, bgcolor=bgcolor,
                 apply_ranges=apply_ranges, hooks=[f])

    return sankey1