ShobiStassen / VIA

trajectory inference
https://pyvia.readthedocs.io/en/latest/
MIT License

Error when plotting differentiation.flow #48

TJonCooper closed this issue 8 months ago

TJonCooper commented 9 months ago

Firstly, thanks for a great package! I'm attempting to run via.plot_differentiation_flow but am running into a few issues that I can't make sense of.

PyVIA version: 0.1.96

From my VIA object:

There are (23) terminal clusters corresponding to unique lineages {7: 'Memory B', 9: 'CD14 Mono', 10: 'CD14 Mono', 19: 'CD14 Mono', 21: 'Memory B', 23: 'transitional B', 28: 'Memory B', 30: 'CD16 Mono', 31: 'Late Eryth', 36: 'CD14 Mono', 40: 'Late Eryth', 41: 'Memory B', 42: 'Memory B', 43: 'Late Eryth', 44: 'CD14 Mono', 45: 'cDC2', 46: 'pDC', 51: 'cDC2', 52: 'CD16 Mono', 53: 'Memory B', 54: 'Late Eryth', 55: 'CD16 Mono', 56: 'pDC'}

The root index, 382 provided by the user belongs to cluster number 16 and corresponds to cell type HSC

And the plot command:

via.plot_differentiation_flow(via_object=v0, marker_lineages=[9], do_log_flow=True, root_cluster_list=[16])

The error and traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/anaconda3/envs/ViaEnv/lib/python3.9/site-packages/pandas/core/indexes/range.py:345, in RangeIndex.get_loc(self, key)
    344 try:
--> 345     return self._range.index(new_key)
    346 except ValueError as err:

ValueError: 0 is not in range

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
File ~/anaconda3/envs/ViaEnv/lib/python3.9/site-packages/pandas/core/groupby/generic.py:269, in SeriesGroupBy.aggregate(self, func, engine, engine_kwargs, *args, **kwargs)
    268 try:
--> 269     return self._python_agg_general(func, *args, **kwargs)
    270 except KeyError:
    271     # KeyError raised in test_groupby.test_basic is bc the func does
    272     #  a dictionary lookup on group.name, but group name is not
    273     #  pinned in _python_agg_general, only in _aggregate_named

File ~/anaconda3/envs/ViaEnv/lib/python3.9/site-packages/pandas/core/groupby/generic.py:288, in SeriesGroupBy._python_agg_general(self, func, *args, **kwargs)
    287 obj = self._obj_with_exclusions
--> 288 result = self.grouper.agg_series(obj, f)
    289 res = obj._constructor(result, name=obj.name)

File ~/anaconda3/envs/ViaEnv/lib/python3.9/site-packages/pandas/core/groupby/ops.py:994, in BaseGrouper.agg_series(self, obj, func, preserve_dtype)
    992     preserve_dtype = True
--> 994 result = self._aggregate_series_pure_python(obj, func)
    996 npvalues = lib.maybe_convert_objects(result, try_float=False)

File ~/anaconda3/envs/ViaEnv/lib/python3.9/site-packages/pandas/core/groupby/ops.py:1015, in BaseGrouper._aggregate_series_pure_python(self, obj, func)
   1014 for i, group in enumerate(splitter):
-> 1015     res = func(group)
   1016     res = libreduction.extract_result(res)

File ~/anaconda3/envs/ViaEnv/lib/python3.9/site-packages/pandas/core/groupby/generic.py:285, in SeriesGroupBy._python_agg_general.<locals>.<lambda>(x)
    284 func = com.is_builtin_func(func)
--> 285 f = lambda x: func(x, *args, **kwargs)
    287 obj = self._obj_with_exclusions

File ~/anaconda3/envs/ViaEnv/lib/python3.9/site-packages/pyVIA/plotting_via.py:829, in plot_differentiation_flow.<locals>.<lambda>(x)
    827 else: df_mode['celltype'] = via_object.true_label
    828 majority_cluster_population_dict = df_mode.groupby(['cluster'])['celltype'].agg(
--> 829     lambda x: pd.Series.mode(x)[0])  # agg(pd.Series.mode would give all modes) #series
    830 majority_cluster_population_dict = majority_cluster_population_dict.to_dict()

File ~/anaconda3/envs/ViaEnv/lib/python3.9/site-packages/pandas/core/series.py:1007, in Series.__getitem__(self, key)
   1006 elif key_is_scalar:
-> 1007     return self._get_value(key)
   1009 if is_hashable(key):
   1010     # Otherwise index.get_value will raise InvalidIndexError

File ~/anaconda3/envs/ViaEnv/lib/python3.9/site-packages/pandas/core/series.py:1116, in Series._get_value(self, label, takeable)
   1115 # Similar to Index.get_value, but we do not fall back to positional
-> 1116 loc = self.index.get_loc(label)
   1118 if is_integer(loc):

File ~/anaconda3/envs/ViaEnv/lib/python3.9/site-packages/pandas/core/indexes/range.py:347, in RangeIndex.get_loc(self, key)
    346     except ValueError as err:
--> 347         raise KeyError(key) from err
    348 if isinstance(key, Hashable):

KeyError: 0

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
File ~/anaconda3/envs/ViaEnv/lib/python3.9/site-packages/pandas/core/indexes/range.py:345, in RangeIndex.get_loc(self, key)
    344 try:
--> 345     return self._range.index(new_key)
    346 except ValueError as err:

ValueError: 0 is not in range

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[120], line 1
----> 1 via.plot_differentiation_flow(via_object=v0, marker_lineages=[9], do_log_flow=True, root_cluster_list=[16])

File ~/anaconda3/envs/ViaEnv/lib/python3.9/site-packages/pyVIA/plotting_via.py:828, in plot_differentiation_flow(via_object, idx, dpi, marker_lineages, label_node, do_log_flow, fontsize, alpha_factor, majority_cluster_population_dict, cmap_sankey, title_str, root_cluster_list)
    826 if len(label_node)>0: df_mode['celltype'] = label_node# v0.true_label
    827 else: df_mode['celltype'] = via_object.true_label
--> 828 majority_cluster_population_dict = df_mode.groupby(['cluster'])['celltype'].agg(
    829     lambda x: pd.Series.mode(x)[0])  # agg(pd.Series.mode would give all modes) #series
    830 majority_cluster_population_dict = majority_cluster_population_dict.to_dict()
    831 print(f'{datetime.now()}\tEnd dictionary modes')

File ~/anaconda3/envs/ViaEnv/lib/python3.9/site-packages/pandas/core/groupby/generic.py:274, in SeriesGroupBy.aggregate(self, func, engine, engine_kwargs, *args, **kwargs)
    269     return self._python_agg_general(func, *args, **kwargs)
    270 except KeyError:
    271     # KeyError raised in test_groupby.test_basic is bc the func does
    272     #  a dictionary lookup on group.name, but group name is not
    273     #  pinned in _python_agg_general, only in _aggregate_named
--> 274     result = self._aggregate_named(func, *args, **kwargs)
    276     # result is a dict whose keys are the elements of result_index
    277     result = Series(result, index=self.grouper.result_index)

File ~/anaconda3/envs/ViaEnv/lib/python3.9/site-packages/pandas/core/groupby/generic.py:412, in SeriesGroupBy._aggregate_named(self, func, *args, **kwargs)
    409 for name, group in self:
    410     object.__setattr__(group, "name", name)
--> 412     output = func(group, *args, **kwargs)
    413     output = libreduction.extract_result(output)
    414     if not initialized:
    415         # We only do this validation on the first iteration

File ~/anaconda3/envs/ViaEnv/lib/python3.9/site-packages/pyVIA/plotting_via.py:829, in plot_differentiation_flow.<locals>.<lambda>(x)
    826 if len(label_node)>0: df_mode['celltype'] = label_node# v0.true_label
    827 else: df_mode['celltype'] = via_object.true_label
    828 majority_cluster_population_dict = df_mode.groupby(['cluster'])['celltype'].agg(
--> 829     lambda x: pd.Series.mode(x)[0])  # agg(pd.Series.mode would give all modes) #series
    830 majority_cluster_population_dict = majority_cluster_population_dict.to_dict()
    831 print(f'{datetime.now()}\tEnd dictionary modes')

File ~/anaconda3/envs/ViaEnv/lib/python3.9/site-packages/pandas/core/series.py:1007, in Series.__getitem__(self, key)
   1004     return self._values[key]
   1006 elif key_is_scalar:
-> 1007     return self._get_value(key)
   1009 if is_hashable(key):
   1010     # Otherwise index.get_value will raise InvalidIndexError
   1011     try:
   1012         # For labels that don't resolve as scalars like tuples and frozensets

File ~/anaconda3/envs/ViaEnv/lib/python3.9/site-packages/pandas/core/series.py:1116, in Series._get_value(self, label, takeable)
   1113     return self._values[label]
   1115 # Similar to Index.get_value, but we do not fall back to positional
-> 1116 loc = self.index.get_loc(label)
   1118 if is_integer(loc):
   1119     return self._values[loc]

File ~/anaconda3/envs/ViaEnv/lib/python3.9/site-packages/pandas/core/indexes/range.py:347, in RangeIndex.get_loc(self, key)
    345         return self._range.index(new_key)
    346     except ValueError as err:
--> 347         raise KeyError(key) from err
    348 if isinstance(key, Hashable):
    349     raise KeyError(key)

KeyError: 0

Any idea what is causing this?
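The frame that fails in the traceback is the lambda at plotting_via.py:829, pd.Series.mode(x)[0]. The minimal pandas-only sketch below shows how that expression produces exactly "KeyError: 0". Series.mode drops missing values by default, so a cluster whose cell-type labels are all NaN yields an empty Series. Indexing that empty Series with [0] is a label lookup on an empty RangeIndex, not a positional lookup, so it fails. The safe_first_mode helper is a hypothetical illustration and is not part of pyVIA.

```python
import numpy as np
import pandas as pd


def first_mode(x: pd.Series):
    # Same expression as the lambda in the traceback (plotting_via.py:829):
    # take the most frequent label among one cluster's cells.
    return pd.Series.mode(x)[0]


def safe_first_mode(x: pd.Series):
    # Hypothetical defensive variant (not part of pyVIA): use positional
    # .iloc and return NaN when the group has no non-missing labels.
    modes = x.mode()
    return modes.iloc[0] if not modes.empty else np.nan


labelled = pd.Series(["HSC", "HSC", "CD14 Mono"])
all_missing = pd.Series([np.nan, np.nan, np.nan], dtype=object)

print(first_mode(labelled))       # HSC
print(all_missing.mode().empty)   # True: mode() drops NaN by default

try:
    first_mode(all_missing)
except KeyError as err:
    # The empty result has an empty RangeIndex, so [0] is a failed label lookup.
    print("KeyError:", err)

print(safe_first_mode(all_missing))  # nan
```

So the KeyError itself only says that some cluster ended up with no usable labels; the question is why the labels went missing.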

MinatoKobashi commented 9 months ago

I tried to replicate the error, and it seems to be related to the true_label parameter.

When initialising the VIA object, I set true_label to adata.obs["label"], which passes a pandas Series rather than a list, and that Series is what causes the subsequent error in plot_differentiation_flow(). You can try to fix the problem by converting the Series to a list: true_label=list(adata.obs["label"]).
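Why the list conversion helps can be sketched with pandas alone. This assumes, based on lines 826 to 829 of the traceback, that plot_differentiation_flow builds df_mode with a default integer index and assigns via_object.true_label to its 'celltype' column. Assigning a Series to a DataFrame column aligns on index labels; AnnData typically indexes obs by string cell barcodes, so nothing matches and every label becomes NaN. The barcodes, labels and clusters below are made up for illustration.

```python
import pandas as pd

# Stand-in for adata.obs["label"]: AnnData typically indexes obs by string
# cell barcodes. These barcodes and labels are made up for illustration.
labels = pd.Series(["HSC", "HSC", "CD14 Mono"],
                   index=["AAAC-1", "AAAG-1", "AACT-1"], name="label")

# Stand-in for df_mode in plot_differentiation_flow: one row per cell with a
# default integer RangeIndex (assumed from the traceback, not from pyVIA code).
df_mode = pd.DataFrame({"cluster": [16, 16, 9]})

# Assigning a Series aligns on index labels. No barcode matches 0, 1, 2,
# so every cell type becomes NaN and each cluster's mode() is empty.
df_mode["celltype"] = labels
aligned_all_nan = df_mode["celltype"].isna().all()
print(aligned_all_nan)  # True

# A plain list (equivalently labels.tolist() or labels.to_numpy()) is
# assigned by position, so each cell keeps its label.
df_mode["celltype"] = list(labels)
majority = df_mode.groupby("cluster")["celltype"].agg(lambda x: x.mode().iloc[0]).to_dict()
print(majority)  # {9: 'CD14 Mono', 16: 'HSC'}
```

Any positional container works here; the essential point is that the labels must not carry an index that pandas will try to align against the per-cell rows.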