barak1412 / polars_ml

MIT License
5 stars 2 forks source link

Feature request: Reachability queries #1

Open elyase opened 4 months ago

elyase commented 4 months ago

Reachability queries determine whether there is a path between two nodes in a graph, regardless of the path's optimality or length. This simpler problem allows for more optimization / indexing structures.

It is similar to connected components that involve grouping all nodes into subsets where each node is reachable from any other node within the same subset

References: https://arxiv.org/pdf/2311.03542

barak1412 commented 4 months ago

@elyase Thank you for your proposal. Tell me whether the following design is practical for you:

import polars as pl
import polars_ml as pml

graph_df = pl.Dataframe({
   'src': ['v1', 'v2', 'v3', 'v4'],
   'neighbors': [['v2'], ['v1'], ['v4'],  ['v3']]
})

reachable_df = graph_df.select(
   pml.graph.has_path(pl.col('src'), pl.col('neighbors'),
      starting_nodes=['v1', 'v1'],
      target_nodes=['v2', 'v3']
   ).alias('reachable')
)

Which should yield:

shape: (2, 1)
┌─────────┐
│ reachable│
│ ---------│
│ bool     │
╞═════════╡
│ true     │
│ false    │
└─────────┘