Open elyase opened 4 months ago
@elyase Thank you for your proposal. Tell me whether the following design is practical for you:
import polars as pl
import polars_ml as pml
graph_df = pl.Dataframe({
'src': ['v1', 'v2', 'v3', 'v4'],
'neighbors': [['v2'], ['v1'], ['v4'], ['v3']]
})
reachable_df = graph_df.select(
pml.graph.has_path(pl.col('src'), pl.col('neighbors'),
starting_nodes=['v1', 'v1'],
target_nodes=['v2', 'v3']
).alias('reachable')
)
Which should yield:
shape: (2, 1)
┌─────────┐
│ reachable│
│ ---------│
│ bool │
╞═════════╡
│ true │
│ false │
└─────────┘
Reachability queries determine whether there is a path between two nodes in a graph, regardless of the path's optimality or length. This simpler problem allows for more optimization / indexing structures.
It is similar to connected components that involve grouping all nodes into subsets where each node is reachable from any other node within the same subset
References: https://arxiv.org/pdf/2311.03542