Pometry / Raphtory

Scalable graph analytics database powered by a multithreaded, vectorized temporal engine, written in Rust
https://raphtory.com
GNU General Public License v3.0

Graph view builder (Smoother post-processing workflow for windowed data) #910

Open narnolddd opened 1 year ago

narnolddd commented 1 year ago

Would be really nice to have a workflow for how to get stats from multiple different algos and window sizes into a pandas dataframe (kind of similar to the to_df on the old Raphtory for global state algorithms). I don't know if this best exists within the core library or as a notebook example. Would be nice to have something like:

time  windowsize  number_of_nodes  number_of_vertices  other_metrics
1     86400       36               24                  ...
...   ...         ...              ...                 ...

Working with lists of numpy arrays is a bit painful when you want to do further processing on them.
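
One way to get a table like the one above is to build one row per (window size, window) pair and hand the rows to pandas at the end. This is only a minimal sketch: it assumes each window view exposes its metrics as zero-argument methods, and `FakeWindow`, `fake_rolling` and `collect_rows` are hypothetical stand-ins so the snippet runs without Raphtory. None of them are part of the library.

```python
from dataclasses import dataclass


@dataclass
class FakeWindow:
    """Hypothetical stand-in for one windowed graph view."""
    end: int
    vertices: int
    edges: int

    def num_vertices(self):
        return self.vertices

    def num_edges(self):
        return self.edges


def fake_rolling(window_size):
    """Hypothetical stand-in for g.rolling(window_size): returns three windows."""
    return [
        FakeWindow(end=window_size * (i + 1), vertices=10 * (i + 1), edges=20 * (i + 1))
        for i in range(3)
    ]


def collect_rows(rolling, window_sizes, metrics):
    """Return one dict per (window size, window), with one key per metric."""
    rows = []
    for size in window_sizes:
        for w in rolling(size):
            row = {"time": w.end, "windowsize": size}
            for name, fn in metrics.items():
                row[name] = fn(w)
            rows.append(row)
    return rows


# Metric name -> function that extracts the metric from one window.
metrics = {
    "number_of_vertices": lambda w: w.num_vertices(),
    "number_of_edges": lambda w: w.num_edges(),
}
rows = collect_rows(fake_rolling, [3600, 86400], metrics)
```

With pandas installed, `pd.DataFrame(rows)` turns these rows into a single long-format dataframe, which avoids juggling separate lists per algorithm or window size.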

ricopinazo commented 1 year ago

On this point, with #884 we added a function time_index() to the window sets returned by g.rolling() and similar functions. It gives you a Python iterable and was written with this kind of scenario in mind. Basically you can do things like:

import pandas as pd

windows = g.rolling('1 day')

df = pd.DataFrame()
df['time'] = windows.time_index()
df['number_of_vertices'] = [w.num_vertices() for w in windows]
df
which should output:

time                 number_of_vertices
2020-06-21 12:34:65  86400
...                  ...

I think we could go further by implementing more vectorized functions on top of window sets. For instance, a num_vertices function that returns the number of vertices per window, so you can simply do:

df['number_of_vertices'] = windows.num_vertices()
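
To illustrate the idea in plain Python, a wrapper can forward any method call to every window and collect the results. This is a sketch of the concept only, not how Raphtory implements window sets; `VectorizedWindows` and `FakeWindow` are hypothetical names used for illustration.

```python
class VectorizedWindows:
    """Hypothetical wrapper: calling a method on it calls that method on every window."""

    def __init__(self, windows):
        self._windows = list(windows)

    def __getattr__(self, name):
        # Only reached for attributes not defined on the wrapper itself.
        # Private names are rejected to avoid recursion if _windows is missing.
        if name.startswith("_"):
            raise AttributeError(name)

        def call_on_each(*args, **kwargs):
            return [getattr(w, name)(*args, **kwargs) for w in self._windows]

        return call_on_each


class FakeWindow:
    """Hypothetical stand-in for a windowed graph view."""

    def __init__(self, vertices):
        self._vertices = vertices

    def num_vertices(self):
        return self._vertices


windows = VectorizedWindows(FakeWindow(n) for n in (3, 5, 8))
counts = windows.num_vertices()  # one value per window
```

A real implementation would more likely define each vectorized function explicitly, which keeps the available methods discoverable and typed.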

And as an addition, we could have a function to_pandas available for the iterables that returns a pandas Series. That would allow us to integrate the time index on the same Series index. That way we could do things like:

windows = g.rolling('1 day')

df = pd.DataFrame()
df['number_of_vertices'] = windows.num_vertices().to_pandas()
df['number_of_edges'] = windows.num_edges().to_pandas()
df

And the output would be:

index number_of_vertices number_of_edges
2020-06-21 12:34:65 86400 340770
... ... ...

without needing to explicitly set the index. Another name for to_pandas() might be with_index(), but to_pandas() is probably better at conveying that we are moving into pandas world after calling this function.
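
A rough sketch of what such an object could look like follows. It assumes the vectorized call returns the per-window values together with the window times; `WindowValues` is a hypothetical name and the numbers are made up for illustration. pandas is imported lazily inside to_pandas() so the class itself has no hard dependency on it.

```python
class WindowValues:
    """Hypothetical result of a vectorized call: one value per window plus the window times."""

    def __init__(self, values, time_index, name=None):
        values = list(values)
        time_index = list(time_index)
        if len(values) != len(time_index):
            raise ValueError("values and time_index must have the same length")
        self.values = values
        self.time_index = time_index
        self.name = name

    def __iter__(self):
        # Still usable as a plain iterable of values.
        return iter(self.values)

    def to_pandas(self):
        # Lazy import: only needed once we actually move to pandas.
        import pandas as pd

        return pd.Series(self.values, index=self.time_index, name=self.name)


times = ["2020-06-21", "2020-06-22"]  # made-up window times
vertices = WindowValues([3, 5], times, name="number_of_vertices")
edges = WindowValues([4, 9], times, name="number_of_edges")
```

Because both Series would carry the same time index, pandas can align them when they are combined into one dataframe, for example with `pd.DataFrame({"number_of_vertices": vertices.to_pandas(), "number_of_edges": edges.to_pandas()})`.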

miratepuffin commented 4 weeks ago

Hijacking this ticket a little bit as it would be great if we could create a view_builder class which allows: