Open narnolddd opened 1 year ago
On this point, with #884 we added a function `time_index()` to window sets returned by `g.rolling()` and similar functions. It gives you a Python iterable with this kind of scenario in mind. Basically you can do things like:
windows = g.rolling('1 day')
df = pd.DataFrame()
df['time'] = windows.time_index()
df['number_of_vertices'] = [w.num_vertices() for w in windows]
df
which should output:
time | number_of_vertices |
---|---|
2020-06-21 12:34:65 | 86400 |
... | ... |
I think we could go further by implementing more vectorized functions on top of window sets. For instance, a `num_vertices` function that returns the number of vertices per window, so you can simply do:
df['number_of_vertices'] = windows.num_vertices()
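To make the idea concrete, here is a minimal sketch of what a vectorized window set could look like. `FakeWindow` and `FakeWindowSet` are hypothetical stand-ins written only for this illustration, not the library's actual classes, and the data is made up. The point is just that `num_vertices()` on the set returns one value per window, in the same order as `time_index()`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FakeWindow:
    """Hypothetical stand-in for a graph view over a single time window."""
    end: datetime
    vertices: set

    def num_vertices(self) -> int:
        return len(self.vertices)


class FakeWindowSet:
    """Hypothetical stand-in for the object returned by g.rolling()."""

    def __init__(self, windows):
        # Materialise the windows so the set can be iterated more than once.
        self._windows = list(windows)

    def __iter__(self):
        return iter(self._windows)

    def time_index(self):
        # One timestamp per window.
        return [w.end for w in self._windows]

    def num_vertices(self):
        # Vectorized version: one count per window, aligned with time_index().
        return [w.num_vertices() for w in self._windows]


# Made-up data: three daily windows with 1, 2 and 3 vertices.
start = datetime(2020, 6, 21)
windows = FakeWindowSet(
    FakeWindow(end=start + timedelta(days=i), vertices=set(range(i + 1)))
    for i in range(3)
)

counts = windows.num_vertices()
```

With this shape, `windows.num_vertices()` gives the same values as the explicit list comprehension over `windows`, which is what makes the one-line column assignment possible.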
In addition, we could have a function `to_pandas` available on these iterables that returns a pandas Series. That would let us use the time index as the index of the Series itself. That way we could do things like:
windows = g.rolling('1 day')
df = pd.DataFrame()
df['number_of_vertices'] = windows.num_vertices().to_pandas()
df['number_of_edges'] = windows.num_edges().to_pandas()
df
And the output would be:
index | number_of_vertices | number_of_edges |
---|---|---|
2020-06-21 12:34:65 | 86400 | 340770 |
... | ... | ... |
without needing to explicitly set the index. Another name for `to_pandas()` might be `with_index()`, but `to_pandas()` is maybe better at conveying the fact that we are moving to the pandas world after calling this function.
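As a rough sketch of the proposal, the snippet below shows how a `to_pandas()` that carries the time index lets pandas align the columns on its own. `WindowedValues` is a hypothetical class invented for this example (it is not part of the library), and the numbers are made up. Only the pandas calls (`pd.Series`, `pd.DatetimeIndex`, `pd.date_range`, `pd.DataFrame`) are real API.

```python
import pandas as pd


class WindowedValues:
    """Hypothetical result of a vectorized call such as windows.num_vertices()."""

    def __init__(self, times, values, name):
        self.times = list(times)
        self.values = list(values)
        self.name = name

    def to_pandas(self) -> pd.Series:
        # The window timestamps become the Series index, so no separate
        # 'time' column or set_index() call is needed afterwards.
        index = pd.DatetimeIndex(self.times, name="time")
        return pd.Series(self.values, index=index, name=self.name)


# Made-up data: three daily windows.
times = pd.date_range("2020-06-21", periods=3, freq="D")
vertices = WindowedValues(times, [10, 12, 15], "number_of_vertices")
edges = WindowedValues(times, [20, 31, 47], "number_of_edges")

# Building the frame from Series aligns the columns on the shared time index.
df = pd.DataFrame({
    "number_of_vertices": vertices.to_pandas(),
    "number_of_edges": edges.to_pandas(),
})
```

Because alignment happens on the index, two metrics computed over the same window set line up row by row even if they are added to the frame at different times.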
Hijacking this ticket a little bit as it would be great if we could create a view_builder class which allows:
Would be really nice to have a workflow for how to get stats from multiple different algos and window sizes into a pandas dataframe (kind of similar to the to_df on the old Raphtory for global state algorithms). I don't know if this best exists within the core library or as a notebook example. Would be nice to have something like:
working with lists as numpy arrays is a bit painful for doing processing on