Open mgdenno opened 2 weeks ago
@samlamont what you say about this idea?
@mgdenno Looks good to me. But it makes me want to add ev.timeseries.plot()
:) (Maybe we could have plot methods on the timeseries and location table classes and also use the accessor on the metrics results?)
I wonder if we should also do ev.joined_timeseries.create()
or .load()
? We could also have a .describe()
method on each table to summarize it's contents (like get_field_statistics()
)
So we would have ev.fetch
and ev.metrics
, and the rest would be driven though the table classes (?)
I like the ev.joined_timeseries.create()
and ev.joined_timeseries.describe()
I like ev.timeseries.plot()
too but I feel like we would need to solve the re-collecting issue to really do it. Although I like the clean syntax around it, I'm afraid the user experience would not be good. I wonder if there could be a cache/clear mechanism that would prevent re-collecting results from PySpark? It would have to clear everytime filter()
, order()
or query()
was passed or chained.
As far as having different methods on the table vs. the metrics, maybe as long as they were similar enough it wouldn't be too confusing. In this case for a timeseries query it would re-query each time a plot was created? ON small datasets this would be fine, on larger ones, I am less sure how performance would be.
Do you envision there be one plot method that takes arguments to specify what to plot and how? Or multiple plot methods?
On the plotting, I was initially thinking of one method that takes arguments, but maybe it could be similar to the pattern pandas where you can do both df.plot(type="scatter", ...)
and df.plot.scatter(...)
. It looks like PySpark has a persist method which could potentially be used to cache the data. Would take some experimentation.
Maybe it's a moot point however because I found that the accessor does actually have access to the dataframe and it's contents, and the attrs
property seems to allow us to pass in info along with it, so we could include the list of metric pydantic models, for example, which could contain plotting info, along with other metadata, which could then be read by the plotting methods in the accessor class. Maybe that's the way to go. Sorry to derail the original point of this issue!
@mgdenno I can work on this issue unless you prefer to?
Perhaps, for example we should have: