The-Academic-Observatory / observatory-reports

Apache License 2.0

Conversion of current Seaborn and Matplotlib Graphs to Plotly #2

Open cameronneylon opened 3 years ago

cameronneylon commented 3 years ago

Currently the graphs included in reports are generated by Seaborn and Matplotlib, sometimes through the built-in plotting functionality in pandas. This produces flexible static images but has a range of limitations for more dynamic graphs and does not translate easily to online and web content. The plan is to move across to Plotly as the basis for graphing.

Plotly provides native JavaScript rendering of dynamic charts for the web, as well as fully functional HTML pages and static images for documents. The goal is to enable a wide range of use cases that might include:

To achieve this we need a clean separation of the reports from the graph/layout generation code. This is readily achieved through the charts submodule in observatory.reports, although in the future we may want to move charts to its own submodule level. To improve on the design of the current charts module while implementing Plotly (and retaining backwards compatibility for as long as feasible), the following design is proposed.

  1. To each existing chart, add a plotly() function which a) calls the data processing step if it has not already been called (implementing a new one for Plotly if necessary) and b) generates the figure.
  2. The top-level plotly() function in a chain always returns a Plotly Figure object. If a chart is made up of other charts, the subsidiary charts' plotly() functions return traces, or more explicitly a data object, for incorporation into the top-level chart. This is functionally similar to the way the current library checks whether it has been passed an axis: if so it plots to that axis, if not it creates an appropriate figure. The difference here is that the disposition of the traces onto any subplots needs to be handled at the top level, either in the calling function or by passing the relevant subplots down, because the abstraction of data and layout is slightly different in Plotly.
  3. The decision on which format(s) to output the final graph in is handled at the top level (in the first instance, in the report analytics function), and that functionality is provided by the Plotly Figure that is returned. The analytics function would ideally not need to be aware that it is not directly calling a Plotly Figure-generating function (we can consolidate the call signatures later).

The question of how best to implement shared formatting across graphs is a TODO that I'm not entirely sure how to handle. All of those details are held in the layout object, so simply inheriting from a base class might be the simplest approach, but for future flexibility using the theme functionality probably makes sense, with the default theme loading built into the base class.

cameronneylon commented 3 years ago

As a follow-on to this: pragmatically, there isn't the headspace to do this in full at the moment. Chart-by-chart provision of Plotly plots is working OK, but there are challenges with subplots.

Overall it is easiest to handle that at the individual chart code level and to manage explicitly those cases where multiple plots are required. A better approach would be what is envisaged above, but that would require a substantial rewrite and would break the backwards compatibility that we currently have in place, so it is best left for the moment IMO.