JuDFTteam / masci-tools

Post-processing toolkit for electronic structure calculations
https://masci-tools.readthedocs.io
MIT License

Implement unified way of checking dimensions, exporting and unifying data for plots #26

Open janssenhenning opened 3 years ago

janssenhenning commented 3 years ago

Currently, data for plots is passed to the functions in plot_methods as arrays or lists. The dimension checking is quite fragile and also varies from method to method (on both the develop and plot_methods_refactor branches).

We should have:

janssenhenning commented 3 years ago

Actually, after working with bokeh plots a bit more, I would be in favor of implementing similar behaviour for the matplotlib plots: you define a DataFrame (e.g. with pandas) and pass the keys you want to plot. With pandas it would already be possible to plot by calling the plotting methods directly on the DataFrame, but I think just passing the data and then indexing the right keys would be enough, since that interface is slightly different, which might be confusing.

Of course, we could construct a DataFrame if none is given, and in this way support all kinds of ways of providing the data.

This would probably also massively simplify exporting the plot data to files.
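To make the idea concrete, here is a rough sketch of how the different input forms could be funnelled into one DataFrame with a single dimension check. The helper name `to_plot_frame` and the column names are made up for illustration; nothing like this exists in masci-tools yet.

```python
import numpy as np
import pandas as pd


def to_plot_frame(data, keys):
    """Return a DataFrame with exactly the columns named in `keys`.

    Hypothetical helper: `data` may be a DataFrame, a dict of sequences,
    or a list of sequences matched to `keys` by position.
    """
    keys = list(keys)
    if isinstance(data, pd.DataFrame):
        frame = data
    else:
        if not isinstance(data, dict):
            data = list(data)
            if len(data) != len(keys):
                raise ValueError(f"expected {len(keys)} arrays, got {len(data)}")
            data = dict(zip(keys, data))
        columns = {name: np.asarray(values) for name, values in data.items()}
        # One central dimension check instead of one per plot method.
        not_1d = [name for name, col in columns.items() if col.ndim != 1]
        if not_1d:
            raise ValueError(f"columns are not one-dimensional: {not_1d}")
        if len({col.shape[0] for col in columns.values()}) > 1:
            raise ValueError("all columns must have the same length")
        frame = pd.DataFrame(columns)
    missing = [key for key in keys if key not in frame.columns]
    if missing:
        raise KeyError(f"keys not found in data: {missing}")
    return frame[keys]


# Plain lists in, DataFrame out; the plot function would then index by key.
frame = to_plot_frame([[0, 1, 2], [1.0, 4.0, 9.0]], keys=["energy", "dos"])

# Export comes almost for free once the data is a DataFrame.
csv_text = frame.to_csv(index=False)
```

A plot function built on this would only need the data and the keys as arguments, regardless of whether the caller had arrays, a dict, or a DataFrame.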

Irratzo commented 3 years ago

Hi Henning, I'm not familiar with the masci-tools.vis modules, so this is just food for thought about the thematic overlap between this issue and that issue (integration of the branch studentproject18w into the main code, as much as is sensible). (Disclaimer: I wrote that code.)

The goal for that project/branch was to provide an interactive bandstructure+DOS plotter from fleur HDF5 output files with two user frontends (Tkinter desktop program, a Jupyter dashboard), using the same base code.

The outcome was

  1. a preprocessor interface to transform fleur HDF output into Python classes for different use cases,
  2. a plotting class hierarchy to unify code for a) plotting methods for different tools (matplotlib, bokeh, ...) and b) different use cases (bandstructure, DOS, ...).

(The frontends worked; the Jupyter dashboard can still be tried out via the binder badge in the README.)

Now a little more detail on how it works.

The preprocessor takes a JSON recipe, e.g. FleurBands, which specifies the datasets to extract from the HDF file, what transformations to apply to each, and the desired output type. The output type specifies functions for postprocessing data manipulation, e.g. for plotting. The reader then reads the datasets from the HDF file, transforms the datasets (dependencies between datasets for transformations are resolved automatically), creates an instance of the specified output type, and adds the transformed datasets as attributes of that instance. The attributes remain h5py datasets (i.e., file-storage access), but can optionally be 'moved to memory' (converted into numpy arrays).
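A stripped-down sketch of that flow, to illustrate the idea rather than reproduce the branch's code: the recipe layout, the dataset paths, the transformation names and the `BandData` class are all invented here, and a plain dict stands in for the HDF file so the example runs without h5py.

```python
import json
from graphlib import TopologicalSorter  # Python 3.9+

# Stand-in for the HDF file: dataset path -> raw values (assumption).
RAW = {"/eigenvalues": [-1.5, 0.5, 2.0], "/fermi_energy": [0.5]}

# Transformations are referenced by name so the recipe stays pure JSON.
TRANSFORMS = {
    "identity": lambda raw: list(raw),
    "first": lambda raw: raw[0],
    "shift": lambda values, offset: [v - offset for v in values],
}

# Hypothetical recipe format, not the actual FleurBands recipe.
RECIPE = json.loads("""
{
  "output_type": "BandData",
  "datasets": {
    "shifted": {"depends_on": ["eigenvalues", "fermi"], "transform": "shift"},
    "eigenvalues": {"source": "/eigenvalues", "transform": "identity"},
    "fermi": {"source": "/fermi_energy", "transform": "first"}
  }
}
""")


class BandData:
    """Hypothetical output type: carries postprocessing helpers."""

    def energy_window(self):
        return min(self.shifted), max(self.shifted)


OUTPUT_TYPES = {"BandData": BandData}


def run_recipe(recipe, raw):
    specs = recipe["datasets"]
    # Map each dataset to the datasets it depends on, then process in an
    # order where every dependency comes before its dependents.
    graph = {name: set(spec.get("depends_on", [])) for name, spec in specs.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        spec = specs[name]
        transform = TRANSFORMS[spec["transform"]]
        if "source" in spec:
            results[name] = transform(raw[spec["source"]])
        else:
            results[name] = transform(*(results[dep] for dep in spec["depends_on"]))
    # Attach the transformed datasets as attributes of the output type.
    output = OUTPUT_TYPES[recipe["output_type"]]()
    for name, value in results.items():
        setattr(output, name, value)
    return output


bands = run_recipe(RECIPE, RAW)
```

Note that "shifted" is listed first in the recipe but computed last, which is the automatic dependency resolution mentioned above.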

The Plotters (plotting classes) derive from an abstract class with an abstract data attribute; the preprocessor's output types are subclasses of that attribute's type. For example, the data attribute of AbstractBandPlot is of type FleurBandData. The Plotters' actual plotting methods then do not take data as arguments, only data selection arguments which operate on the underlying data attribute. This at least partially addresses your 'providing data' concern above.
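Roughly, the shape of that hierarchy looks like the sketch below. The class bodies are invented for illustration (only the name AbstractBandPlot mirrors the branch), `BandData` is a stand-in for FleurBandData, and a toy text "backend" replaces matplotlib/bokeh so the example is self-contained.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class BandData:
    """Stand-in for a preprocessor output type such as FleurBandData."""

    kpoints: list
    energies: list  # energies[band][kpoint]

    def select(self, bands):
        return [self.energies[band] for band in bands]


class AbstractPlotter(ABC):
    """Base class: every plotter owns its data."""

    def __init__(self, data):
        self.data = data


class AbstractBandPlot(AbstractPlotter):
    """Use-case level: fixes the data type and the plotting interface."""

    def __init__(self, data: BandData):
        super().__init__(data)

    @abstractmethod
    def plot_bands(self, bands):
        """Plot the bands with the given indices."""


class TextBandPlot(AbstractBandPlot):
    """Toy backend that renders to strings instead of a figure."""

    def plot_bands(self, bands):
        # Only selection arguments are passed in; the data itself comes
        # from the attribute set at construction time.
        return [
            f"band {index}: " + " ".join(f"{energy:+.1f}" for energy in curve)
            for index, curve in zip(bands, self.data.select(bands))
        ]


data = BandData(kpoints=[0.0, 0.5, 1.0],
                energies=[[-1.0, -0.5, -1.0], [1.0, 2.0, 1.0]])
plotter = TextBandPlot(data)
lines = plotter.plot_bands([1])
```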

(Side note: the branch did not have pandas DataFrames in mind.)

(Side note: the problem with this approach is, of course, that it relies on the whole pipeline, i.e. data comes only from HDF. But I think this can be relaxed.)

(Side note about the hierarchical Plotter classes concept: this can lead to a combinatorial explosion of classes, because you need to define a class for every combination of use case and plotting library. I don't know which Pythonic design pattern would solve this problem more efficiently.)
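One commonly suggested way around this is composition instead of inheritance (sometimes called the bridge pattern): use-case classes hold a backend object rather than subclassing per library, so N use cases and M libraries need N + M classes instead of N × M. The sketch below is only an illustration of that idea with invented names; the backend records calls instead of drawing, where a real one would wrap matplotlib or bokeh.

```python
from typing import Protocol


class Backend(Protocol):
    """Minimal drawing interface every plotting library wrapper provides."""

    def line(self, x, y, label): ...


class RecordingBackend:
    """Toy backend: records draw calls instead of producing a figure."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def line(self, x, y, label):
        self.calls.append((self.name, label, len(x)))


class BandPlot:
    """Use case 1: knows what to draw, not how."""

    def __init__(self, backend: Backend):
        self.backend = backend

    def draw(self, kpoints, bands):
        for index, energies in enumerate(bands):
            self.backend.line(kpoints, energies, label=f"band {index}")


class DosPlot:
    """Use case 2: reuses the same backends unchanged."""

    def __init__(self, backend: Backend):
        self.backend = backend

    def draw(self, energies, dos):
        self.backend.line(energies, dos, label="DOS")


# Any use case can be paired with any backend at construction time.
mpl_like = RecordingBackend("matplotlib-like")
bokeh_like = RecordingBackend("bokeh-like")
BandPlot(mpl_like).draw([0.0, 0.5, 1.0], [[-1.0, -0.5, -1.0], [1.0, 2.0, 1.0]])
DosPlot(bokeh_like).draw([-1.0, 0.0, 1.0], [0.2, 0.8, 0.3])
```

The trade-off is that the backend interface has to be general enough for all use cases, which may be the hard part in practice.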