GenericMappingTools / pygmt

A Python interface for the Generic Mapping Tools.
https://www.pygmt.org
BSD 3-Clause "New" or "Revised" License

Handling data processing functions that output to a grid or table #1536

Open weiji14 opened 3 years ago

weiji14 commented 3 years ago

Description of the issue

In the GMT command-line world, there are some data processing functions that can output to either a NetCDF grid or ASCII table. Translating to Python/PyGMT, do we want to 1) have a single function that can output to both (depending on some flag), or 2) have two functions/methods, one which outputs to a grid, and one which outputs to a table.

This is a list of functions that need to be handled:

Originally posted by @weiji14 in https://github.com/GenericMappingTools/pygmt/issues/1433#issuecomment-923441121

I changed the implementation a bit relative to #731 to support ASCII or pandas.DataFrame output for writing out the equalized histogram.

Still, the code is a bit clunky because it has to support four different output types (pandas.DataFrame, xarray.DataArray, netCDF, or ASCII). What do you think about having two PyGMT functions for GMT's grdhisteq module rather than just one? One function could write out the data ranges of histogram equalization to a pd.DataFrame or ASCII table and the other could write out the cumulative distribution statistics to a netCDF file or xarray.DataArray. I guess coming up with names for these would be harder than with the current implementation, but I think it would be more user-friendly long-term.

Yeah I've debated a bit on whether to have 2 functions too, something like a pygmt.grdhisteq.to_table() and pygmt.grdhisteq.to_grid() (implemented using Python classmethods), or maybe with an underscore like pygmt.grdhisteq_to_table() and pygmt.grdhisteq_to_grid() (implemented purely using Python functions). Tying this to https://github.com/GenericMappingTools/pygmt/issues/1318#issuecomment-855317785, I think the split into 2 may have to happen eventually, especially if we want to support more table-like outputs (ascii/numpy/pandas/geopandas/etc) like what Will is doing at grd2xyz #1284.

Possible implementation styles

This is how the implementation would look, using triangulate as an example.

Single function

def triangulate(data, outgrid=None, outfile=None):
    pass
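To make the "depending on some flag" part concrete, here is a minimal sketch of the single-function style. The `grid` flag, the list-based stand-ins for xarray.DataArray and pandas.DataFrame, and the placeholder results are all hypothetical (no GMT call is made). The point is that one function has to reject contradictory output arguments, and its return type depends on which arguments are set.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical stand-ins for xarray.DataArray (grid) and pandas.DataFrame (table).
Grid = List[List[float]]
Table = List[Tuple[int, int, int]]


def triangulate(
    data,
    grid: bool = False,
    outgrid: Optional[str] = None,
    outfile: Optional[str] = None,
) -> Union[Grid, Table, None]:
    """Single-function style: the hypothetical ``grid`` flag picks the output kind."""
    # Every call has to be checked for contradictory output arguments.
    if grid and outfile is not None:
        raise ValueError("'outfile' only applies to table output; use 'outgrid'.")
    if not grid and outgrid is not None:
        raise ValueError("'outgrid' only applies to grid output; set grid=True.")

    if grid:
        result = [[0.0] * len(data)]  # placeholder grid
        target = outgrid
    else:
        result = [(0, 1, 2)] if len(data) >= 3 else []  # placeholder triangles
        target = outfile

    # A real wrapper would write `result` to `target`; this sketch skips the I/O
    # and only mimics the convention of returning None when a file is requested.
    return None if target is not None else result
```

So `triangulate(points)` returns a table, `triangulate(points, grid=True)` returns a grid, and passing a filename returns None.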

Two Python functions

Have a common private _triangulate function that handles both grid and table outputs, with some similarities to _blockm.

def _triangulate(data, outgrid=None, outfile=None):
    pass

def triangulate_to_grid(data, outgrid=None):
    pass

def triangulate_to_table(data, outfile=None):
    pass
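A sketch of the two-function style, assuming the private helper takes a hypothetical `mode` argument so that the public functions are thin wrappers with fixed return types. The list-based types and placeholder results stand in for the real GMT call.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]            # hypothetical stand-in for xarray.DataArray
Table = List[Tuple[int, int, int]]  # hypothetical stand-in for pandas.DataFrame


def _triangulate(data, mode: str, outfile: Optional[str] = None):
    """Shared private helper: runs the computation for either output kind."""
    if mode == "grid":
        result = [[0.0] * len(data)]  # placeholder grid
    elif mode == "table":
        result = [(0, 1, 2)] if len(data) >= 3 else []  # placeholder triangles
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # A real helper would write `result` to `outfile`; the sketch skips the I/O.
    return None if outfile is not None else result


def triangulate_to_grid(data, outgrid: Optional[str] = None) -> Optional[Grid]:
    """Public wrapper: always grid output, so no output-type validation is needed."""
    return _triangulate(data, mode="grid", outfile=outgrid)


def triangulate_to_table(data, outfile: Optional[str] = None) -> Optional[Table]:
    """Public wrapper: always table output."""
    return _triangulate(data, mode="table", outfile=outfile)
```

Each public function now documents exactly one output kind, and the argument checking lives in one place.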

Two methods in a single Python class ✔

class triangulate:
    @staticmethod
    def _triangulate(data, outgrid=None, outfile=None):
        pass

    @staticmethod
    def to_grid(data, outgrid=None):
        pass

    @staticmethod
    def to_table(data, outfile=None):
        pass
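A runnable sketch of the class style, where the class is only a namespace and is never instantiated. The `mode` argument of the private helper, the list-based types, and the placeholder results are hypothetical; no GMT call is made.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]            # hypothetical stand-in for xarray.DataArray
Table = List[Tuple[int, int, int]]  # hypothetical stand-in for pandas.DataFrame


class triangulate:  # lowercase on purpose: it reads like a function at the call site
    """Namespace class: never instantiated, only its static methods are called."""

    @staticmethod
    def _triangulate(data, mode: str, outfile: Optional[str] = None):
        # Shared private helper; `mode` is a hypothetical parameter.
        if mode == "grid":
            result = [[0.0] * len(data)]  # placeholder grid
        else:
            result = [(0, 1, 2)] if len(data) >= 3 else []  # placeholder triangles
        # A real helper would write `result` to `outfile`; the sketch skips the I/O.
        return None if outfile is not None else result

    @staticmethod
    def to_grid(data, outgrid: Optional[str] = None) -> Optional[Grid]:
        return triangulate._triangulate(data, mode="grid", outfile=outgrid)

    @staticmethod
    def to_table(data, outfile: Optional[str] = None) -> Optional[Table]:
        return triangulate._triangulate(data, mode="table", outfile=outfile)
```

The call site reads `triangulate.to_grid(points)` or `triangulate.to_table(points)`, keeping both variants under one discoverable name. For reference, the methods PyGMT eventually shipped use more descriptive names, such as `pygmt.triangulate.regular_grid` and `pygmt.triangulate.delaunay_triples`.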

Vote for which API style you prefer!

P.S. Also xref #896 where there is a similar API design discussion on wrapping GMT functions that do either plotting or data processing.

maxrjones commented 3 years ago

I like the syntax of the class method style, but dislike using classes in a functional programming style with the staticmethod decorator. I would also prefer the function/method names to be more descriptive about the output than 'to_table' or 'to_grid', for example triangulate.find_voronoi_edges, triangulate.find_delaunay_edges, or triangulate.grid_data.

In https://github.com/GenericMappingTools/pygmt/tree/grdhisteq-functions, I tried to implement a syntax similar to the class-based option and the pygmt.datasets.load_* functions while still keeping the design functional. The functions work and I think the syntax is actually quite user-friendly; however, I could not get the import statements working for autodoc. Any advice here would be appreciated. I mixed up some merge commits and had to close #1571 because it diverged from the grdhisteq branch. I could either discard the class-based design, discard the functional design, or open a different PR from grdhisteq-functions with main as the target branch to compare the two options.
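One way to get the `grdhisteq.<name>(...)` call syntax without a class is to group plain functions in a namespace object. This is a sketch under assumptions, not the code on the grdhisteq-functions branch (whose contents are not shown here). The names `equalize_grid` and `compute_bins` mirror the methods PyGMT later shipped for grdhisteq, but the bodies are simplified rank-based stand-ins with no GMT call.

```python
from bisect import bisect_left
from types import SimpleNamespace
from typing import List, Tuple


def _equalize_grid(grid: List[List[float]], divisions: int = 2) -> List[List[int]]:
    """Simplified equalization: replace each value with its equal-count bin index."""
    values = sorted(v for row in grid for v in row)
    n = len(values)
    # The rank of a value among all values decides which bin it falls into.
    return [[bisect_left(values, v) * divisions // n for v in row] for row in grid]


def _compute_bins(grid: List[List[float]], divisions: int = 2) -> List[Tuple[float, float, int]]:
    """Simplified stand-in: (low, high, bin index) for each equal-count bin."""
    values = sorted(v for row in grid for v in row)
    n = len(values)
    bins = []
    for i in range(divisions):
        chunk = values[i * n // divisions : (i + 1) * n // divisions]
        if chunk:  # skip empty bins when there are fewer values than divisions
            bins.append((chunk[0], chunk[-1], i))
    return bins


# Plain functions grouped under one name, so the call site reads
# grdhisteq.equalize_grid(...) without defining a class.
grdhisteq = SimpleNamespace(equalize_grid=_equalize_grid, compute_bins=_compute_bins)
```

The functions stay importable and testable on their own; whether Sphinx autodoc documents attributes of a namespace object like this cleanly would still need checking.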

maxrjones commented 2 years ago

The implementation of grdhisteq in https://github.com/GenericMappingTools/pygmt/pull/1433 uses the "Two methods in a single Python class" style and is currently on a final review call. If that PR gets merged, I think we should stick with that style for the other functions that output to a grid or table, for consistency. So please comment here or in that PR if you do not like that design choice.

weiji14 commented 2 years ago

Thanks Meghan for getting the grdhisteq function done. I'll refactor the triangulate implementation in #731 to use a similar "Two methods in a single Python class" style to be consistent.

maxrjones commented 2 years ago

Just a note that this issue can be closed after the recommended structure (as used in grdhisteq and triangulate) is added to the contributing guide. The guidance could be added as a follow-up to https://github.com/GenericMappingTools/pygmt/pull/1687.

seisman commented 2 years ago

> Just a note that this issue can be closed after the recommended structure (as used in grdhisteq and triangulate) is added to the contributing guide.

The contributing guide is already too long. Do you think we should add a code style guide instead? Here is an example from ObsPy: https://docs.obspy.org/coding_style.html.

maxrjones commented 2 years ago

Yes, a code style guide would be a good alternative. We could also move much of the other information into a separate docs style guide if it's important to keep the base contributing guide short.