Raise a GMTInvalidInput exception for only x,y input to blockmedian and blockmean

GenericMappingTools / pygmt

A Python interface for the Generic Mapping Tools.

https://www.pygmt.org

BSD 3-Clause "New" or "Revised" License

759 stars 220 forks source link

Raise a GMTInvalidInput exception for only x,y input to blockmedian and blockmean #1447

Closed maxrjones closed 3 years ago

maxrjones commented 3 years ago

Description of the desired feature

Currently, no exception is raised in the blockmedian and blockmean functions if the x and y parameters are used but not the z parameter. This should be disallowed because these functions require three columns. For example, GMT will raise an error if a textfile with only two columns is used.

The simplest method to fix this problem would be to add the equivalent of these lines of code (using table rather than data) to the start of the _blockm function: https://github.com/GenericMappingTools/pygmt/blob/cd822ca2c01f98a1496c63cc0bff8cc6661cb3bb/pygmt/src/surface.py#L81-L83

Another option could be to extend the virtualfile_from_data function to (optionally) check the number of columns. This would limit redundancy relative to using the same code block for all functions operating on x,y,z data (e.g., blockm*, surface, nearneighbor). If the test were in virtualfile_from_data, we could also reduce the tests required because the exception would not need to be tested in each module that requires three columns.

Are you willing to help implement and maintain this feature? Glad to help someone else with this issue

seisman commented 3 years ago

Another option could be to extend the virtualfile_from_data function to (optionally) check the number of columns.

Sounds a better option to me.

yohaimagen commented 3 years ago

I think I can try to do that, I can try to extend virtualfile_from_data. but I didn't understand how would I know if virtualfile_from_data is called from a function that required z values(e.g blockmean) or called from a function that required only x, y data like plot?

maxrjones commented 3 years ago

I think I can try to do that, I can try to extend virtualfile_from_data. but I didn't understand how would I know if virtualfile_from_data is called from a function that required z values(e.g blockmean) or called from a function that required only x, y data like plot?

One option would be to add an optional keyword parameter z_required to virtualfile_from_data so that this can be specified for each function. This could be passed to data_kind so that the checking is done in the same block as the checking of x and y.

But, I am actually not sure if making this code specific to x, y and z is the best way to go or if we should instead have a more general function that can be used to check that if any one of a set of parameters is not None, all are not None. This way, the function could be used for other GMT parameters that need to be paired (relates to https://github.com/GenericMappingTools/pygmt/issues/256). While I think this would likely be more useful, I have not looked into the implementation.

yohaimagen commented 3 years ago

I think those are separate issues. to raise a descriptive error message we will still need to do something like the following in data_kind

if z_required and parameter_set_not_none(x, y, z):
    raise GMTInvalidInput("Must provide x, y and z.")

parameter_set_not_none can be implemented with something like

def parameter_set_not_none(*args):
    return any(par is None for par in args)

maxrjones commented 3 years ago

I think those are separate issues. to raise a descriptive error message we will still need to do something like the following in data_kind
if z_required and parameter_set_not_none(x, y, z):
    raise GMTInvalidInput("Must provide x, y and z.") 
parameter_set_not_none can be implemented with something like
def parameter_set_not_none(*args):
    return any(par is None for par in args)

OK, do you want to address the first of these two separate issues using z_required? If so, I can assign the issue to you.

yohaimagen commented 3 years ago

can do both if you think both of them are necessary, let's start with the first one I'll open a PR.

seisman commented 3 years ago

I'm wondering if we should use a minimum_cols parameter, rather than a required_z parameter. Note that sometimes functions may need more than 4 columns. For example, the blockmean modules can accept another column for weighting (not implemented in PyGMT yet).

yohaimagen commented 3 years ago

I think minimum_cols is not practical for two reasons.

As I mentioned we want to raise a descriptive error message rather than a vague(e.g Must provided both x, y, and z. rather than let's say Not all necessary data were provided.
Let's say some function needs both x, y, z, and some other column a to know that it needs 4 columns are not enough to check that they are all provided (e.g to check a is not None) I need to know to check specifically a.

I think the z column is widely used in many functions so it is sensible to have a general propose code that checks for that, for other columns that are more function-specific we should check in the function itself(and raise a specific descriptive error). We could implement some helper function as @meghanrjones suggested

But, I am actually not sure if making this code specific to x, y and z is the best way to go or if we should instead have a more general function that can be used to check that if any one of a set of parameters is not None, all are not None. This way, the function could be used for other GMT parameters that need to be paired (relates to #256). While I think this would likely be more useful, I have not looked into the implementation.