Misleading description for the spacing parameter of pygmt.info

core-man commented 3 years ago

I find the description for spacing parameter of pygmt.info may be a little misleading for new GMT users.

It seems that only when the per_column parameter is combined with spacing, then the numpy.ndarray output will be rounded up/down for as many columns as there are increments provided in spacing. Without per_column, spacing only output the min/max values for the first two columns no matter how many spacings we input. So, we should always set per_column to be True when we use spacing unless only the first two columns are needed.

Test:

I use the data in 3D Scatter plots to test spacing and per_column.

import pandas as pd
import pygmt

# Load sample iris data, and convert 'species' column to categorical dtype
df = pd.read_csv("https://github.com/mwaskom/seaborn-data/raw/master/iris.csv")
df["species"] = df.species.astype(dtype="category")

# Use pygmt.info to get region bounds (xmin, xmax, ymin, ymax, zmin, zmax)
region = pygmt.info(
    table=df[["petal_width", "sepal_length", "petal_length"]],  # x, y, z columns
    per_column=True,  # report output as a numpy array
    spacing=(1, 0.2, 0.3),  # rounds x, y and z intervals by 1,0.2 and 0.3 respectively
)
print(region)

Output with spacing (-I) and per_column (-C):

[0.  3.  4.2 8.  0.9 7.2]

Output with only spacing (-I):

[0.  3.  4.2 8. ]

Output with only per_column (-C):

[0.1 2.5 4.3 7.9 1.  6.9]

Output with neither of per_column (-I) nor spacing (-C):

<vector memory>: N = 150 <0.1/2.5> <4.3/7.9> <1/6.9>

See the documentation of GMT and PyGMT:

The description of gmtinfo at GMT DOC :

As another option, info can find the extent of data in the first two columns rounded up and down to the nearest multiple of the supplied increments given by -I.

If -C is combined with -I then the output will be in column form and rounded up/down for as many columns as there are increments provided in -I.

The corresponding description at PyGMT DOC:

As an option, it will find the extent of the first two columns rounded up and down to the nearest multiple of the supplied increments given by spacing.

This is right. When we only use spacing without per_column, only the rounded min/max values of the first columns will be output.

If the per_column parameter is combined with spacing, then the numpy.ndarray output will be rounded up/down for as many columns as there are increments provided in spacing.

This is exactly right. When we use both per_column and spacing, we can output the rounded min/max values for each column provided in spacing.

The detailed description of -I at GMT DOC:

Report the min/max of the first n columns to the nearest multiple of the provided increments (separate the n increments by slashes), and output results in the form -Rw/e/s/n (unless -C is set). If only one increment is given we also use it for the second column (for backwards compatibility).
The corresponding description of spacing at PyGMT DOC:

spacing (str) – [b|p|f|s]dx[/dy[/dz…]]. Report the min/max of the first n columns to the nearest multiple of the provided increments and output results in the form [w, e, s, n].

The first part of the description assumes per_column is set to be True, while the second part assumes per_column is not used.

weiji14 commented 3 years ago

It seems that only when the per_column parameter is combined with spacing, then the numpy.ndarray output will be rounded up/down for as many columns as there are increments provided in spacing. Without per_column, spacing only output the min/max values for the first two columns no matter how many spacings we input.

Yes you are correct here. Granted, we pretty much just copied the documentation from GMT, so it might be considered an 'upstream' issue.

So, we should always set per_column to be True when we use spacing unless only the first two columns are needed.

I would advise against automatically setting per_column=True. There may be use cases where people don't want that to happen, someone's bug is another person's feature.

spacing (str) – [b|p|f|s]dx[/dy[/dz…]]. Report the min/max of the first n columns to the nearest multiple of the provided increments and output results in the form [w, e, s, n].

The first part of the description assumes per_column is set to be True, while the second part assumes per_column is not used.

As mentioned above, this was copied directly from https://github.com/GenericMappingTools/gmt/blame/6.1/doc/rst/source/gmtinfo.rst#L114-L116. We can change the wording here in PyGMT, but it might be worth submitting a Pull Request upstream as well so that things are consistent across projects.

P.S. I edited your original post slightly, was hard to think in double negatives.

core-man commented 3 years ago

As mentioned above, this was copied directly from https://github.com/GenericMappingTools/gmt/blame/6.1/doc/rst/source/gmtinfo.rst#L114-L116. We can change the wording here in PyGMT, but it might be worth submitting a Pull Request upstream as well so that things are consistent across projects.

Submitted to GMT (https://github.com/GenericMappingTools/gmt/issues/4938).

seisman commented 3 years ago

The upstream documentation issue has been fixed in https://github.com/GenericMappingTools/gmt/pull/5014.

GenericMappingTools / pygmt

Misleading description for the spacing parameter of pygmt.info #1035