grdtrack raises an input error when points are not a DataFrame and no column name is specified

willschlitzer commented 3 years ago

The documentation for grdtrack states that an argument for newcolname is required if the argument for points is a DataFrame. However, a GMTInvalidInput error requiring an argument for newcolname is raised if points is a numpy array or a list as well.

Full code that generated the error Import statement

import pygmt
import numpy as np
grid = pygmt.datasets.load_earth_relief(region=[-5, 5, -5, 5])

Points as a numpy array

points_matrix = np.array([[1,1], [1,2], [2,2]])
assert type(points_matrix) == np.ndarray
grdtrack_output = pygmt.grdtrack(points=points_matrix, grid=grid)

Points as a list

points_list = [[1,1], [1,2], [2,2]]
assert type(points_list) == list
grdtrack_output = pygmt.grdtrack(points=points_list, grid=grid)

Full error message

GMTInvalidInput: Please pass in a str to 'newcolname'

The error is raised in lines 251-252 in grdtrack.py

if data_kind(points) == "matrix" and newcolname is None:
    raise GMTInvalidInput("Please pass in a str to 'newcolname'")

It looks like data_kind() won't distinguish between a DataFrame and a numpy array and returns "matrix" in both instances. Should newcolname be a required parameter for all matrix-type data, since the function still returns a DataFrame if no outfile is specified, or should the error handling check to make sure that points is a DataFrame?

maxrjones commented 3 years ago

I think the motivation for having newcolname as required for the dataframe input and output case case was probably because two columns cannot have the same name, so it's not simple to use a default column name for the output. Since this isn't a problem when creating a dataframe from numpy inputs, I do not think it should be required for that case.

maxrjones commented 3 years ago

Another note regarding newcolname (originally posted in https://github.com/GenericMappingTools/pygmt/pull/1345#issuecomment-870091881) - it should also accept a list with multiple column names for instances in which multiple grids are sampled.

willschlitzer commented 3 years ago

@meghanrjones So are you thinking there should be a check to see if the type is not a DataFrame and then it will substitute in a default column name is it check passes?

maxrjones commented 3 years ago

@meghanrjones So are you thinking there should be a check to see if the type is not a DataFrame and then it will substitute in a default column name is it check passes?

I imagine that the output DataFrame column names for non-DataFrame input when newcolname is not given could be the column index, similar to the default pandas behavior for pandas.read_csv(file,header=None).

willschlitzer commented 3 years ago

@meghanrjones So are you thinking there should be a check to see if the type is not a DataFrame and then it will substitute in a default column name is it check passes?

I imagine that the output DataFrame column names for non-DataFrame input when newcolname is not given could be the column index, similar to the default pandas behavior for pandas.read_csv(file,header=None).

Since the default argument for newcolname is None, I think the easiest solution is to change the if statement to raise GMTInvalidInput if points are a DataFrame.

GenericMappingTools / pygmt

grdtrack raises an input error when points are not a DataFrame and no column name is specified #1344