Open fedesuad opened 11 months ago
👋 Thanks for opening your first issue here! Please make sure you filled out the template with as much detail as possible. You might also want to take a look at our contributing guidelines and code of conduct.
@fedesuad Thanks for your detailed report. I can reproduce the issue with your example script.
The issue is that PyGMT currently writes the grdtrack output into a CSV file and then calls pd.read_csv
to read the output into a pd.DataFrame object.
The index of the returned pd.DataFrame object defaults to 0 through len(df)-1, but you're expecting it to have the same index as the input pd.DataFrame object (the points
parameter).
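The misalignment can be reproduced with plain pandas, without calling grdtrack at all. In this sketch (with made-up data), a 0-indexed Series stands in for what grdtrack currently returns; assigning it to a filtered subset aligns on index labels, not positions, which is where the NaN values come from:

```python
import pandas as pd

# A DataFrame whose index no longer starts at 0, e.g. a filtered subset
df = pd.DataFrame({"height": [100, 2000, 1800, 900]})
subset = df[df["height"] > 1500].copy()  # index is [1, 2]

# Stand-in for grdtrack's current output: a fresh 0-based index
result = pd.Series([10.0, 20.0])  # index is [0, 1]

# Column assignment aligns on index labels, not position, so only the
# overlapping label (1) matches; label 2 gets NaN
subset["sampled"] = result
print(subset["sampled"].tolist())  # [20.0, nan]
```

Any row of the subset whose label is >= len(result) finds no matching label in the 0-based result, which matches the reported behavior that NaNs appear once the index exceeds the length of the DataFrame.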
We have a few options for the index of the returned pd.DataFrame object:

1. Keep the current behavior, i.e., a 0-based index.
2. Use the same index as the points parameter if points is a pd.DataFrame, but use a 0-based index for other input types (e.g., a file or a 2D array).
3. Add a new parameter so users can specify the index of the output (similar to the newcolname parameter).

@fedesuad @GenericMappingTools/pygmt-maintainers What do you think?
Thanks for the response @seisman! Option 2 seems the most intuitive to me, at least for my applications.
I am sorry for being late here!
For the points parameter, I feel it's fair to request that the index of the single records (rows) is taken into account in case of a pd.DataFrame: for a pd.DataFrame, the index is not only for counting; it identifies a record uniquely and remains unchanged when building a subset. This is different from an array, where, e.g., row 4 becomes row 3 when excluding or removing row 3. Thus, options 2 and 3 appear good to me.
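The distinction between label-based DataFrame indices and positional array indices can be sketched like this (illustrative data only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"lon": [10.0, 11.0, 12.0, 13.0]})

# Dropping a row from a DataFrame keeps the remaining labels unchanged:
subset = df.drop(index=2)
print(subset.index.tolist())  # [0, 1, 3] -- the last row is still labeled 3

# Dropping an element from a plain array shifts the later positions:
arr = df["lon"].to_numpy()
arr2 = np.delete(arr, 2)
print(arr2[2])  # 13.0 -- the former row 3 now sits at position 2
```

This is why a label-preserving option (2 or 3) matters for pd.DataFrame input, while positional 0-based indexing is the only sensible choice for files or 2D arrays.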
Hm, currently I have some preference for option 3. The new parameter would probably be optional, right? So, by default, we could keep the zero-based indexing, which would not affect existing code. And maybe it would be easier for users with less experience with pandas and pd.DataFrames? If users want a specific index, they can pass it via the new parameter.
If I understood it correctly, for option 3 the code would change to something like this (I just used new_index as the name for the new parameter):
over_1500 = df[df['height'] > 1500].copy()
over_1500[['lon', 'lat', 'meanheight']] = pygmt.grdtrack(
    grid=grd,
    points=over_1500[['lon', 'lat']],
    newcolname="meanheight",
    # New parameter to pass the index which should be used for the output
    # DataFrame, instead of the default zero-based indexing
    new_index=over_1500.index,
)
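Until something like this lands, a possible workaround (a sketch with made-up data; `track` stands in for the DataFrame grdtrack currently returns with its 0-based index) is to re-attach the caller's index before assigning, so pandas' label alignment lines up again:

```python
import pandas as pd

# Caller's subset with a non-default index (e.g. from filtering)
over_1500 = pd.DataFrame({"lon": [1.0, 2.0], "lat": [3.0, 4.0]},
                         index=[5, 9])

# Stand-in for the current grdtrack output: same rows, 0-based index
track = pd.DataFrame({"lon": [1.0, 2.0], "lat": [3.0, 4.0],
                      "meanheight": [100.0, 200.0]})  # index [0, 1]

# Re-attach the caller's index; grdtrack preserves row order, so the
# rows correspond positionally
track.index = over_1500.index
over_1500["meanheight"] = track["meanheight"]
print(over_1500["meanheight"].tolist())  # [100.0, 200.0]
```

Assigning `track["meanheight"].to_numpy()` instead would also work, since NumPy arrays bypass label alignment entirely.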
I also prefer option 3. Option 2 is more intuitive for grdtrack, but may make no sense for other modules. I think we should keep the behavior consistent across all functions that output a table.
Better to address this feature request and other grdtrack issues after PR https://github.com/GenericMappingTools/pygmt/pull/2729.
Description of the problem
When I try to use pygmt.grdtrack on a DataFrame that doesn't have the default indexing (0 through len(df)), like a slice of an original DataFrame, it doesn't work properly. From what I can gather, it works up to the index that coincides with the length of the DataFrame; if the index goes higher than that, it starts returning NaN values.
Minimal Complete Verifiable Example
Full error message
System information