GenericMappingTools / pygmt

A Python interface for the Generic Mapping Tools.
https://www.pygmt.org
BSD 3-Clause "New" or "Revised" License

pygmt.select: Wrong column names of pandas.DataFrame when incols parameter is used #2463

Open yvonnefroehlich opened 1 year ago

yvonnefroehlich commented 1 year ago

Description of the problem

The column names of the pandas.DataFrame created by pygmt.select are wrong when the incols parameter is used. The columns themselves are correctly reordered, but the column names keep their original order.
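To make the mismatch concrete without running GMT, here is a minimal pandas-only sketch. It assumes (based on the output reported below, not on PyGMT's source) that the values come back in `incols` order while the names are copied unchanged from the input DataFrame. It imitates the observed behaviour and is not PyGMT's actual code.

```python
import pandas as pd

# Input in bathymetry, latitude, longitude order (first two rows of the example below)
data = pd.DataFrame(
    {
        "bathymetry": [-636.0, -655.0],
        "latitude": [27.49555, 27.49286],
        "longitude": [245.00891, 245.01201],
    }
)
incols = [2, 1, 0]  # read longitude, latitude, bathymetry

# The values are reordered according to incols ...
values = data.iloc[:, incols].to_numpy()
# ... but the names are copied unchanged from the input (the reported behaviour)
result = pd.DataFrame(values, columns=data.columns)

print(result["bathymetry"].tolist())  # [245.00891, 245.01201], i.e. longitudes, not depths
```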

Minimal Complete Verifiable Example

import pygmt

ship_data = pygmt.datasets.load_sample_data(name="bathymetry")
ship_data.head(10)
"""
   longitude  latitude  bathymetry
0  245.00891  27.49555      -636.0
1  245.01201  27.49286      -655.0
2  245.01512  27.49016      -710.0
3  245.01822  27.48746      -695.0
4  245.02443  27.48206      -747.0
5  245.03374  27.47397      -747.0
6  245.03684  27.47127      -702.0
7  245.03994  27.46857      -700.0
8  245.04305  27.46587      -648.0
9  245.04615  27.46317      -647.0
"""

ship_data_resort = ship_data[["bathymetry", "latitude", "longitude"]]
ship_data_resort.head(10)
"""
   bathymetry  latitude  longitude
0      -636.0  27.49555  245.00891
1      -655.0  27.49286  245.01201
2      -710.0  27.49016  245.01512
3      -695.0  27.48746  245.01822
4      -747.0  27.48206  245.02443
5      -747.0  27.47397  245.03374
6      -702.0  27.47127  245.03684
7      -700.0  27.46857  245.03994
8      -648.0  27.46587  245.04305
9      -647.0  27.46317  245.04615
"""

ship_data_filtered = pygmt.select(
    data=ship_data_resort,  # bathy, lat, lon
    region=[246, 247, 20, 21],  # filter based on region
    incols=[2, 1, 0],  # lon, lat, bathy
)
ship_data_filtered.head(10)
"""
   bathymetry  latitude  longitude
0   246.75398  20.99297      -3675
1   246.75707  20.98365      -3668
2   246.76016  20.97434      -3683
3   246.76326  20.96503      -3707
4   246.76635  20.95572      -3707
5   246.76944  20.94641      -3679
6   246.77254  20.93710      -3677
7   246.77563  20.92779      -3653
8   246.77872  20.91848      -3653
9   246.78182  20.90916      -3629
"""
# -> the column names of the DataFrame are not reordered to match the data

Full error message

No error or warning message occurs.

However, the column names of the `pandas.DataFrame` returned by `pygmt.select` are wrong when the `incols` parameter is used.

System information

PyGMT information:
  version: v0.8.1.dev72
System information:
  python: 3.11.0 | packaged by conda-forge | (main, Oct 25 2022, 06:12:32) [MSC v.1929 64 bit (AMD64)]
  executable: C:\ProgramData\Anaconda3\envs\pygmt_env_dev\python.exe
  machine: Windows-10-10.0.19045-SP0
Dependency information:
  numpy: 1.24.1
  pandas: 1.5.3
  xarray: 2023.1.1.dev17
  netCDF4: 1.6.2
  packaging: 22.0
  contextily: 1.3.0
  geopandas: 0.12.2
  ghostscript: 9.54.0
GMT library information:
  binary version: 6.4.0
  cores: 4
  grid layout: rows
  library path: C:/ProgramData/Anaconda3/envs/pygmt_env_dev/Library/bin/gmt.dll
  padding: 2
  plugin dir: C:/ProgramData/Anaconda3/envs/pygmt_env_dev/Library/bin/gmt_plugins
  share dir: C:/Program Files (x86)/gmt6/share
  version: 6.4.0

weiji14 commented 1 year ago

Yes, the issue is that the column names are somewhat hardcoded here:

https://github.com/GenericMappingTools/pygmt/blob/8eac8d01b35690a5d8e38472a15abad284b3f437/pygmt/src/select.py#L176-L177
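One possible direction, sketched under assumptions: instead of reusing the input names as-is, the output names could be permuted with the same indices as `incols`. The helper `output_column_names` below is hypothetical (it is not part of PyGMT). It only handles `incols` given as a list of integers or as a comma-separated string such as `"2,1,0"`. GMT's `-i` syntax also allows ranges and modifiers, and for those the sketch returns `None` so a caller could fall back to unnamed columns.

```python
def output_column_names(input_names, incols=None):
    """Hypothetical helper: return column names in the order selected by incols.

    Handles only a list of integers or a comma-separated string of integers
    (e.g. "2,1,0"). Returns None for anything else (ranges such as "0:2",
    +s/+o modifiers, ...), so the caller can fall back to unnamed columns.
    """
    names = list(input_names)
    if incols is None:
        return names  # no reordering requested
    if isinstance(incols, str):
        parts = incols.split(",")
        if not all(part.strip().isdigit() for part in parts):
            return None  # unsupported -i syntax in this sketch
        incols = [int(part) for part in parts]
    if not all(isinstance(i, int) and 0 <= i < len(names) for i in incols):
        return None  # not plain in-range integer indices
    # Permute the names with exactly the same indices used for the values
    return [names[i] for i in incols]
```

With the example from this issue, `output_column_names(["bathymetry", "latitude", "longitude"], [2, 1, 0])` gives `["longitude", "latitude", "bathymetry"]`, which matches the order of the values in the output.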

Note, though, that this also affects a few other PyGMT functions that use the same logic:

https://github.com/GenericMappingTools/pygmt/blob/8eac8d01b35690a5d8e38472a15abad284b3f437/pygmt/src/grdtrack.py#L327-L328

https://github.com/GenericMappingTools/pygmt/blob/8eac8d01b35690a5d8e38472a15abad284b3f437/pygmt/src/blockm.py#L62-L63

Maybe some others too? There's probably an easy way to reorder the column names with Python indexing when incols is used, but I also feel that it's better for the user to reorder the columns using pandas after the output table has been produced 🙂
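A minimal sketch of the "reorder afterwards with pandas" approach. It assumes `pygmt.select` is called on `ship_data` in its original longitude, latitude, bathymetry order, without `incols`, so the returned names already match the values. To keep the snippet runnable without GMT, a small stand-in DataFrame replaces the real result, using the first two rows of the filtered output shown in the issue.

```python
import pandas as pd

# Stand-in for the assumed result of
#   pygmt.select(data=ship_data, region=[246, 247, 20, 21])
# (input already in longitude, latitude, bathymetry order, no incols)
filtered = pd.DataFrame(
    {
        "longitude": [246.75398, 246.75707],
        "latitude": [20.99297, 20.98365],
        "bathymetry": [-3675.0, -3668.0],
    }
)

# Reorder by name afterwards: selecting columns by label moves names and values together
resorted = filtered[["bathymetry", "latitude", "longitude"]]
print(resorted)
```

Selecting by label rather than by position means the names cannot drift away from their values.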