GenericMappingTools / pygmt

A Python interface for the Generic Mapping Tools.
https://www.pygmt.org
BSD 3-Clause "New" or "Revised" License
759 stars, 220 forks

Bug with x2sys_cross module not handling NaN values properly #1368

Closed isLiYang closed 3 years ago

isLiYang commented 3 years ago

Description of the problem

When I use the x2sys_cross function and pass a pandas.DataFrame, no result is returned. However, when I pass an ASCII file, the program works fine. I don't know what went wrong. I also tested the code from pull request #591, and it worked. I uploaded ssh.txt, which is needed to run the code below, to my testdata repository: https://github.com/isLiYang/testdata

I hope someone can help me solve this problem.

Thank You.

Full code that generated the error

import numpy as np
import pandas as pd
import pygmt
import os
from tempfile import TemporaryDirectory

# load my data.
data = np.loadtxt('ssh.txt')
df = pd.DataFrame({
    'lon': data[:, 0],
    'lat': data[:, 1],
    'ssh': data[:, 2]
})
df.columns = ['lon', 'lat', 'z']

# load test data from: Pull request #591.
# https://github.com/GenericMappingTools/pygmt/pull/591
dataframe: pd.DataFrame = pygmt.datasets.load_sample_bathymetry()
dataframe.columns = ["x", "y", "z"]  # longitude, latitude, bathymetry

# Test the example from pull request #591; it works fine.
os.environ["X2SYS_HOME"] = os.getcwd()

with TemporaryDirectory(prefix="X2SYS", dir=os.environ["X2SYS_HOME"]) as tmpdir:
    tag = os.path.basename(tmpdir)
    pygmt.x2sys_init(tag=tag, fmtfile="xyz", suffix="xyz", force=True)
    output: pd.DataFrame = pygmt.x2sys_cross(tracks=[dataframe], tag=tag, coe="i", verbose="i")

# Test my data.
# 1. When passing an ASCII file, the calculation works, and both the ASCII output file and the DataFrame object can be produced.
os.environ["X2SYS_HOME"] = os.getcwd()
with TemporaryDirectory(prefix="X2SYS", dir=os.environ["X2SYS_HOME"]) as tmpdir:
    tag = os.path.basename(tmpdir)
    # passing file.
    pygmt.x2sys_init(tag=tag, fmtfile='geoz', suffix='txt', force=True, discontinuity='g', gap=['d100'], verbose='i')
    pygmt.x2sys_cross(tracks=['ssh.txt'], outfile='pyxovers1.dat', tag=tag, interpolation='a', verbose='i', region='g', trackvalues=True)
    xovers1 = pygmt.x2sys_cross(tracks=['ssh.txt'], tag=tag, interpolation='a', verbose='i', region='g', trackvalues=True)

# 2. When passing a DataFrame, it either outputs no calculation results or raises an error.
os.environ["X2SYS_HOME"] = os.getcwd()
with TemporaryDirectory(prefix="X2SYS", dir=os.environ["X2SYS_HOME"]) as tmpdir:
    tag = os.path.basename(tmpdir)
    # passing a DataFrame.
    pygmt.x2sys_init(tag=tag, fmtfile='geoz', force=True, discontinuity='g', gap=['d100'], verbose='i')
    pygmt.x2sys_cross(tracks=[df], outfile='pyxovers2.dat', tag=tag, interpolation='a', verbose='i', region='g', trackvalues=True)
    xovers2 = pygmt.x2sys_cross(tracks=[df], tag=tag, interpolation='a', verbose='i', region='g', trackvalues=True)

print('finished...')

Full error message

pyxovers2.dat is created, but it contains no data. Below are the messages from

pygmt.x2sys_cross(tracks=[df], outfile='pyxovers2.dat', tag=tag, interpolation='a', verbose='i', region='g', trackvalues=True)

and

    xovers2 = pygmt.x2sys_cross(tracks=[df], tag=tag, interpolation='a', verbose='i', region='g', trackvalues=True)
x2sys_cross [INFORMATION]: Files found: 1
x2sys_cross [INFORMATION]: Checking for duplicates :
x2sys_cross [INFORMATION]: 0 found
x2sys_cross [WARNING]: No time column, use dummy times
x2sys_cross [INFORMATION]: Writing Data Table to file pyxovers2.dat
x2sys_cross [INFORMATION]: Files found: 1
x2sys_cross [INFORMATION]: Checking for duplicates :
x2sys_cross [INFORMATION]: 0 found
x2sys_cross [WARNING]: No time column, use dummy times
x2sys_cross [INFORMATION]: Writing Data Table to file /tmp/pygmt-4qfsz7oy.txt
Traceback (most recent call last):
  File "/mnt/e/HY2A/tmp.py", line 50, in <module>
    xovers2 = pygmt.x2sys_cross(tracks=[df], tag=tag, interpolation='a', verbose='i', region='g', trackvalues=True)
  File "/home/liyang/anaconda3/envs/pygmt/lib/python3.9/site-packages/pygmt/helpers/decorators.py", line 414, in new_module
    return module_func(*args, **kwargs)
  File "/home/liyang/anaconda3/envs/pygmt/lib/python3.9/site-packages/pygmt/helpers/decorators.py", line 557, in new_module
    return module_func(*args, **kwargs)
  File "/home/liyang/anaconda3/envs/pygmt/lib/python3.9/site-packages/pygmt/src/x2sys_cross.py", line 228, in x2sys_cross
    table = pd.read_csv(
  File "/home/liyang/anaconda3/envs/pygmt/lib/python3.9/site-packages/pandas/io/parsers.py", line 610, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/liyang/anaconda3/envs/pygmt/lib/python3.9/site-packages/pandas/io/parsers.py", line 462, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/liyang/anaconda3/envs/pygmt/lib/python3.9/site-packages/pandas/io/parsers.py", line 819, in __init__
    self._engine = self._make_engine(self.engine)
  File "/home/liyang/anaconda3/envs/pygmt/lib/python3.9/site-packages/pandas/io/parsers.py", line 1050, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/home/liyang/anaconda3/envs/pygmt/lib/python3.9/site-packages/pandas/io/parsers.py", line 1873, in __init__
    self._reader = parsers.TextReader(self.handles.handle, **kwds)
  File "pandas/_libs/parsers.pyx", line 521, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

Process finished with exit code 1

System information

PyGMT information:
  version: v0.4.0
System information:
  python: 3.9.5 (default, Jun  4 2021, 12:28:51)  [GCC 7.5.0]
  executable: /home/liyang/anaconda3/envs/pygmt/bin/python
  machine: Linux-4.4.0-19041-Microsoft-x86_64-with-glibc2.31
Dependency information:
  numpy: 1.20.2
  pandas: 1.2.5
  xarray: 0.18.2
  netCDF4: 1.5.7
  packaging: 20.9
  ghostscript: 9.54.0
  gmt: 6.2.0
GMT library information:
  binary dir: /home/liyang/anaconda3/envs/pygmt/bin
  cores: 8
  grid layout: rows
  library path: /home/liyang/anaconda3/envs/pygmt/lib/libgmt.so
  padding: 2
  plugin dir: /home/liyang/anaconda3/envs/pygmt/lib/gmt/plugins
  share dir: /home/liyang/anaconda3/envs/pygmt/share/gmt
  version: 6.2.0

welcome[bot] commented 3 years ago

👋 Thanks for opening your first issue here! Please make sure you filled out the template with as much detail as possible. You might also want to take a look at our contributing guidelines and code of conduct.

weiji14 commented 3 years ago

Hi @isLiYang, thanks for filing this bug report! I can reproduce your error, and have tracked down the problem to this section of the code:

https://github.com/GenericMappingTools/pygmt/blob/d90b3fc889b53633deab6b4ab77612ac7a247c1b/pygmt/src/x2sys_cross.py#L46-L51

The problem is that PyGMT was not handling the NaN values in your dataframe properly. By default, pandas' to_csv writes NaN values to file as empty strings '' rather than as the word NaN, but GMT requires an explicit NaN value in the file in order to understand it. The solution is relatively simple: just add na_rep="NaN" to the track.to_csv call so that NaN values are printed. I will submit a fix for this bug shortly (edit: done at #1369).
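To illustrate the difference, here is a minimal sketch that uses only pandas. The small track below is made-up data, and the tab separator and the header/index settings are assumptions for the example rather than the exact arguments PyGMT passes internally.

```python
import numpy as np
import pandas as pd

# Hypothetical three-point track with one missing z value.
track = pd.DataFrame(
    {
        "lon": [10.0, 10.5, 11.0],
        "lat": [-5.0, -5.5, -6.0],
        "z": [1.2, np.nan, 1.4],
    }
)

# Default behaviour: the NaN is written as an empty field.
default_text = track.to_csv(sep="\t", index=False, header=False)

# With na_rep="NaN": the missing value is written out explicitly.
fixed_text = track.to_csv(sep="\t", index=False, header=False, na_rep="NaN")

print(repr(default_text.splitlines()[1]))
print(repr(fixed_text.splitlines()[1]))
```

In the first case the second row ends with a trailing tab and no third field, whereas in the second case the third field holds the literal text NaN.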

In the meantime, I would suggest using ssh.txt as input to x2sys_cross for PyGMT v0.4.0. Alternatively, you may also consider doing df = df.dropna() to remove the NaN values in your lon/lat/ssh dataset, but that would not give you the crossover locations of points with missing values (which may be ok or not ok, depending on what you want to do with your dataset).
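As a rough sketch of the dropna workaround, assuming a dataframe with lon/lat/z columns like the one in the report (the values here are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lon/lat/ssh dataframe from the report.
df = pd.DataFrame(
    {
        "lon": [120.0, 120.1, 120.2, 120.3],
        "lat": [30.0, 30.1, 30.2, 30.3],
        "z": [0.51, np.nan, 0.49, np.nan],
    }
)

# Drop every row that contains at least one NaN, then renumber the index.
clean = df.dropna().reset_index(drop=True)

n_dropped = len(df) - len(clean)
print(f"Dropped {n_dropped} of {len(df)} rows containing NaN")
```

The cleaned dataframe could then be passed as tracks=[clean] in place of the original. Note that this discards the positions of the missing points entirely, which is the trade-off described above.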

isLiYang commented 3 years ago

Thank you for your reply. I will try removing the NaN values from the pandas.DataFrame. When using the x2sys module, deleting NaN values doesn't affect my calculations. Thank you for your contributions to open source software. @weiji14

weiji14 commented 3 years ago

Cool, and let us know if you need any other help, or if there's another x2sys module at https://docs.generic-mapping-tools.org/6.2/modules.html#x2sys you need and we'll put in a feature request :smile:

P.S. I'll reopen this issue until the bugfix at #1369 is incorporated into PyGMT's main branch.

isLiYang commented 3 years ago

@weiji14 😁 Haha. That sounds great, but I don't need any of the other functions of the x2sys module right now. It is better to develop some commonly used functions first.