weiji14 closed this issue 3 years ago.
Here is the definition of the GMT_Put_Matrix function:
int GMT_Put_Matrix (void *API, struct GMT_MATRIX *M, unsigned int type, int pad, void *matrix)
The third parameter, type, is the data type of the matrix, e.g., GMT_DOUBLE or GMT_FLOAT. This also means that all elements of the matrix must have exactly the same data type. Thus, in PyGMT, we can't pass 2D numpy arrays with mixed data types to the put_matrix function.
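To make the constraint concrete, here is a minimal Python sketch of the check that put_matrix implies: one dtype for the whole array, mapped to one GMT type. The GMT_TYPES dictionary and matrix_type helper are hypothetical (not PyGMT's actual code) and only cover a few dtypes; the error text simply mirrors the message quoted later in this thread.

```python
import datetime

import numpy as np

# Hypothetical lookup from numpy dtype names to GMT type names (as strings);
# only a handful of entries are shown, not PyGMT's real table
GMT_TYPES = {
    "float64": "GMT_DOUBLE",
    "float32": "GMT_FLOAT",
    "int64": "GMT_LONG",
    "int32": "GMT_INT",
}


def matrix_type(array):
    """Return the single GMT type for a whole 2-D array (hypothetical helper)."""
    name = np.asarray(array).dtype.name  # one dtype covers every element
    if name not in GMT_TYPES:
        raise TypeError(f"Failed to put matrix of type {name}.")
    return GMT_TYPES[name]


print(matrix_type(np.zeros((2, 3))))  # float64 -> GMT_DOUBLE

# Floats mixed with datetimes collapse to the object dtype, which has no GMT type
mixed = np.array([[1.0, datetime.datetime(2010, 1, 1)]])
try:
    matrix_type(mixed)
except TypeError as err:
    print(err)
```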
The fix seems easy. We may have to pass 2D arrays as a series of vectors, via virtualfile_from_vectors.
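A minimal sketch of why splitting into vectors helps, assuming a pandas.DataFrame input with made-up column names: converting the whole table to one 2-D array forces a single dtype, while taking the columns one at a time keeps each column's own dtype. This only shows the data side and does not call virtualfile_from_vectors itself.

```python
import numpy as np
import pandas as pd

# Made-up table: one float column and one datetime column
df = pd.DataFrame(
    {
        "elevation": [10.5, 12.0, 11.2],
        "time": pd.to_datetime(["2010-01-01", "2015-06-01", "2020-01-01"]),
    }
)

# As one 2-D array, both columns are forced into a single (object) dtype
matrix = df.to_numpy()
print(matrix.dtype)

# As separate 1-D vectors, every column keeps its own dtype
vectors = [df[name].to_numpy() for name in df.columns]
print([vector.dtype for vector in vectors])
```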
Ping @weiji14.
Right, so we'll need something like an if-then or try-except to handle mixed dtypes. A couple of other details to consider:
1. Do we use put_vectors for info all the time (which will involve a for-loop), or do we check whether dtypes are mixed, then use put_vectors, else use put_matrix as per usual? Note that a numpy.array always has a single dtype; when the inputs are mixed (e.g. floats with datetimes) it will just be np.object. pandas.DataFrames are the ones that can explicitly have different dtypes in different columns.
2. How should info handle/support other mixed dtype combinations (e.g. int32/float32/etc) properly? Thinking about #547 here.
I've got a unit test for this written up already and will submit a PR soon, just need to work out these implementation details :smile:.
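The "check whether dtypes are mixed, then choose" option could look roughly like this. pick_put_function is a hypothetical helper that only returns the name of the code path to take; it assumes DataFrames are checked column by column and that a numpy array counts as mixed when its dtype is object.

```python
import datetime

import numpy as np
import pandas as pd


def pick_put_function(table):
    """Return "put_vectors" or "put_matrix" for ``table`` (hypothetical helper)."""
    if isinstance(table, pd.DataFrame):
        # A DataFrame keeps one dtype per column, so compare the column dtypes
        return "put_vectors" if len(set(table.dtypes)) > 1 else "put_matrix"
    # A numpy array has one dtype for all elements; heterogeneous Python
    # objects (e.g. floats mixed with datetimes) end up as the object dtype
    array = np.asarray(table)
    return "put_vectors" if array.dtype.kind == "O" else "put_matrix"


df = pd.DataFrame(
    {"x": [1.0, 2.0], "time": pd.to_datetime(["2010-01-01", "2020-01-01"])}
)
print(pick_put_function(df))  # put_vectors
print(pick_put_function(np.zeros((3, 2))))  # put_matrix
print(pick_put_function(np.array([[1.0, datetime.datetime(2010, 1, 1)]])))  # put_vectors
```

Checking first keeps the common all-numeric case on the single put_matrix call and only pays for the per-column loop when it is actually needed.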
Just following up on this: we've merged in #619, so if you install PyGMT from the master branch, passing in datetime inputs won't result in "GMTCLibError: Failed to put matrix of type object." anymore. However, the datetime column's range will be reported in UNIX timestamps instead of ISO datetimes.
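If a range does come back as UNIX timestamps, converting it to ISO datetimes is straightforward. This is a sketch assuming the values are seconds since 1970-01-01; the two numbers below are example values, not actual PyGMT output.

```python
import pandas as pd

# Example [min, max] range as seconds since 1970-01-01 (made-up values)
unix_range = [1262304000, 1577836800]

# unit="s" tells pandas the integers are seconds since the UNIX epoch
iso_range = pd.to_datetime(unix_range, unit="s").strftime("%Y-%m-%dT%H:%M:%S")
print(list(iso_range))  # ['2010-01-01T00:00:00', '2020-01-01T00:00:00']
```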
A workaround for this, as mentioned at https://github.com/GenericMappingTools/gmt/issues/4241#issuecomment-695958278, is to use something like pygmt.info(table=df, f="1T"), which explicitly tells GMT that the second column is a datetime type and should be handled that way.
We will close this issue once the upstream GMT issue at https://github.com/GenericMappingTools/gmt/issues/4241 is resolved, and perhaps when PyGMT bumps the minimum required version to GMT 6.2.0 and/or when conda GMT 6.2.0.dev builds are available via https://github.com/conda-forge/gmt-feedstock/pull/100.
So the workaround doesn't quite work because of the way we've implemented things in #619 using np.loadtxt:
import pandas as pd
import pygmt
table = pd.date_range(start="2010-01-01", end="2020-01-01")
pygmt.info(table=table, spacing="1Y", f="0T")
errors with:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-88-cac984c8d7d8> in <module>
----> 1 pygmt.info(table=df[[time_var, elev_var]], spacing=f"1W/{spacing}", f="0T")
~/miniconda3/envs/pygmt/src/pygmt/pygmt/helpers/decorators.py in new_module(*args, **kwargs)
268 if alias in kwargs:
269 kwargs[arg] = kwargs.pop(alias)
--> 270 return module_func(*args, **kwargs)
271
272 new_module.aliases = aliases
~/miniconda3/envs/pygmt/src/pygmt/pygmt/modules.py in info(table, **kwargs)
137 if result.startswith(("-R", "-T")): # e.g. -R0/1/2/3 or -T0/9/1
138 result = result[2:].replace("/", " ")
--> 139 result = np.loadtxt(result.splitlines())
140
141 return result
~/miniconda3/envs/pygmt/lib/python3.8/site-packages/numpy/lib/npyio.py in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin, encoding, max_rows)
1137 # converting the data
1138 X = None
-> 1139 for x in read_data(_loadtxt_chunksize):
1140 if X is None:
1141 X = np.array(x, dtype)
~/miniconda3/envs/pygmt/lib/python3.8/site-packages/numpy/lib/npyio.py in read_data(chunk_size)
1065
1066 # Convert each value according to its column and store
-> 1067 items = [conv(val) for (conv, val) in zip(converters, vals)]
1068
1069 # Then pack it according to the dtype's nesting
~/miniconda3/envs/pygmt/lib/python3.8/site-packages/numpy/lib/npyio.py in <listcomp>(.0)
1065
1066 # Convert each value according to its column and store
-> 1067 items = [conv(val) for (conv, val) in zip(converters, vals)]
1068
1069 # Then pack it according to the dtype's nesting
~/miniconda3/envs/pygmt/lib/python3.8/site-packages/numpy/lib/npyio.py in floatconv(x)
761 if '0x' in x:
762 return float.fromhex(x)
--> 763 return float(x)
764
765 typ = dtype.type
ValueError: could not convert string to float: '2019-05-19T20:53:51'
np.loadtxt assumes that the text is to be read as floating point numbers, but datetimes like "2019-05-19T20:53:51" are not floats. We'll need to set the dtype using np.loadtxt(..., dtype=???), where ??? is "str,float" or something similar (ref https://stackoverflow.com/a/31554777/6611055).
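A minimal sketch of the dtype idea, using a made-up -R string in the shape the info() code in the traceback handles (strip the leading -R, replace / with spaces). It reads every field as str, which is one option; the "str,float" structured dtype from the Stack Overflow answer is another. This is not necessarily what the eventual PyGMT fix does.

```python
import numpy as np

# Made-up GMT output in the "-R<xmin>/<xmax>/<ymin>/<ymax>" form
result = "-R2010-01-01T00:00:00/2020-01-01T00:00:00/0/0"
text = result[2:].replace("/", " ")  # same preprocessing as in the traceback

# The default dtype is float, so the ISO datetime fields fail to parse
try:
    np.loadtxt(text.splitlines())
except ValueError as err:
    print("float parsing failed:", err)

# Reading every field as a string keeps the datetimes intact
region = np.loadtxt(text.splitlines(), dtype=str)
print(region)
```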
Alright, #960 has been merged. Anyone installing PyGMT from the master branch (see https://www.pygmt.org/v0.3.0/install.html#using-pip) should be able to use the coltypes="0T" workaround for GMT 6.1.1 (where 0T means that the first column contains time), i.e.:
import pandas as pd
import pygmt
table = pd.date_range(start="2010-01-01", end="2020-01-01")
region = pygmt.info(table=table, spacing="1Y", coltypes="0T")
print(region)
# ['2010-01-01T00:00:00' '2020-01-01T00:00:00' '0' '0']
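Since the returned region holds both the ISO datetimes and the numbers as plain strings, a caller may want typed values. A short sketch, with the array hard-coded from the printed output above rather than produced by calling GMT:

```python
import numpy as np

# Region hard-coded from the output printed above (no GMT call here)
region = np.array(["2010-01-01T00:00:00", "2020-01-01T00:00:00", "0", "0"])

# First two entries: the time range; remaining two: the other region bounds
tmin, tmax = region[:2].astype("datetime64[s]")
ymin, ymax = region[2:].astype(float)
print(tmin, tmax, ymin, ymax)
print(tmax - tmin)
```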
Assuming that https://github.com/GenericMappingTools/gmt/issues/4241 is resolved in GMT 6.2.0, GMT 6.2.0 users won't need the coltypes parameter in the future (which saves people from needing to know which column holds the time).
FYI, https://github.com/GenericMappingTools/gmt/issues/4241 has been magically resolved, so this issue can be closed when PyGMT bumps the minimum version to GMT 6.2.0!
Phew, thanks team, glad to close down another >6 month old issue!
Description of the problem
Just noticed that passing datetime columns into pygmt.info doesn't work. This follows on from the pandas.DataFrame input support for pygmt.info added in #574; see also #464 and #562, where the datetime machinery should be more or less implemented.
Full code that generated the error
Note that the equivalent gmt command does work on datetime inputs.
Full error message
System information
Please paste the output of python -c "import pygmt; pygmt.show_versions()":