Closed seisman closed 4 years ago
Agreed. Pandas can already handle this type of data. I confess that I've never had this use case so I'd love to hear what people need from this.
I would love to see this feature as well in pygmt. GMT has the best support for easy-to-read and beautiful time series plots. I use that feature even more than plotting maps. I especially like the interval annotation (where, for example, the name of the month is shown in between two tick marks, instead of underneath the first day of the month). Would love to be able to do all of this from pygmt.
The basemap features with the nice annotation already work, but plot (psxy) does not yet accept datetime data.
There are many date/time representations in Python. Plain-old command-line GMT uses ISO strings "2019-03-18T17:48:00.000", which would be a good place to start, but native datetime module objects and numpy.datetime64 would be nice as well. Otherwise pandas can easily help to do any conversions.
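As a rough illustration of the conversions mentioned above, the sketch below turns standard-library date and datetime objects into the ISO string form that command-line GMT reads. The helper name `to_gmt_iso` is made up for this example and is not part of PyGMT or GMT.

```python
from datetime import date, datetime


def to_gmt_iso(value):
    """Return an ISO 8601 string with millisecond precision.

    Hypothetical helper for illustration, not a PyGMT or GMT function.
    """
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(value, datetime):
        return value.isoformat(timespec="milliseconds")
    if isinstance(value, date):
        # Plain dates are treated as midnight on that day.
        midnight = datetime(value.year, value.month, value.day)
        return midnight.isoformat(timespec="milliseconds")
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(to_gmt_iso(datetime(2019, 3, 18, 17, 48)))  # 2019-03-18T17:48:00.000
```

numpy.datetime64 and pandas objects would need their own branches; they are left out here to keep the sketch dependency-free.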
@PaulWessel What's the correct way to pass string vectors to GMT? Like this one:
2008-01-01T00:00 5.0
2008-01-01T00:01 5.0
The GMT_Put_Vector function can only pass numeric vectors. It seems GMT_Put_Strings can pass string vectors, but it's not clear to me how to specify the column number for string vectors.
Did not anticipate this. One way would be to convert those to UNIX seconds since 1970 and pass that as a double. However, if you want to pass datetime strings then we would need to pass type = GMT_DATETIME and that would need to trigger a conversion from those strings to internal time. Would you also need to reverse, i.e., calling GMT_Get_Vector and if that column is abs time then return string array? Or perhaps we could make a GMT_DateTime () function that takes your string and returns a double for internal floating point time and then you pass that? What would be best from your perspective?
Since telling GMT that you have abstime etc is done via -J or -f, that is separate from loading up an array in a column. So maybe for me to extend Get/Put to do the datestring to time and back is the simplest?
Before we do, what is the equivalent issue in Julia and MATLAB, @joa-quim ? Regarding time representations.
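The first option above (convert to UNIX seconds since 1970 and pass a double) could look roughly like this on the Python side. This is only a sketch: the function name `iso_to_unix_seconds` is hypothetical, and the input strings are assumed to be in UTC, which the thread does not state.

```python
from datetime import datetime, timezone


def iso_to_unix_seconds(text):
    """Parse an ISO datetime string into seconds since 1970-01-01T00:00:00.

    Assumption: the string carries no timezone and is interpreted as UTC.
    """
    parsed = datetime.fromisoformat(text).replace(tzinfo=timezone.utc)
    return parsed.timestamp()


times = ["2008-01-01T00:00", "2008-01-01T00:01"]
print([iso_to_unix_seconds(t) for t in times])  # [1199145600.0, 1199145660.0]
```

The resulting floats could then go through the existing numeric path, with the column declared as absolute time separately.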
It's not only about datetimes. See this command-line example:
gmt begin map png,pdf
gmt basemap -R0/10/0/6 -JX10c/6c -Bafg1 -BWSen
gmt text -F+f+a+j -W1p -Glightblue << EOF
5 1 12p,0,red 0 TL GMT TEXT1
5 3 15p,1,blue 30 MC GMT TEXT2
5 5 18p,2,yellow 180 TL GMT TEXT3
EOF
gmt end show
It seems there are no API functions to pass the third column (varying fonts) to GMT.
Sure? All that stuff is part of trailing text, i.e.,
5 1 is the two numerical columns and "12p,0,red 0 TL GMT TEXT1" is the trailing text. That is what GMT_Put_Strings is meant to do. Trailing text is its own special "column"
In contrast, your datetime string is meant to be a numerical column, but it is given as a string that needs conversion.
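To make the distinction concrete, here is a small sketch that splits one text record into its leading numerical columns and the trailing text. The function `split_record` is hypothetical and only mimics the idea described above; it is not how GMT parses records internally.

```python
def split_record(line, n_numeric):
    """Split a record into n_numeric leading floats and the trailing text.

    Hypothetical helper: everything after the numeric columns is kept as one
    string, which is what would be handed over as trailing text.
    """
    fields = line.split(None, n_numeric)
    numeric = [float(field) for field in fields[:n_numeric]]
    trailing = fields[n_numeric] if len(fields) > n_numeric else ""
    return numeric, trailing


print(split_record("5 1 12p,0,red 0 TL GMT TEXT1", 2))
# ([5.0, 1.0], '12p,0,red 0 TL GMT TEXT1')
```

A record such as `2008-01-01T00:00 5.0` does not fit this scheme, because its first column is numerical in meaning but cannot be read with `float`; that is the case that needs a datetime conversion.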
I like the GMT_Put/Get_Vector extension to GMT_DATETIME. With no interruptions (...) I would make a branch for that today.
5 1 is the two numerical columns and "12p,0,red 0 TL GMT TEXT1" is the trailing text. That is what GMT_Put_Strings is meant to do. Trailing text is its own special "column"
I thought the 4th column (the text angles) should be passed as a numerical column.
I like the GMT_Put/Get_Vector extension to GMT_DATETIME. With no interruptions (...) I would make a branch for that today.
Yes, I think that's a good and useful extension.
Before we do, what is the equivalent issue in Julia and MATLAB, @joa-quim ? Regarding time representations.
Don't know. Never tried to plot with time.
pstext goes way back. The order of things was fixed and it mixed up text and numbers. Then we added the -F+j+a+f modifiers to specify the order, and even pull some of them out (+a45+jCB). It is one of those things where backwards compatibility is a problem. Since the angle is a number, it should always be part of the numerics (if it is given). I think from the externals you need to call Put_Strings with whatever order you have and let -F tell us.
@joa-quim : I am sure Julia must have a way to deal with time, but there really are only a few: a string like @seisman has, a floating point number (time units since some epoch), or some complicated structure (with hours, day, month, etc.). I will work up a string solution.
In PR https://github.com/GenericMappingTools/gmt/pull/3396, the GMT API function GMT_Put_Vector added support for vectors of GMT_DATETIME type. The updated function will be available in the upcoming GMT 6.1.0 release.
With that PR merged, now it's possible for us to pass datetime vectors to GMT. See #464 for the possible implementation.
Comments and suggestions are welcomed.
Just had a quick look at #464 and it looks really promising! Been working with some time series data recently so I'd be keen to test it out once GMT 6.1.0 is out (looking at the calendar team!).
One quick question though: how will Not-a-Time (i.e. NaT) values be handled? Does GMT just skip plotting them (as it should), or would it convert them to some big number like numpy seems to currently do at https://github.com/numpy/numpy/issues/16391? Or would it fall back on the user to drop those NaT values before plotting them with PyGMT/GMT?
Any datetime types (raw strings, datetime, np.datetime64 and pandas.DatetimeIndex) are converted to strings (char ** in C) before passing to the GMT API.
NaT is converted to the string NaT. GMT can't handle NaT, so it just gives an error and skips the NaT data points. (Perhaps it should be a warning instead of an error.)
pygmt-session [ERROR]: Unable to parse 1 datetime strings (ISO datetime format required)
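If one preferred to drop NaT values on the Python side rather than rely on GMT skipping them, a minimal sketch with NumPy might look like the following. The function name `to_iso_strings` and the choice of second resolution are assumptions made for this example; this is not what PyGMT does internally.

```python
import numpy as np


def to_iso_strings(times):
    """Convert datetime-like values to ISO strings, dropping NaT entries.

    Hypothetical helper. Returns the list of strings and the number of
    NaT values that were removed.
    """
    arr = np.asarray(times, dtype="datetime64[s]")
    valid = ~np.isnat(arr)
    strings = np.datetime_as_string(arr[valid]).tolist()
    return strings, int((~valid).sum())


strings, n_dropped = to_iso_strings(["2008-01-01T00:00", "NaT", "2008-01-01T00:01"])
print(strings)    # ['2008-01-01T00:00:00', '2008-01-01T00:01:00']
print(n_dropped)  # 1
```

Returning the count makes it easy for a caller to emit a warning instead of silently losing points.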
What is the definition of a NaT?
NaT is also new to me. There is some documentation from MATLAB and from NumPy in Python.
NaT is basically like NaN, but for time. To be honest, I'm not sure why they don't just use NaN, but yeah, it exists...
NaT is converted to the string NaT. GMT can't handle NaT, so it just gives an error and skips the NaT data points. (Perhaps it should be a warning instead of an error.)
pygmt-session [ERROR]: Unable to parse 1 datetime strings (ISO datetime format required)
Yeah, a warning might be better here. Not sure if there is an ISO datetime for NaT.
Revisiting this thread a little as I'm trying to implement support for Apache Arrow's date32 and date64 dtypes in PyGMT at #2845, and would like some advice on the implementation.
Currently at 5d16103de4eacba6c92f7f0489744b7015a30a8a, I've converted data stored in PyArrow's date32 dtype (32-bit) to NumPy's datetime64 dtype (64-bit), which might increase memory usage going from 32-bit to 64-bit. But @seisman mentioned at https://github.com/GenericMappingTools/pygmt/issues/242#issuecomment-636269779 that GMT_Put_Vector converts the date to a string/char representation? Would it be better to just convert the PyArrow date32 data directly to a string/char representation then, instead of going through an intermediate np.datetime64 format? Or does it not matter since string/char is 64-bit anyway?
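For reference, Arrow's date32 stores a date as a 32-bit count of days since the UNIX epoch, so a direct conversion to ISO date strings needs no intermediate datetime64 array. The sketch below works on plain Python integers so that it runs without PyArrow; the function name `date32_to_iso` is hypothetical.

```python
from datetime import date, timedelta

UNIX_EPOCH = date(1970, 1, 1)


def date32_to_iso(days):
    """Convert day counts since 1970-01-01 (the date32 encoding) to ISO dates.

    Hypothetical helper; `days` is any iterable of integers.
    """
    return [(UNIX_EPOCH + timedelta(days=int(d))).isoformat() for d in days]


print(date32_to_iso([0, 13879]))  # ['1970-01-01', '2008-01-01']
```

Note that each resulting string such as `2008-01-01` takes 10 characters, which is more than the 4 bytes of a date32 value or the 8 bytes of a datetime64 value, so the string form is not cheaper in memory either way.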
Also, what's the temporal resolution of GMT_DATETIME, or the smallest unit that can be handled? Asking because np.datetime64 supports different units (https://numpy.org/doc/1.26/reference/arrays.datetime.html#datetime-units) such as nanosecond, millisecond, day, etc, and it'd be good to know how fine or coarse of a resolution we can handle.
Internally in GMT, absolute (and relative) time is stored in doubles. According to Wikipedia, the smallest increment between two doubles is 2.22e-16, and since the default unit is seconds since some epoch (e.g. UNIX time, 1970), that would seem to be pretty small. We never really used high-precision time when we wrote GMT, but I am not sure if 2.22e-16 is always the smallest increment. It should cover nanoseconds.
It's a little more involved. The 2.22e-16 is the eps for doubles around 1
julia> eps(1.)
2.220446049250313e-16
but since we are already 53 years away from 1970, the eps now is much larger
julia> eps(53*365*24*60*60.0)
2.384185791015625e-7
If the time reference is the year 0 (not uncommon), with doubles one cannot have better resolution than ~10 microseconds
julia> eps(2023*365*24*60*60.0)
7.62939453125e-6
And definitely doubles must be used. The panorama with singles (float32) is dramatic
julia> eps(Float32(53*365*24*60*60.0))
128.0f0
Ideally, time should be stored in 64-bit ints.
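For PyGMT readers, the same numbers can be reproduced in Python. This sketch uses `math.ulp` (Python 3.9 or later) for doubles and a small hypothetical helper, `ulp_float32`, that steps to the next representable single-precision value through its bit pattern.

```python
import math
import struct


def ulp_float32(x):
    """Spacing between x and the next larger float32.

    Hypothetical helper; x is assumed to be positive and finite.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lower = struct.unpack("<f", struct.pack("<I", bits))[0]
    upper = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return upper - lower


seconds_since_1970 = 53 * 365 * 24 * 60 * 60.0
seconds_since_year0 = 2023 * 365 * 24 * 60 * 60.0

print(math.ulp(1.0))                    # 2.220446049250313e-16
print(math.ulp(seconds_since_1970))     # 2.384185791015625e-07
print(math.ulp(seconds_since_year0))    # 7.62939453125e-06
print(ulp_float32(seconds_since_1970))  # 128.0
```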
Ok, so if I understand correctly, the default TIME_UNIT is 1 second, but because GMT stores time as double (float64), the smallest resolution gets coarser the further you are from the set TIME_EPOCH, and this means the time steps could range from the order of 1e-16 second to 1e-6 second or so?
To be safe and on the conservative side, it sounds like 1 microsecond is ok (and I just remembered @seisman mentioning this at https://github.com/GenericMappingTools/pygmt/pull/464#issuecomment-636585247)...
Yes, but when the reference is year zero, 7.62939453125e-6 ~= 1e-5, which gives ~10 microseconds
GMT_Put_Vector converts the date to a string/char representation? Would it be better to just convert the PyArrow date32 data directly to a string/char representation then, instead of going through an intermediate np.datetime64 format? Or does it not matter since string/char is 64-bit anyway?
Just read the GMT_Put_Vector source code again. It seems we can convert datetimes to double and pass the double vector to the GMT C API, but we can't tell GMT that the column is of GMT_DATETIME type.
Yes, but when the reference is year zero, 7.62939453125e-6 ~= 1e-5, which gives ~10 microseconds
Well, who would use year zero and look for high precision in recent years? You can choose your internal epoch if you really need nanoseconds around some epoch. Presumably one does not need nanoseconds from 0-2023?
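The effect of choosing the epoch can be checked with a short sketch. It computes the spacing of doubles at a given instant, counted in seconds from a chosen epoch. The function name `resolution_seconds` and the example dates are made up for illustration; GMT's own epoch handling is not modelled here.

```python
import math
from datetime import datetime


def resolution_seconds(when, epoch):
    """Spacing of float64 values at `when`, counted in seconds since `epoch`.

    Hypothetical helper; both arguments are naive datetimes.
    """
    return math.ulp((when - epoch).total_seconds())


when = datetime(2023, 6, 1, 12, 0)

# Epoch 1970: the spacing is a fraction of a microsecond.
print(resolution_seconds(when, datetime(1970, 1, 1)))  # 2.384185791015625e-07

# Epoch at the start of the same day: the spacing drops well below a nanosecond.
print(resolution_seconds(when, datetime(2023, 6, 1)) < 1e-9)  # True
```

So an epoch close to the data is what makes nanosecond-level steps representable in a double.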
Description of the problem: I want to plot some datetime data on a map. However, gmt-python doesn't accept strings as input.