GenericMappingTools / gmt

The Generic Mapping Tools
https://www.generic-mapping-tools.org
Other
864 stars 359 forks source link

Support periodicity for coordinates other than longitudes #4507

Closed PaulWessel closed 3 years ago

PaulWessel commented 3 years ago

Description of the desired feature

Note: This feature request was inspired by the experiences reported on the forum.

The only data type that GMT recognizes as periodic is longitude. Basically, we internally compute something like this:

longitude = (in_longitude - start) % period + start; (1)

where start is 0 and period = 360 (the -180/+180 vs 0/360 is a separate formatting setting). This periodicity is enabled when we specify a map projection or when -fg is used (or even just -fcolx).

There may be other types of periodic data, such as daily or annual time-series. Let's say you have a time-series of the run-off of a river (here Mississippi). You can make a time-plot:

gmt plot mississippi.txt -R1930T/1941T/0/50000 -B -W0.25p,red -JX6i/3i -png t1

t1

Now, if I want to plot this data so that the year does not matter, i.e., I want to see all these annual cycles (due to seasonal snow melt) on a single graph, I have to do something more complicated:

gmt math mississippi.txt --TIME_UNIT=d --TIME_EPOCH=1930-01-01T -fi0T -fo0t -C0 365 MOD = | gmt plot -R0/365/0/50000 -W0.25p,red -JX6i/3i -B -gx200 -png t2

where I basically convert the absolute times to relative time in days since the start of the data (1930), then run the modulo function with a period of 365 days, and then plot these data using a corresponding -R in days and a gap detector of 200 days so the wrapping, periodic lines do not connect back to January:

t2

One problem is simply brushed under the rug: Years vary in length and 365 days does not really work, but it is not bad and an exact solution probably would look almost identical and nobody could really tell - but it would not be correct.

I may want to make a histogram of these data to show the total (or average) runoff per month. I can do

gmt math mississippi.txt --TIME_UNIT=d --TIME_EPOCH=1930-01-01T -fi0T -fo0t -C0 365 MOD = | gmt histogram -T30.4166666667 -Z0+w -R0/365/0/1e7 -Gred -JX6i/3i -B -W1p -png t3

where I select 1/12 of a 365-day year as bin width and sum up the daily entries inside each "month". This gives me

t3

Here, of course, I sweep under the rug the fact that months are not 365/12 (not to mention leap-years and February) but who will know? Well, we know - it is not correct. Another cumbersome issue is that I would like to label my time-axis with months despite not being absolute time. The only way in GMT to do this is to set up a custom annotation file, say, called months.txt:

15.2083333333   a J
45.625  a F
76.0416666667   a M
106.458333333   a A
136.875 a M
167.291666667    a J
197.708333333    a J
228.125  a A
258.541666667    a S
288.958333333    a O
319.375  a N
349.791666667    a  D

Now I can run

gmt math mississippi.txt --TIME_UNIT=d --TIME_EPOCH=1930-01-01T -fi0T -fo0t -C0 365 MOD = | gmt histogram -T30.4166666667 -Z0+w -R0/365/0/1e7 -Gred -JX6i/3i -Bxcmonths.txt -By -W1p -png t4

and get something closer to what I want:

t4

Now, I suspect most GMT users will not be able to arrive at these steps without a lot of help and examples such as this. It is clearly not elegant, and for actual time data with variable years and months it is a bit tedious and not exact. Since we don't care about leap seconds, doing a daily version of this (say, a data set that recorded something every minute for multiple days) would be exact, and label things 0-24 may not be so bad.

There may be other types of data out there that could also be considered or be viewed as periodic with other periodicities and start values. For instance, the forum post above wanted to plot from May to April instead of January to December. If the data were considered periodic, like longitudes, a -R5/17 would still plot the correct locations as we would wrap accordingly. But not in the case above. Here I would need to use a nonzero start value in the equation and modify months.txt. Again, can be done but hardly simple.

One way this could be handled in GMT would be to allow us to specify a generic periodic column and supply the period and optional start value. The logical place to do this would be via -f, e.g., for the above case I would do

-f0T+p1y

notifying GMT that the first column contains absolute time but that I want to apply a period of 1 year. Then, like longitudes, the input data is transformed to relative time per (1). I am left to set -R, and like longitudes where -R range is limited to <= 360, I will here be limited to the period, be it 0/1 (in years) or 0/12 in months. Since GMT knows what you are doing we could automatically handle the labeling based on your increment choices, and we can also handle the exactness of binning in histogram by keeping all February data in the 2nd bin, for instance. For other kinds of periodic data there are no leap-concerns.

I opened this request to see if this is a worthy thing to add to GMT, if there are other periodic or pseudo-periodic data examples you can think of, and if you have other comments or ideas along these lines. Pinging @GenericMappingTools/core and @GenericMappingTools/gmt-contributors.

gd-a commented 3 years ago

I really don't know how things are done under the hood with GMT... but doing it in 2 steps on the user side seems reasonable :

The post-process :

Back to the original problem on the forum, instead of painstakingly deal with the periodicity in the calculation, I still believe a simple reshaping trick done in post-process could work for such rare cases.

Say I have a random matrix MAT and I want it starting at index 42 along x-axis, the tool could simply reshape the so-called matrix :

MAT_new = [MAT(42: end) , MAT(1:42 -1)] X_new = [(42 : end) , (1 : 42 -1)]

And we plot MAT_new = f(X_new) instead of MAT = f(X);

The process :

As for the periodicity itself, maybe splitting the signal into several ones is the simplest way to deal with ?

s = f(t) list = 0 % 365.25 [magic command -flag <- list] s_new(1) = 0 : 364.25 s_new(2) = 364.25 : 729.5 ... histogram bin 1 = ( s_new( : )[0:n]) histogram bin 2 = ( s_new( : )[n+1 : 2n]) ... where n can be "month length" and the "mean" for example. This module would read the signal as well as the user-defined period (here 365.25), and digest it to output the mean value at each user-defined bin across the sliced signal.

Then use the post-process function to shift the display.

gmt periodize signal.txt -T365.25 -SO -Gnew.txt gmt wrap new.txt -P5 -Gnew2.txt gmt plot new2.txt -Sb1O

signal = [y1 y2 ... yN] x = [x1 x2 ... xN] (or 1901-01-01 , 1901-01-02 , ... , 2020-12-31)

snew = [y_mean1 y_mean2 ... y_mean12] xnew = [1 2 ... 12] (or Jan, Feb, ..., Dec)

snew2 = [y_mean5 y_mean6 ... y_mean4] xnew2 = [5 6 ... 4] (or May, Jun, ..., Apr)

maxrjones commented 3 years ago

For ASCII files, I would expect that you could pretty easily use date in a shell script to add a column with julian day or month, which could then be binned by histogram. As MATLAB and Python support datetime manipulation, the sticking point where more functionality in GMT could be needed is if the data are binary and GMT is called through the command line/shell scripts.

I think it would be most useful to output julian day, month, or day of week indexes for time series. Is that what you had in mind or did you envision binned data output?

PaulWessel commented 3 years ago

yes maybe. I've decided against the idea of creating a general periodic coordinate. However, I will leave the issue open until I have a better plan.

joa-quim commented 3 years ago

This is what would allow to reproduce the 4th example/figure in Loren's blog, the only one I could not reproduce with (more) elegant GMT.jl calls.

joa-quim commented 3 years ago

-T could be expanded to accept startDateTime/endDateTime/interval and then for plotting it would chunk the input data and plot them. For histograms, it would consider it the binning vector.

PaulWessel commented 3 years ago

Well, from the externals this should be easy - you could just mimic their month function. For command-line use the options are much more limiting, hence the proposal above. However, I cannot think of cases other than time-series where this is required, so perhaps instead of my generic solution above we could think of something more specific to time. Thinking out loud:

Complement -fcolT (which sets absolute time input for the listed column) with -fcolO , -fcolJ , -fcolD , -fcolH , -fcolM and -fcolS to reduce the input absolute time coordinates to just return fractional month, Julian day, day, hour, minute or second instead (may be OK with lower case - did not check yet).

This means input coordinate yyyy-mm-ddThh:mm:ss.xxx would not be converted to internal seconds since epoch (like UNIX 1970 time) but instead result in the new input coordinates for Cartesian x consideration:

0-11.xxxxxx (fractional month) 1-365.xxxxxx (fractional JD) 0-30.xxxxxx (fractional day month) 0-23.xxxxx (fractional hour) 0-59.xxxx (fractional minute) 0-0.xxxxxx (fractional second)

Not having thought much about this as I write it, I like the simplicity. It simply processes the coordinates accordingly, so for making a monthly histogram it would be

gmt histogram mississippii.txt -R0/12/0/1e7 -f0O -W1o -Gred ...

joa-quim commented 3 years ago

Well, from the externals this should be easy - you could just mimic their month function.

Nope. month is a non-documented function and since Julia started to build pressure on speed they moved lots of things to compiled code. Add to that the OO shit and Matlab has become nearly useless to learn how algoritms are implemented.

PaulWessel commented 3 years ago

Well, I have worked out on paper how my above suggestion would be implemented deep in the code. I don't think it is a bad plan - see any issues? It would deal with things like month having different lengths so -f0O would put any leap-hear Feb 29 data into the second month, although I am not sure how one statistically would prefer to deal with, say, a 50 year sequence with 12-13 leap years and having February be period that is 28-29 days long.

joa-quim commented 3 years ago

Went for a walk outside (we are locked down at home but I didn't leave to outside the fence) and it smelled good. It implies making a data copy but vector data is never really big. It's just that -f that, other than -fg looks complicated but softer layers for it will come.

PaulWessel commented 3 years ago

These are the likely periodicities of interest in science: 1 day, 1 month, and 1 year. Anything else the user can deal with via a it of scripting. Since week 7 is not the same across years I think anyone wanting "weekly" averages can use 7 days as a bin.

day: Simply return the fmod (time, 86400) so we get just the seconds in a day. If you want hours, then use 3600 as your bin. month: Return floating point month 0-11.xxxxx with actual month separations. year: Return floating point Julian days 0-365.xxxxx

Could be d, m and y as the indicator and users can set -R0/86400, -R0/12, or 0/366 as x-range.

PaulWessel commented 3 years ago

Hm, it also needs to be flagged as periodic though. If you want to plot from June through May then we need to wrap around.

PaulWessel commented 3 years ago

I will think of a more complete proposal on this that is restricted to known time periodicities (daily, monthly, yearly) and how it should be implemented.

maxrjones commented 3 years ago

I agree that daily, monthly, yearly periodicities are a reasonable starting place. Eventually, it would be nice to include a weekly option, which is relevant for many types of data related to people (e.g., air-quality and emissions, traffic, COVID cases). Not the most common use-cases for GMT, but still important.

PaulWessel commented 3 years ago

OK, I agree, no point leaving out a relevant periodicity. I will include those 4 in a proposal.

PaulWessel commented 3 years ago

What you think: Shall I open a new issue with a more suitable title and refer to this one (after closing) for some details and the plots, then outline the proposal for time there?

joa-quim commented 3 years ago

Perhaps better open a new one. This has already already much talk

WalterHFSmith commented 3 years ago

I don’t know what machinery for time series we still have under the hood. Back in the day when I first coded a bunch of stuff working from a book called Calendrical Calculations (if I remember) I included machinery for day of the week and week of the year. European agencies I work with schedule meetings with reference to the ISO week of the year.

If this machinery is still there it should be easy to make things periodic with period of a week. In UTC time it is obviously periodic, and almost always so in TAI, except when we have a leap second to contend with, and those are rare. The machinery I had could tell you what day of the week a time value fell in, so you could determine one of your period boundaries and then the periodic extension would be (almost) trivial (apart from those pesky leap seconds).

Assigning a week day is trivial if we know one (such as the day on which the UNIX seconds since 1970 was zero). The initial assignment isn’t so trivial because the number of weeks in 400 years of Gregorian-formula leap years is 52x400 plus 71; 71 is prime, and those 71/400ths of a week fall in odd spots.

W

On Jan 25, 2021, at 6:43 PM, Meghan Jones notifications@github.com wrote:

I agree that daily, monthly, yearly periodicities are a reasonable starting place. Eventually, it would be nice to include a weekly option, which is relevant for many types of data related to people (e.g., air-quality and emissions, traffic, COVID cases). Not the most common use-cases for GMT, but still important.

— You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

PaulWessel commented 3 years ago

Thanks Walter, yes all that stuff is there and I plan to use it as described. However, I think for people doing weeks we will just do weekdays 1-7 so that all Monday data ends up in the Monday x-value range, just like all February data, including the 29th, will land in the Feb month range 1-1.999999.

WalterHFSmith commented 3 years ago

right. glad to know it is still there. and a leap second belongs to the day which it ends; it doesn’t begin a new one.

On Jan 25, 2021, at 7:56 PM, Paul Wessel notifications@github.com wrote:

Thanks Walter, yes all that stuff is there and I plan to use it as described. However, I think for people doing weeks we will just do weekdays 1-7 so that all Monday data ends up in the Monday x-value range, just like all February data, including the 29th, will land in the Feb month range 1-1.999999.

— You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

maxrjones commented 3 years ago

Feature implemented in #4746