GenericMappingTools / gmt

The Generic Mapping Tools
https://www.generic-mapping-tools.org
Other
844 stars 353 forks source link

Add support for periodic time intervals #4706

Closed PaulWessel closed 3 years ago

PaulWessel commented 3 years ago

Description of the desired feature

I propose we add in support to convert absolute time coordinates into periodic time related to daily, weekly, monthly, and annual cycles. See #4507 for the initial impetus, including plots demonstrating the use and need for this feature, even though that proposal was ultimately too generic (create custom periodicities).

Here are the principles I propose to follow:

  1. The implementation will be added to the lowest-level i/o processing since input may come from ASCII files, native binary files, netCDF table files, or virtual files. They all share those low-level functions so that is where the conversion must take place. Any other solution will duplicate efforts and open up possibilities for bugs.
  2. I believe the most suitable way of informing GMT of periodic time is via the -f option, since it is used to set column type, and (except when -J indicates absolute time) is needed to properly read files and interpret virtual tables. The -f currently has the format -fcol? where current column type flags ? are x (longitude), y (latitude), T (absolute ASCII time), t (relative, binary time), p[unit] (projected coordinate, append unit if not meter), or f (floating point). There is also the two shorthands -fg (for -f0x,1y) and -fc (-f0-1f) [actually, I think p is like that too, including two columns for map projected data]. Given these letters, and because the periodicity applies equally to absolute and relative times + epoch (since the latter is the only non-ASCII way of feeding GMT), I propose we use -fT|t[y|m|w|d] which is simple to parse.
  3. The four cycles will lead to a conversion from what is initially seconds relative to the epoch to new repeating Cartesian coordinates as follows:
  1. Like the behavior of longitude with a Cartesian scaling told to deal with longitude (-Jx1cd/0.4c for example), the periodic coordinates need to wrap around given the prevailing -R selection. For example, -R-6/6/.... would plot June through June and month = 3.44 needs to wrap and plot where -2.56 is on that axis.

The only non-trivial items are things like leap-years. We will place any Feb 29 in the 1-1.99999 month range, and for similar reasons I chose 0-1 coordinates for a year since Julian day would add the problem of 365 vs 366. Let me know if you disagree or know how other systems handle these uneven periods.

Finally, the user should be able to affect the units used. I am open for your suggestions. The options are (I think):

  1. Let users set TIME_UNIT and report everything in those units. While this may work well for the daily cycle (TIME_UNIT=hour could be useful) it does not see as useful to change the weekly and monthly to anything but days and month (since variable), and annual in seconds is a bit large and odd.
  2. Let users scale their time column via -i. So if making a dailyl plot and you dont like to have to use -R0/86400/... you could do -i0+d1440 (I would add +d for division as a compliment to +s scale since it is so awkward to write what 1/3600 etc is as a factor) and you can do -R0/24/... instead.

I think option 2 is better and more flexible. Open to comments from @GenericMappingTools/core.

joa-quim commented 3 years ago

Regarding the periods. It seems a bit awkward to me that I have to specify -fTd and then days are expressed in seconds -R0/86400/.... Shouldn't it be 0..1 like the years?

KristofKoch commented 3 years ago

May I suggest to have

added as well?

PaulWessel commented 3 years ago

Thanks @KristofKoch . Yes, those are useful periodicities, but I guess I am thinking if you want to make a bar graph of hourly data you select the daily period and use a bin width of 3600 seconds, or any other sub-day interval. Likewise for quarterly, in histogram that would be -W3 and off you go. So I see the hour, or minute, or 2-hour chunks as all subsets of daily and easily dealt with via binning, and the quarterly as a subset of monthly. And of course average crop production during 5 -years would work well as a subset of annual (-W5). While I agree that issues like energy use during the day are useful, if you just want to see what peole used during 13:00-14:00 then you just plot that subregion. Meaning, I do not think an hourly period (reducing all data to 0-3599.xxxx seconds) is useful.

If you actually want data output that has the numbers 0-3.xxxxxxx for which quadrant then I think you just scale the monthly 0-11.xxxxx by 0.25 and you have it.

In that context, I think daily (with many subsets), weekly (which requires knowing specific week days; and probably not simple subgroups other than workdays and weekends which can be teased out easily), months (because of variable lengths) and year (same). I think those are the 4 we need to implement.

KristofKoch commented 3 years ago

Hi @PaulWessel thank you for your explaining the binning – I agree with your point.

PaulWessel commented 3 years ago

I actually have a branch that is about to do most of what is discussed above. However, a quick question: I am not sure if my 4 cycles (daily, weekly, monthly, annually) all make sense.

PaulWessel commented 3 years ago

Just to follow up on monthly: Obviously, if I just stacked data on day-of-the-month then the days 29, 30, and 31 would be anomalous w.r.t. data for days 1-29 due to the variable month lengths. So what use is a day-in-a-month dataset enabled by me allowing a monthly cycle? And if I normalized each month by the length of that month so all times converted to 0-1, is that useful at all?

gd-a commented 3 years ago

If I want to see if the micro-seismicity of a volcano is linked to its hydrothermal system, I might want to compare rainfall and quakes seasonality (e.g. More rain in February than in October) ... Is it possible to have the count depending on the calendar day? (stack 31 for January / 28..29 for February / ... ) ?

For the example above, I'm interested in the total amount in the month (which I know to be 31, 28 or whatever days long). If I need, say a monthly mean, I can always divide by the number of days. However trying the reverse process is difficult (I can't have to total amount within the month if some data have been removed/ignored/smoothed).

I don't know if I'm very clear here...

WalterHFSmith commented 3 years ago

I am trying to follow this discussion and I think you and Paul are talking about two different things (? - or you have succeeded in cofusing me, but that is easy to do).

What I think Paul is talking about is taking data which are given with a coordinate, say, t, and converting that coordinate to a new one, x = [(t-t_0) modulo P]/P where P is the length of a t interval over which we might expect that the data shows some periodicity. (Perhaps in Paul’s proposal t_0 is not under user-control; I am just trying to capture the general sense here.) So x is in the interval [0, 1).

What you describe, comparing seasonality in two data sets, might be helped by the above, but I am not clear about your count depending on the calendar day. Could you be more concrete about what you imagine is the data that are available and how you would want to crunch them?

If you were looking for seasonality, then in the above formula P would be one year, and you could plot occurrences of quakes and of rainfall versus x.

If the data you have are true/false: an event did/didn’t happen, or perhaps an added quantitative measure when it did happen (e.g. the accumulated rainfall in a 24-hour interval is either zero or it is a positive quantity), maybe the data are given only on days/times when the event occurs (like quakes), and so maybe you want to first do a kind of running average, and then perhaps the comparison in x instead of in t.

More specifically, the quake data might be t1 [date and time of quake 1 in your list] t2 [date and time of quake 2 in your list] t3 [date and time of quake 3 in your list] or t1 e1 t2 e2 t3 e3 where the e values are, say, energies released in each quake.

You could then use a GMT tool like “filter1d” to produce a filtered data set with, say, the mean energy release (or mean occurrence rate of quakes, by printing the t value followed by a “1” on each input record) per some fixed time interval, say, 30 days. I say fixed time interval because it doesn’t make sense to process the data into “how many events occur per month” without also accounting for “how many days are in a month”.

Anyway, once you have done this first step you have m(t), the mean energy release, or mean number of quakes per day, etc., as a time series in t. You can now take m(t) and convert it to m(x) according to Paul’s proposal, but using P as a year, not a month, and then you can plot m(x) to see if the seasonality in quakes looks like the seasonality in rainfall.

Am I understanding what you want to do?

Perhaps you meant that your data are available only as running sums over months, so that, for example, you have N(t) or E(t), where N is the total number of quakes that occurred in a calendar month, or E(t) is the total seismic energy release that occurred in a calendar month, and R(t) is the total rainfall that accumulated in a calendar month, and t references the month somehow (it is the 1st day of the month, or the last, or something else).

If this is the kind of data you have then you would first need to normalize N,E,R for the fact that each month is a different number of days of accumulation. In other words you need to create m(t) = R(t)/K(t), where K(t) is the number of days in month t.

But the seasonal analysis would still proceed by plotting or crunching m(x) derived from m(t).

Or so it looks to me.

I hope I haven’t totally failed to understand you, or that this long email may enable us all to understand better what you want GMT to be able to do.

W

Walter H F Smith Geophysicist, Laboratory for Satellite Altimetry NOAA NCWCP code E/RA-31 5830 University Research Court, room 3752 College Park MD 20740-3818 301-683-3377 (voice, my desk) 301-683-3300 (voice, reception) 301-683-3301 (fax, reception) Walter.HF.Smith@noaa.gov http://www.star.nesdis.noaa.gov/star/Smith_WHF.php

On Jan 27, 2021, at 12:08 AM, GD-A notifications@github.com wrote:

If I want to see if the micro-seismicity of a volcano is linked to its hydrothermal system, I might want to compare rainfall and quakes seasonality (e.g. More rain in February than in October) ... Is it possible to have the count depending on the calendar day? (stack 31 for January / 28..29 for February / ... ) ?

For the example above, I'm interested in the total amount in the month (which I know to be 31, 28 or whatever days long). If I need, say a monthly mean, I can always divide by the number of days. However trying the reverse process is difficult (I can't have to total amount within the month if some data have been removed/ignored/smoothed).

I don't know if I'm very clear here...

— You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

WalterHFSmith commented 3 years ago

Paul,

A couple of subtleties came up for me when you said a year is 365.25 days.

These may not impact your suggestion. Maybe they should be noted in a cookbook or tutorial on how to analyze data with variations on multiple time scales.

The first subtlety is: what should one do with data that have both diurnal and annual (and perhaps other) variations?

To keep this concrete, suppose the data set is the temperature in my back yard, sampled every 6 hours or more frequently, over many years. If I segment the sample sequence into chunks that are 365.25 days long, each new chunk will have its origin 6 hours later than the chunk before it, and this will smear the daily variations.

The solution to this problem is probably to filter the data somehow, as in my reply to “GD-A”, so that the attempt to model or illustrate annual variations is applied to the running mean temperature taking means over 24 hours, or looking at the highest or lowest temp in every 24 hours, or something like that.

If one is fitting periodicities, the fact that the 365.25-segmented data have x-value origins at x in [0, 6, 12, 18]/(24*365.25), rather than all being at x=0, is probably going to have negligible effect on the fitted phase. Most data sets in practice won’t be able to resolve the details of the phase to within a few hours in a year.

I think we are talking about deciding where to plot data, and for any practical plot, the eye won’t be able to see these subtle shifts of phase. So this may be irrelevant to plotting.

But if we were trying to reconstruct a data set from its Fourier coefficients then it would be important to do this correctly. (E.g. code to build a grid from spherical harmonic coefficients has to take note of whether we have a grid- or pixel-registered grid.)

The second subtlety may be merely useless pedantry; I’m not sure. It is this:

If we use the Gregorian calendar rule for reckoning leap years, the average number of days per year, averaging from 1901 through 2099, is 365.25, but if we average over any interval of 400 years then it is 365.2425. The difference in these two quantities is 3 days in 400 years, probably a trivial detail in precision. Most data sets won’t be long enough or accurate enough to distinguish whether the period is 365400 + 100 or 365400 + 97 days. These problems didn’t bother my work because I haven’t analyzed data sets (except for sunspots and magnetic observatory data) that extend to year 1900 and earlier. That is, over the duration of the data sets to which I have had to fit annual cycles, the period has been 365.25 days long.

The fussy way to deal with leap years is the same as the way to deal with months, and that is to take a calendar and clock point in time, t, and break it into k and y, where k indexes the interval into which it falls, that is, t[k] <= t < t[k+1], and y = t-t[k]. Then we can map t to x using x = (t-t[k])/(t[k+1]-t[k]). This places x in [0,1) such that x is the position within the interval, relative to the duration of the interval, and allows for intervals that are 28,29,30,31,365,366 days long.

I have probably not added any value here. Sorry. If there is a germ of an idea here that needs thinking about, you will be able to pull it out of my muddled words, as you always do so well.

W

On Jan 26, 2021, at 11:23 PM, Paul Wessel notifications@github.com wrote:

I actually have a branch that is about to do most of what is discussed above. However, a quick question: I am not sure if my 4 cycles (daily, weekly, monthly, annually) all make sense.

• Daily is clear: Basically strip off the date and only deal with the hours/minutes in that day; a fractional day. So all data ends up with coordinates 0-1 for a single day, and we can plot energy use during the day, for instance, using many days of data. Unfortunately, GMT never implemented leap seconds so that remains just a concept. • Week is also clear: Here we want to make sure that data for Thursday all ends up with the same coordinates so that a histogram could show if Thursday is busier at Starbucks than Wednesday, for instance, given data on number of customers vs time. • Annual cycles means strip off the specific year, so to speak, so that data from different years can be plotted on a single year axis. Useful to plot that Mississippi discharge data to notice the variation in seasonal signals. I guess that single year is 365.25 days? • That leaves monthly: To follow the recipes above, that would mean converting data to a single month, e.g., the converted coordinates would all be 0-1 (or 0-30 if you want days) for one single month. This would allow stacking of data for the 21st of each month across numerous months. Is that useful for us? Please give a useful example. And how do we treat February vs January when both data sets will be merged into a single uniform month? Or does dates in February never get a coordinate beyond 29/31? — You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

PaulWessel commented 3 years ago

To keep this concrete, suppose the data set is the temperature in my back yard, sampled every 6 hours or more frequently, over many years. If I segment the sample sequence into chunks that are 365.25 days long, each new chunk will have its origin 6 hours later than the chunk before it, and this will smear the daily variations.

Data from each year is reduced to a time of 0-1 via _time_since_start_of_year/lengthof year. So both regular and leap years will be normalized to 0-1 output. If I am to plot that data, I can use normalized year 0-1 as the x-axis or I can scale that to 365.25. The 365.25 is not used anywhere to compute those coordinates though.

WalterHFSmith commented 3 years ago

OK that fixes that thanks.

On Jan 27, 2021, at 2:00 PM, Paul Wessel notifications@github.com wrote:

To keep this concrete, suppose the data set is the temperature in my back yard, sampled every 6 hours or more frequently, over many years. If I segment the sample sequence into chunks that are 365.25 days long, each new chunk will have its origin 6 hours later than the chunk before it, and this will smear the daily variations.

Data from each year is reduced to a time of 0-1 via time_since_start_of_year/length_of year. So both regular and leap years will be normalized to 0-1 output. If I am to plot that data, I can use normalized year 0-1 as the x-axis or I can scale that to 365.25. The 365.25 is not used anywhere to compute those coordinates though.

— You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

PaulWessel commented 3 years ago

@WalterHFSmith, any thoughts on the usefulness of a monthly cycle, i.e., reducing all data down to a generic month of length 1? SHould I compute time_since_start_of_month/length_of_month as for annual or might users expect the 14th day of each month to get the same coordinate, meaning we must use a fixed length of 31. And @GD-A, I get the case of a correlation between rain and earthquakes, but presumably rain is not something useful to plot on a generic month that stacks all months into one. I.e., that should if there is more rain on day 13 than day 19, which does not sound sensible?

gd-a commented 3 years ago

Could you be more concrete about what you imagine is the data that are available and how you would want to crunch them?

Sure. Actually I think we're talking about the same thing, but with 2 different approach. I think you summarised it pretty well.

Still here's the example I was working on when the first topic was created :

Screenshot 2021-01-27 at 15 11 00

I wanted to center the plot on boreal winter season (ie start the x-axis in June for example). The work around would be to use gmt select from the pre-processed data to plot from June to December then append January to April ... and change the labels in x-axis manually to be letters instead of number:

cat << EOF > xannots.txt
5.5 ig J
6.5 ig J
7.5 ig A
8.5 ig S
9.5 ig O
10.5 ig N
11.5 ig D
12.5 ig J
13.5 ig F
14.5 ig M
15.5 ig A
16.5 ig M
17.5 ig
EOF

# Trick to wrap the axis (periodicity like)
gmt select data?y/event_cnt -R1/12/0/100 >  output.txt
gmt math output.txt -C0 12 ADD = output.txt
gmt select $data?y/event_cnt -R6/12/0/100 >> output.txt

gmt plot [...] -Sb1u [...]

About the data :

These are cumulated counts of event. Over the whole period (several years), I summed the number of events happening in January (1973-01-07, 1973-01-23, 2003-01-31) which gives J = 3 in this example (not related to the plot).

PaulWessel commented 3 years ago

Yes sure, that will be covered. That is an annual cycle displayed per month and the data will first be reduced to fractional months and you can use histogram to plot those 12 actual months. My concern is if there is any point to a monthly cycle. I see needs for annual, daily, and weekly, but cannot make sense of monthly.

gd-a commented 3 years ago

Only for composite analysis I guess...

(I still think that a user-defined periodicity should be an option) ...https://www.mathworks.com/matlabcentral/fileexchange/46735-doodson-tidal-wave-components

PaulWessel commented 3 years ago

I think there is a trade-off between the amount of coding required to allow a user-defined period vs covering 99% of the needs via the temporal periods annual, weekly, daily, and possibly monthly. For monthly, would you expect Jan 17 to give the same periodic coordinate as Feb 17, both being x = (17-1)/31, or should they give different periodic coordinates of (17-1)/31 and (17-1)/28|9?

gd-a commented 3 years ago

If I rephrase, are we interested in a duration since the 1st of the month (julian day) or a relative position over a month ? For the "monthly" per se, I would tend to expect the Julian day approach so it would be easier to compare side by side two months... and February would always shorter than August.

If it's a user-defined period (like a lunar cycle or whatever), I guess the relative position relative to the phase of the tidal wave is more sensible.

IMO the only difficulties are monthly and yearly... (the picture I uploaded herebefore), all the other "regular length" should be user defined (with a tag -s, -m , -h, -d, ...)

WalterHFSmith commented 3 years ago

I don’t know anything that is periodic in months, regardless of whether those have 28-31 days. Things that are periodic have fixed periods. I cannot see wanting to map the duration of a calendar month into [0,1) regardless of how many days are in the month.

I can imagine that someone might be working in pseudo-months, like x runs from 0 to 1 means from one new moon to the next new moon. But we can’t anticipate every crazy thing someone might do.

w

On Jan 27, 2021, at 3:30 PM, Paul Wessel notifications@github.com wrote:

@WalterHFSmith, any thoughts on the usefulness of a monthly cycle, i.e., reducing all data down to a generic month of length 1? SHould I compute time_since_start_of_month/length_of_month as for annual or might users expect the 14th day of each month to get the same coordinate, meaning we must use a fixed length of 31. And @GD-A, I get the case of a correlation between rain and earthquakes, but presumably rain is not something useful to plot on a generic month that stacks all months into one. I.e., that should if there is more rain on day 13 than day 19, which does not sound sensible?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

PaulWessel commented 3 years ago

Period may be a bad word here since making histograms of stuff per calendar month is using a variable period of 28-31 days in order to bin the data. Perhaps it is useful to post the current algorithm as I have it right now:

void gmtlib_modulo_time_calculator (struct GMT_CTRL *GMT, double *val) {
    /* Only called if any of the -f<col>y|m|w|d statements were given to yield periodic data.
     * *val holds time in seconds since the epoch (e..g, UNIX time since 1970). */
    int L;
    struct GMT_GCAL cal;
    switch (GMT->current.io.cycle_time_operator) {
        case GMT_PERIODIC_DAY:  /* Return 0.000-0.999999 day */
            *val = fmod (*val, GMT_DAY2SEC_F) * GMT_SEC2DAY;    /* Yields 0.000-0.999999 day */
            break;
        case GMT_PERIODIC_WEEK: /* Return 0.00000-6.9999999 days */
            gmt_gcal_from_dt (GMT, *val, &cal);
            *val = (GMT->current.setting.time_week_start) ? (7 + cal.day_w - GMT->current.setting.time_week_start) % 7 : cal.day_w;
            *val += cal.hour * GMT_HR2DAY + cal.min * GMT_MIN2DAY + cal.sec * GMT_SEC2DAY;
            break;
        case GMT_PERIODIC_MONTH:    /* Return 0.000000-11.999999 months */
            gmt_gcal_from_dt (GMT, *val, &cal);
            L = gmtlib_gmonth_length (cal.year, cal.month); /* Days in this month */
            *val = cal.month - 1 + (cal.day_m - 1 + cal.hour * GMT_HR2DAY + cal.min * GMT_MIN2DAY + cal.sec * GMT_SEC2DAY) / L;
            break;
        case GMT_PERIODIC_YEAR: /* Return 0.00000-0.99999999 years */
            gmt_gcal_from_dt (GMT, *val, &cal);
            L = gmtlib_is_gleap (cal.year) ? 366 : 365; /* Length of this year in days */
            *val = (cal.day_y - 1 + cal.hour * GMT_HR2DAY + cal.min * GMT_MIN2DAY + cal.sec * GMT_SEC2DAY) / L;
            break;
    }
}
  1. As you can see, with no leap seconds, the DAY case is trivial and returns coordinates in [0-1> range only.
  2. The week is set up to honor what the start of the week is (Sunday or Monday usually) and them returns in the [0-7> range
  3. Year uses Julian day plus clock divided by length of that year to give coordintaes in [0-1> range.
  4. The month currently is a different flavor of annual where the month number is honored and coordinates are [0-12>. Here, data from February are guaranteed to have coordinates in the range [1-2> and a histogram using a width of 1 will yield separate bins for January and February directly. Compare that to YEAR where you cannot easily use the coordinates to separate individual months.

So in this scheme there is no month-period option that reduces all data to a single uniform month, and per @WalterHFSmith comment I do not think this as a useful thing. Anyone with a strange case can always hack something via gmt math and MOD.

Let me know if you have comments or corrections to the code snippet - it should be readable as is.

PaulWessel commented 3 years ago

I guess in the context of those four cases, the mythical generic month would be similar to the PERIODIC_MONTH except we would not all cal.month -1 and hence just get [0-1>.

WalterHFSmith commented 3 years ago

Paul,

This is all fine with me.

I do not mean to say this needs any more discussion. I just find the terms confusing.

To me a period is a duration measured in some physical units (or distance along a number line). If we create y = (time modulo period); then y is still in the same physical units. If we define x = y/period; then x is dimensionless and runs from [0,1).

But your “periodic week” is periodic in a week and has units of days. And your periodic month has “units” of month indices but a period of a year.

What we are talking about doesn’t obey the math of a periodic function in the math sense. For example, if d is in floating point days since the start of some year, and period is the number of days in the year, then if we set u = d/period; x = u - floor(u); q = x * 12; then the interval 1 <= q < 2 is not equivalent to “February”.

So it seems to me that belonging, or not, to “February”, is a category, like being or not being “schist” or “Norwegian”. What I think you are doing is mapping intervals of the time line onto categories.

I understand what you are trying to do is a pseudo-periodic mapping of data for the purposes of making histograms with bins of months of the year. But I just find the words for it confusing.

Just my problem. Don’t change anything.

W

On Jan 27, 2021, at 4:29 PM, Paul Wessel notifications@github.com wrote:

Period may be a bad word here since making histograms of stuff per calendar month is using a variable period of 28-31 days in order to bin the data. Perhaps it is useful to post the current algorithm as I have it right now:

void gmtlib_modulo_time_calculator (struct GMT_CTRL GMT, double val) { /* Only called if any of the -fy|m|w|d statements were given to yield periodic data.

  • val holds time in seconds since the epoch (e..g, UNIX time since 1970). / int L; struct GMT_GCAL cal; switch (GMT->current.io.cycle_time_operator) { case GMT_PERIODIC_DAY: / Return 0.000-0.999999 day / val = fmod (val, GMT_DAY2SEC_F) GMT_SEC2DAY; / Yields 0.000-0.999999 day / break; case GMT_PERIODIC_WEEK: / Return 0.00000-6.9999999 days / gmt_gcal_from_dt (GMT, val, &cal); val = (GMT->current.setting.time_week_start) ? (7 + cal.day_w - GMT->current.setting.time_week_start) % 7 : cal.day_w; val += cal.hour GMT_HR2DAY + cal.min GMT_MIN2DAY + cal.sec GMT_SEC2DAY; break; case GMT_PERIODIC_MONTH: / Return 0.000000-11.999999 months / gmt_gcal_from_dt (GMT, val, &cal); L = gmtlib_gmonth_length (cal.year, cal.month); / Days in this month / val = cal.month - 1 + (cal.day_m - 1 + cal.hour GMT_HR2DAY + cal.min GMT_MIN2DAY + cal.sec GMT_SEC2DAY) / L; break; case GMT_PERIODIC_YEAR: / Return 0.00000-0.99999999 years / gmt_gcal_from_dt (GMT, val, &cal); L = gmtlib_is_gleap (cal.year) ? 366 : 365; / Length of this year in days / val = (cal.day_y - 1 + cal.hour GMT_HR2DAY + cal.min GMT_MIN2DAY + cal.sec * GMT_SEC2DAY) / L; break; } }

    • As you can see, with no leap seconds, the DAY case is trivial and returns coordinates in [0-1> range only. • The week is set up to honor what the start of the week is (Sunday or Monday usually) and them returns in the [0-7> range • Year uses Julian day plus clock divided by length of that year to give coordintaes in [0-1> range. • The month currently is a different flavor of annual where the month number is honored and coordinates are [0-12>. Here, data from February are guaranteed to have coordinates in the range [1-2> and a histogram using a width of 1 will yield separate bins for January and February directly. Compare that to YEAR where you cannot easily use the coordinates to separate individual months. So in this scheme there is no month-period option that reduces all data to a single uniform month, and per @WalterHFSmith comment I do not think this as a useful thing. Anyone with a strange case can always hack something via gmt math and MOD.

Let me know if you have comments or corrections to the code snippet - it should be readable as is.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

PaulWessel commented 3 years ago

Yes, well put Walter. Presumably the description of this in the man page for -f will be broad enough, and link to examples, that it will be crystal clear. Maybe we can avoid the term period and periodicity altogether here. BTW, I have met Norwegian geologists with T-shirt saying "schist happen".

WalterHFSmith commented 3 years ago

Perhaps in Norway they do happen; I haven’t studied the geology of Norway. ;)

On Jan 27, 2021, at 5:09 PM, Paul Wessel notifications@github.com wrote:

Yes, well put Walter. Presumably the description of this in the man page for -f will be broad enough, and link to examples, that it will be crystal clear. Maybe we can avoid the term period and periodicity altogether here. BTW, I have met Norwegian geologists with T-shirt saying "schists happen".

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

PaulWessel commented 3 years ago

I am trying not to go full circle back to #4507 on this, but we are the generic mapping tools people, so I struggle a bit not to include the generic solution which is the user-supplied period [and possibly optional phase start]. An alternative to -f would be to use a new common GMT option that would look like this, perhaps -w for wrapping:

-w[col]y|m|w|d|pperiod[/phase]

where col selects which column this applies to [Default is 0, i.e., x] and the basic temporal cases we have discussed so far are preset via the character codes while the generic case requires a period and possibly the phase [0]. In the generic case we would then be back to

coord = (x - phase) % period + phase

which would complement the time-specific stuff above. This all will work just fine for the x-coordinate but would need some TLC for periodic y-columns due to wrapping, vertical lines. That is it. The final thing to add would be some specific help with the auto-annotations for the months and weekdays. Having a separate common option for this makes sense since it is available to all modules plus avoids overloading existing options with modifiers than complicate the options from the externals.

WalterHFSmith commented 3 years ago

I think you don’t want to add the phase back in to coord below.

And would you also want the common GMT option to also be used to specify periodic boundary conditions for splines, interpolation, etc? That is another case where we might potentially impose a “period”.

w

On Jan 27, 2021, at 7:58 PM, Paul Wessel notifications@github.com wrote:

I am trying not to go full circle back to #4507 on this, but we are the generic mapping tools people, so I struggle a bit not to include the generic solution which is the user-supplied period [and possibly optional phase start]. An alternative to -f would be to use a new common GMT option that would look like this, perhaps -w for wrapping:

-w[col]y|m|w|d|pperiod[/phase]

where col selects which column this applies to [Default is 0, i.e., x] and the basic temporal cases we have discussed so far are preset via the character codes while the generic case requires a period and possibly the phase [0]. In the generic case we would then be back to

coord = (x - phase) % period + phase

which would complement the time-specific stuff above. This all will work just fine for the x-coordinate but would need some TLC for periodic y-columns due wrapping lines. That is it. The final thing to add would be some specific help with the auto-annotations for the months and weekdays. Having a separate common option for this makes sense since it is available to all modules plus avoids overloading existing options with modifiers than complicates options from the externals.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

PaulWessel commented 3 years ago

Right, no adding back. There is a separate and unrelated wrapping that may happen due to -R under the hood (like for longitdue).

We have -n for BCs for 2-D interpolation, and for 1-D interpolation (sample1d) we offer up different splines with different conditions. We gave currently no way of setting BCs on curvatures (other than 0) or slopes at the end of the splines but that is very specific to sample1d though.

PaulWessel commented 3 years ago

Pinging @GenericMappingTools/core on this proposal.

Pro:

  1. Simple for users: Add -wm and you are turning your times into month.xxx for true monthly histograms, for instance
  2. Generic: Add -w1p17/4 to analyze that odd data with a period of 17 starting at phase 4, on the y-axis, via ycoord = (y - phase) % period
  3. Global: Works across all modules reading tables
  4. Simple for developers: Avoids complicating the -f option with modifiers so easier from externals
  5. Already works with wrapping of lines across periodic boundary for x (piggy-backs of longitude-wrapping)

Con:

  1. Adds a new common option instead of a modifier to an existing one (-f)
  2. ???
WalterHFSmith commented 3 years ago

I like it.

On Feb 2, 2021, at 1:41 PM, Paul Wessel notifications@github.com wrote:

Pinging @GenericMappingTools/core on this proposal.

Pro:

• Simple for users: Add -wm and you are turning your times into month.xxx for true monthly histograms, for instance • Generic: Add -w1****p17/4 to analyze that odd data with a period of 17 starting at phase 4, on the y-axis, via ycoord = (y - phase) % period • Global: Works across all modules reading tables • Simple for developers: Avoids complicating the -f option with modifiers so easier from externals • Already works with wrapping of lines across periodic boundary for x (piggy-backs of longitude-wrapping) Con:

• Adds a new common option instead of a modifier to an existing one (-f) • ??? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

joa-quim commented 3 years ago

Yes, me too. Much better than overcharge another, already hard to grok, variable.

PaulWessel commented 3 years ago

Love the subtleties of the english language:

It is often in the combinations of things clarity arises.

joa-quim commented 3 years ago

Ah, didn't know grok had actually entered in the English dictionary. It was invented by Robert Henlein in his famous book Strange in a strange land

joa-quim commented 3 years ago

And

PaulWessel commented 3 years ago

See the -w description in the book book that I am working on (below). There are two remaining issues:

As is, we have two slightly different yearly cycles (one was earlier mislabeled as "monthy").

  1. Option -wy selects the yearly normalized cycle: t1 = time_since_start_of_year / length_of_year.
  2. Option -wa selects the annual cycle where each month is a category and the category is preserved in the coordinate: t2 = month_number + time_since_start_of_month / length_of_month.

You can probably see that t1*12 is not equal to t2 due to the binning into months. Thus, I suggest we keep these as two separate options. if you dont like annual vs yearly please give options. We need to avoid "month" though.

As for annotations and ticks: The annual (-wa) cycle should involve labeling of months since that is what the coordinates represents. The weekly (-ww) axis should likewise label the 0-6 using the days of the week, with the normal machinery of language and abbreviations coming into play.

Let me know if you have comments or are OK with this plan.

w

WalterHFSmith commented 3 years ago

This is great!

I just wish the plot of average monthly discharge could annotate the x axis in month names plotted in the intervals. maybe that can be illustrated in a separate call to psbasemap, after first illustrating the concept with the plot the way it is?

w

On Feb 3, 2021, at 4:59 PM, Paul Wessel notifications@github.com wrote:

See the -w description in the book book that I am working on (below). There are two remaining issues:

• The naming and use of the two different types of annual/yearly cycles • The labeling of axes for annual and weekly cycles As is, we have two slightly different yearly cycles (one was earlier mislabeled as "monthy").

• Option -wy selects the yearly normalized cycle: t1 = time_since_start_of_year / length_of_year. • Option -wa selects the annual cycle where each month is a category and the category is preserved in the coordinate: t2 = month_number + time_since_start_of_month / length_of_month. You can probably see that t1*12 is not equal to t2 due to the binning into months. Thus, I suggest we keep these as two separate options. if you dont like annual vs yearly please give options. We need to avoid "month" though.

As for annotations and ticks: The annual (-wa) cycle should involve labeling of months since that is what the coordinates represents. The weekly (-ww) axis should likewise label the 0-6 using the days of the week, with the normal machinery of language and abbreviations coming into play.

Let me know if you have comments or are OK with this plan.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

PaulWessel commented 3 years ago

Yes, annotating in month names is the plan, per my points above. Same for a week with weekdays. @meghanrjones will get me some covid-19 data to play with for a week cycle.

maxrjones commented 3 years ago

All done here?

PaulWessel commented 3 years ago

yes, closing now.