jangorecki opened 6 years ago
Proposed rollmean implementation, simplified:
x = data.table(v1=1:5, v2=1:5)
k = c(2, 3)
# i - single column
# j - single window size
# m - integer referring to a single row
# w - current row's sum of the rolling window
# r - answer for each i, j
for i in x
  for j in k
    r = NA_real_
    w = 0
    for m in 1:length(i)
      w = w + i[m]
      w = w - i[m-j]
      r[m] = w / j
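The pseudocode above translates into a small base-R sketch (this is only an illustration of the running-sum idea, not the actual data.table C implementation; `rollmean_sketch` is a hypothetical name, and it guards the window boundaries that the simplified pseudocode leaves implicit):

```r
# For each column and each window size, maintain a running sum of the
# current window and divide by the window width.
rollmean_sketch = function(x, k) {
  ans = list()
  for (i in seq_along(x)) {           # i - single column
    v = x[[i]]
    for (j in k) {                    # j - single window size
      r = rep(NA_real_, length(v))    # r - answer for this column/window
      w = 0                           # w - running sum of current window
      for (m in seq_along(v)) {
        w = w + v[m]                  # add the entering observation
        if (m > j) w = w - v[m - j]   # drop the leaving observation
        if (m >= j) r[m] = w / j      # complete windows only
      }
      ans[[paste0("v", i, "_k", j)]] = r
    }
  }
  ans
}
rollmean_sketch(list(v1 = 1:5, v2 = 1:5), k = c(2, 3))
```

Each input value is added once and subtracted once, so the cost is O(length * length(k)) rather than O(length * sum(k)) as with a naive re-sum per window.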
yes, and many more rolled functions follow the same basic idea (including rolling standard deviation, any expectation-based moment, and any function like rollproduct that uses the invertible * instead of + to aggregate within the window).
I always envisioned rolling window functionality as grouping the dataset into multiple overlapping groups (windows). Then the API would look something like this:
DT[i, j,
by = roll(width=5, align="center")]
Then if j
contains, say, mean(A)
, we can internally replace it with rollmean(A)
-- exactly like we are doing with gmean()
right now. Or j
can contain an arbitrarily complicated functionality (say, run a regression for each window), in which case we'd supply .SD
data.table to it -- exactly like we do with groups right now.
This way there's no need to introduce 10+ new functions, just one. And it feels data.table-y in spirit too.
yes, agree
@st-pasha interesting idea, and it does look data.table-y in spirit, but it would impose many limitations, and isn't really appropriate for this category of functions. The rollmean-style API proposed here supports, for example:
rolling mean by group:
DT[, rollmean(V1, 3), by=V2]
multiple calls with different window sizes:
DT[, .(rollmean(V1, 3), rollmean(V2, 100))]
use outside of [.data.table, as we now allow for shift:
rollmean(rnorm(10), 3)
vectorized columns and windows:
DT[, .(rollmean(list(V1, V2), c(5, 20)), rollmean(list(V2, V3), c(10, 30)))]
mean and rollmean in the same j call:
DT[, .(rollmean(V1, 3), mean(V1)), by=V2]
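For illustration, the grouped and mixed use cases above can be sketched with the frollmean() name under which the function was eventually merged (assuming data.table >= 1.12.0):

```r
library(data.table)
DT = data.table(V1 = 1:10, V2 = rep(c("a", "b"), each = 5L))
# rolling mean within each group; result has as many rows as the input
DT[, rm3 := frollmean(V1, 3), by = V2]
# a rolling and a non-rolling aggregate in the same j call
DT[, .(rm3 = frollmean(V1, 3), m = mean(V1)), by = V2]
```

Note that mean(V1) is recycled to the length of the rolling result within each group, exactly as for any other mixed-length j expression.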
Usually when using by
we aggregate data to a smaller number of rows, while rolling functions always return a vector of the same length as the input. These types of functions in SQL have an API in the following style:
SELECT AVG(value) OVER (ROWS BETWEEN 99 PRECEDING AND CURRENT ROW)
FROM tablename;
You can still combine it with GROUP BY as follows:
SELECT AVG(value) OVER (ROWS BETWEEN 99 PRECEDING AND CURRENT ROW)
FROM tablename
GROUP BY group_columns;
So in SQL those functions stay in SELECT, which corresponds to j in DT.
In DT we could achieve the same with:
DT[, rollmean(value, 100)]
DT[, rollmean(value, 100), group_columns]
Rolling functions fit into the same category of functions as shift, which also returns the same number of rows as it receives on input.
Shift in SQL looks like:
SELECT LAG(value, 1) OVER ()
FROM tablename;
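For comparison, the data.table counterpart of that SQL already exists via shift():

```r
library(data.table)
DT = data.table(value = c(10, 20, 30))
# equivalent of SELECT LAG(value, 1) OVER () FROM tablename
DT[, lag1 := shift(value, n = 1L, type = "lag")]
DT$lag1  # NA 10 20
```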
mean and rollmean are not just different functions, they are different categories of functions. One is meant to aggregate according to a group; the other does not aggregate at all. This is easily visible in SQL, where we don't use GROUP BY for rolling-type functions but we do need GROUP BY for aggregates like mean (eventually getting the grand total when the grouping clause is not present).
I don't see strong reasoning to apply the same optimization rules as we do for mean, especially when it doesn't really fit the use case, and all that just for the sake of data.table-y spirit. The current proposal is data.table-y in spirit too; it can easily be combined with :=, same as shift. It just adds a set of new functions currently not available in data.table.
@jangorecki Thanks, these are all valid considerations. Of course different people have different experiences, and different views as to what should be considered "natural".
It is possible to perform rollmean by group: this is just 2-level grouping: DT[, mean(V1), by=.(V2, roll(3))]. However, I don't see how to support different window sizes on different columns with my syntax...
I must admit I have never seen SQL syntax for rolling joins before. It's interesting that they use a standard aggregator such as AVG yet apply the windowing specification to it. Looking at the Transact-SQL documentation, there are some interesting ideas there, for example the distinction between logical/physical row selection. They do allow different "OVER" operators on different columns; however, in all the examples they give, it is the same OVER clause repeated multiple times. This suggests that that use case is much more common, and hence using a single roll() group would result in less repetition.
Also, this SO question provides an interesting insight why the OVER syntax was introduced in SQL at all:
You can use GROUP BY SalesOrderID. The difference is, with GROUP BY you can only have the aggregated values for the columns that are not included in GROUP BY. In contrast, using windowed aggregate functions instead of GROUP BY, you can retrieve both aggregated and non-aggregated values. That is, although you are not doing that in your example query, you could retrieve both individual OrderQty values and their sums, counts, averages etc. over groups of same SalesOrderIDs.
So it appears that the syntax is designed to circumvent the limitation of standard SQL where group-by results could not be combined with unaggregated values (i.e. selecting both A
and mean(A)
in the same expression). However data.table
does not have such a limitation, so it has more freedom in its choice of syntax.
Now, if we really want to get ahead of the curve, we need to think in a broader perspective: what are the "rolling" functions, what are they used for, how can they be extended, etc. Here's my take on this, coming from a statistician's point of view:
"Rolling mean" function is used to smooth some noisy input. Say, if you have observations over time and you want to have some notion of "average quantity", which would nevertheless vary over time although very slowly. In this case "rolling mean over last 100 observations" or "rolling mean over all previous observations" can be considered. Similarly, if you observe certain quantity over a range of inputs, you may smooth it out by applying "rolling mean over ±50 observations".
All of these can be implemented as extended grouping operators, with rolling windows being just one of the elements on this list. That being said, I don't see why we can't have it both ways.
I must admit I never seen SQL syntax for rolling joins before.
I assume you mean rolling functions, issue has nothing to do with rolling joins.
They do allow different "OVER" operators on different columns, however in all examples they give, it is the same OVER clause repeated multiple times. So it suggests that this use-case is much more common, and hence using a single roll() group would result in less repetition.
It is just a matter of use case: if you are calling the same OVER() many times, you may find it more performant to use GROUP BY, build a lookup table and re-use it in other queries. Whatever examples are there, repeating OVER() is required to retain the locality of each measure provided. My use cases from Data Warehouses were not as simple as those in the Microsoft docs.
In contrast, using windowed aggregate functions instead of GROUP BY, you can retrieve both aggregated and non-aggregated values.
In data.table we do :=
and by
in one query to achieve it.
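A minimal sketch of that pattern, borrowing the SalesOrderID/OrderQty names from the quoted SO answer:

```r
library(data.table)
DT = data.table(SalesOrderID = c(1L, 1L, 2L), OrderQty = c(2L, 3L, 4L))
# := with by appends the aggregate next to the unaggregated rows,
# like SUM(OrderQty) OVER (PARTITION BY SalesOrderID) in SQL
DT[, total := sum(OrderQty), by = SalesOrderID]
DT$total  # 5 5 4
```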
So it appears that the syntax is designed to circumvent the limitation of standard SQL where group-by results could not be combined with unaggregated values (i.e. selecting both A and mean(A) in the same expression). However data.table does not have such a limitation, so it has more freedom in its choice of syntax.
It isn't so much a limitation of SQL as just the design of GROUP BY, that it will aggregate, the same way that our by aggregates. A new API was required to cover the new window functionality. Grouping for a SQL window function can be provided per function call using FUN() OVER (PARTITION BY ...), where PARTITION BY is like local grouping for a single measure. So to achieve the flexibility of SQL we would need to use j = mean(V1, roll=5) or j = over(mean(V1), roll=5), keeping that API in j. Still, this approach will not support all the use cases mentioned above.
you may smooth it out by applying "rolling mean over ±50 observations".
This is what the align argument is used for.
So, the first extension is to look at "smooth windows": imagine a mean over past observations where the further an observation in the past, the less its contribution is. Or an average of nearby observations over a Gaussian kernel.
There are many variants (a virtually unlimited number) of moving averages; the most common smoothing window function (other than rollmean/SMA) is the exponential moving average (EMA). Which should be included, and which not, is not trivial to decide; it is best to make that decision according to feature requests from users, and so far none like this has been requested.
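To make the EMA idea concrete, here is a minimal base-R sketch; ema() and its smoothing parameter alpha are hypothetical names for illustration only, not data.table API:

```r
# Exponentially weighted moving average: each new value contributes alpha,
# and the previous smoothed value contributes (1 - alpha).
ema = function(x, alpha) {
  stopifnot(alpha > 0, alpha <= 1)
  r = numeric(length(x))
  r[1] = x[1]
  for (m in seq_along(x)[-1])
    r[m] = alpha * x[m] + (1 - alpha) * r[m - 1]
  r
}
ema(c(1, 2, 3, 4), alpha = 0.5)  # 1.0 1.5 2.25 3.125
```

Unlike a fixed-width rolling mean, every past observation contributes, with geometrically decaying weight, so there is no window-size argument at all.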
All of these can be implemented as extended grouping operators, with rolling windows being just one of the elements on this list.
Surely they can, but if you look at SO, and at the issues created in our repo, you will see that the few rolling functions here are responsible for 95+% of user requests. I am happy to work on EMA and other MAs (although I am not sure data.table is the best place for those), but as a separate issue. Some users, me included, have been waiting for just a simple moving average in data.table for 4 years already.
Here's my take on this, coming from a statistician's point-of-view
My point-of-view comes from Data Warehousing (where I used window function, at least once a week) and price trend analysis (where I used tens of different moving averages).
A rollmean draft is pushed to the roll branch. I found that most other packages implementing rolling mean are not able to deal well with na.rm=FALSE and NAs present in the input. This implementation handles NA consistently with mean, which imposes some extra overhead because of ISNAN calls. We could add an API for a faster but less safe version if the user is sure there are no NAs in the input.
PR is in #2795
@mattdowle answering questions from PR
Why are we doing this inside data.table? Why are we integrating it instead of contributing to existing packages and using them from data.table?
my guess is it comes down to syntax (features only possible or convenient if built into data.table; e.g. inside [...] and optimized) and building data.table internals into the rolling functions at C level; e.g. froll* should be aware of and use data.table indices and keys. If so, more specifics on that are needed; e.g. a simple short example.
For me personally it is about speed and the lack of a chain of dependencies, nowadays not easy to achieve. Keys/indices could be useful for frollmin/frollmax, but it is unlikely that a user will create an index on a measure variable, and we haven't made this optimization for non-rolling min/max yet. I don't see much sense in GForce optimization because the allocated memory is not released after a roll* call but returned as the answer (as opposed to non-rolling mean, sum, etc.).
If there is no convincing argument for integrating, then we should contribute to the other packages instead.
I listed some above; if you are not convinced, I recommend you put a question to data.table users, ask on Twitter, etc. to check the response. This feature was requested long ago and by many users. If the response doesn't convince you, then you can close this issue.
I found sparklyr can support rolling functions very well on very large-scale datasets.
@harryprince could you shed a bit more light by providing example code of how you do it in sparklyr? According to the "Window functions" dplyr vignette:
Rolling aggregates operate in a fixed width window. You won’t find them in base R or in dplyr, but there are many implementations in other packages, such as RcppRoll.
AFAIU you use a custom Spark API via sparklyr, for which the dplyr interface is not implemented, correct?
This issue is about rolling aggregates; other "types" of window functions have been in data.table for a long time.
Providing some example so we can compare (in-memory) performance vs sparklyr/SparkR would also be helpful.
It just occurred to me that this question:
how to calculate different window sizes for different columns?
has in fact a broader scope, and does not apply to rolling functions only.
For example, it seems to be perfectly reasonable to ask how to select the average product price by date, and then by week, and then maybe by week+category -- all within the same query. If we ever to implement such functionality, the natural syntax for it could be
DT[, .( mean(price, by=date),
mean(price, by=week),
mean(price, by=c(week, category)) )]
Now, if this functionality was already implemented, then it would have been a simple leap from there to rolling means:
DT[, .( mean(price, roll=5),
mean(price, roll=20),
mean(price, roll=100) )]
Not saying that this is unequivocally better than rollmean(price, 5)
-- just throwing in some alternatives to consider...
@st-pasha
how to select the average product price by date, and then by week, and then maybe by week+category -- all within the same query.
AFAIU this is already possible using ?groupingsets
, but not hooked into [.data.table
yet.
@jangorecki , @st-pasha , and Co. -- Thanks for all your work on this! I'm curious why partial window support was removed from the scope, is there any potential for that functionality to make it back on the roadmap? Would come in handy for me sometimes, and fill in a functionality gap that to my knowledge hasn't been filled in either zoo
or RcppRoll
.
This Stack Overflow Question is a good example of a rolling application that could benefit from a partial = TRUE
argument.
@msummersgill Thanks for the feedback. In the first post I explicitly linked the commit sha where the partial window feature code can be found. The implementation that was there was later removed to reduce code complexity. It was also imposing a small performance cost even when the feature was not used. This feature can (and probably should) be implemented the other way: first compute the complete windows as-is, and then fill up the missing partial windows using an extra loop over 1:window_size. That way the overhead of the feature is only noticeable when you actually use it. Nevertheless, we do provide that functionality via the adaptive argument, where partial is just a special case of adaptive rolling mean. There is an example of how to achieve partial using adaptive in the ?froll manual. Pasting it here:
library(data.table)
d = as.data.table(list(1:6/2, 3:8/4))
# adaptive window widths 1, 2, 3, 3, 3, ... emulate partial=TRUE for n=3
an = function(n, len) c(seq.int(n), rep(n, len-n))
n = an(3, nrow(d))
frollmean(d, n, adaptive=TRUE)
Of course it will not be as efficient as a non-adaptive rolling function using an extra loop to fill up just the partial windows. AFAIK zoo has a partial feature.
Do you guys have any plans to add rolling regression functions to data.table?
@waynelapierre if there is demand for that, then yes. You have my +1
thanks, this is great. Just one question though: I only see simple rolling aggregates, like a rolling mean or rolling median. Are you also implementing more refined rolling functions, such as rolling DT data.frames? Say, create a rolling DT using the last 10 obs and run an lm regression on it.
Thanks!
@randomgambit I would say it is out of scope, unless there is high demand for it. It wouldn't be very difficult to make it faster than base R/zoo just by handling the nested loop in C. But we should try to implement it using an "online" algorithm, to avoid the nested loop. That is more tricky, and since we could eventually do it for any statistic, we have to cut off the set of supported statistics at some point.
@jangorecki interesting thanks. That means I will keep using tsibble
to embed... DATA.TABLES
in a tibble
! mind blown :D
Tried to use frollmean to calculate a nonparametric "logistic curve", which shows P[y | x] for binary y, using nearest neighbors of x. I was surprised that y stored as logical was not cast automatically to integer:
DT = data.table(x = rnorm(1000), y = runif(1000) > .5)
DT[order(x), .(x, p_y = frollmean(y, 50L))]
Error in froll(fun = "mean", x = x, n = n, fill = fill, algo = algo, align = align, : x must be of type numeric
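Until logical input is supported (if ever), an explicit coercion in j works around the error; a sketch:

```r
library(data.table)
DT = data.table(x = rnorm(1000), y = runif(1000) > .5)
# frollmean() requires numeric/integer input, so coerce the logical first
DT[order(x), .(x, p_y = frollmean(as.numeric(y), 50L))]
```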
An example of how vectorized x
/n
arguments can impact performance.
https://github.com/AdrianAntico/RemixAutoML/commit/d8370712591323be01d0c66f34a70040e2867636#r34769837
Fewer loops, easier-to-read code, much faster: code using frollmean in a loop vs passing lists/vectors to frollmean, resulting in a 10x-36x speedup.
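The vectorized form computes every column/window combination in one call at C level; a sketch (frollmean accepts a list of columns for x and a vector of window sizes for n):

```r
library(data.table)
DT = data.table(V1 = rnorm(1e5), V2 = rnorm(1e5))
# one call instead of an R-level double loop over columns and windows;
# returns a list of length ncol(x) * length(n): V1/5, V1/20, V2/5, V2/20
ans = frollmean(DT[, .(V1, V2)], n = c(5, 20))
length(ans)  # 4
```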
frollapply ready: https://github.com/Rdatatable/data.table/pull/3600
### fun mean sum median
# rollfun 8.815 5.151 60.175
# zoo::rollapply 34.373 27.837 88.552
# zoo::roll[fun] 0.215 0.185 NA
# frollapply 5.404 1.419 56.475
# froll[fun] 0.003 0.002 NA
hi guys, will the user-defined FUN passed to frollapply be allowed to return an arbitrary R object or data.frame (data.table)? And could the x passed to frollapply be a data.table of character columns, not coerced to numeric, so that FUN could operate on labels and frollapply return a list? Then we could do rolling regression, or rolling tests like Benford's test, or summaries on labels.
It is always useful to provide a reproducible example. To clarify... in such a scenario you would like frollapply(dt, 3, FUN) to return a list of length nrow(dt), where each list element is the data.table returned by FUN(dt[window])?
frollapply(x=dt, n=3, fun=FUN)[[3]]
equals to FUN(dt[1:3])
frollapply(x=dt, n=3, FUN=FUN)[[4]]
equals to FUN(dt[2:4])
is that correct? @jerryfuyu0104
Currently we support multiple columns passed to the first argument, but we process them separately, looping. We would probably need some extra argument, say multi.var=FALSE, which when set to TRUE would not loop over x (as it does now: list(FUN(x[[1]]), FUN(x[[2]]))) but pass all columns at once: FUN(x).
any update for this?
I second that previous request.
Furthermore, would it be possible to support a "partial" argument to allow for partial windows?
@eliocamp could you elaborate on what a partial
window is?
@eliocamp it would be possible to support a "partial" argument. You may know this already, but support for this functionality is already there via the adaptive=TRUE argument; see the examples for details.
It would mean computing the function from the beginning through the end instead of from the half-window point. For example, for a rolling mean of width 11, the first element returned would be the mean of elements 1 through 6; the second, the mean of the 1st through 7th; and so on.
@jangorecki oh, thanks, I didn't know that! I'll check it out.
Agree, we need a partial argument, not just for convenience but also for speed: adaptive=TRUE adds an overhead.
And yes we also need rolling regression, so supplying multiple variables and rolling on them at once, not each one separately.
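To pin down the requested rolling-regression semantics, here is a naive base-R sketch (nested loop, no "online" updates; roll_lm_coef is a hypothetical name, not a data.table API — a real implementation would move this loop to C or use an online algorithm as discussed above):

```r
# Rolling slope: fit lm on each complete window of n rows and keep the
# slope coefficient; positions before the first complete window stay NA.
roll_lm_coef = function(y, x, n) {
  out = rep(NA_real_, length(y))
  for (m in n:length(y)) {
    w = (m - n + 1):m                     # current window of n rows
    out[m] = coef(lm(y[w] ~ x[w]))[[2L]]  # slope on that window
  }
  out
}
set.seed(1)
x = rnorm(100); y = 2 * x + rnorm(100)
head(roll_lm_coef(y, x, n = 30), 35)
```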
There is no update on the status of those.
I'd love to help but my C++ skills are utterly non-existent. :sweat: Do you think it might be suitable for complete newbies?
We don't code in C++ but in C. Yes, it is a good place to start. I did exactly that with frollmean.
I looked at the code and it seems daunting. But I'll update you in any case.
But now, for yet another request: frollmean(.SD) should preserve names. More generally, froll* should preserve names if the input is list-like with names.
As a frequent user of data.table, I find it extremely useful to have "time aware" features, as those currently offered in the package tsibble
. Unfortunately this package is developed around dplyr
. I wonder if a data.table implementation could be possible. The window functions proposed in this issue are a subset of those features.
@ywhcuhk Thanks for the feedback; I was actually thinking this issue was already asking for too much. Most of that is well covered by the still-lightweight package roll, which is very fast. As for the other features, I suggest creating a new issue for each feature you are interested in, so the decision whether to implement/maintain can be made for each separately. Just from looking at the readme of tsibble I don't see anything new it offers... Its title is "Tidy Temporal Data Frames" but it doesn't even seem to offer temporal joins.
Thank you @jangorecki for the response. Maybe it's a context-dependent issue. The data structure I deal with most frequently is known as "panel data", with an ID and a time. If the program is "aware" of this data feature, a lot of operations, especially time-series operations, are made very easy. For someone who knows Stata, these are the operations based on tsset and xtset, such as lead, lag, fill gap, etc. I think the "index" in data.table could be enhanced in some way to enable such operations.
Of course, these operations can be done with data.table functions like shift and by. I just thought the index in data.table has a lot of potential to be explored. I agree this should belong in a different issue, but I don't know how to move it without losing the discussion above...
@jangorecki @st-pasha
Hey guys, I'm bringing up a possible feature request. For ML and forecasting, I use the frollmean and shift functions quite a bit to generate useful features. In a scoring environment I typically only need to generate those rolling-stat features for a handful of records from the data.table. I had already created some functions for recreating rolling stats on subsets of a data.table using a bunch of lags and row-means from outside the data.table package. However, I began testing whether I could generate them faster using shift and frollmean with a subset in i. When testing it out I realized that I have to include, in i, all the rows that are needed to compute the lags and rolling means in order for the subset in i to work properly, and I'm not sure if that is the intended way to do so.
I have a few examples below where I try to create a lag column and a 2-period moving average for a single record in the data.table. In the examples, I first use the subset in i the way I would like to use it, and then show that if I include the other rows used in the lag and rolling-mean calculation, I get what I want. It would be more ideal for me if I only had to specify the rows I want the lags and rolling stats for, without having to include the other rows in i.
@st-pasha I included you in this because I know you have frollmean on the roadmap for the python version and you haven't gotten to it yet.
################################################################################
# Create fake data
################################################################################
N = 25116
data <- data.table::data.table(
DateTime = as.Date(Sys.time()),
Target = stats::filter(
rnorm(N, mean = 50, sd = 20),
filter=rep(1,10),
circular=TRUE))
data[, temp := seq(1:N)][, DateTime := DateTime - temp]
data <- data[order(DateTime)]
DateTime Target temp
1: 1952-11-20 511.1355 25116
2: 1952-11-21 497.5900 25115
3: 1952-11-22 467.2040 25114
4: 1952-11-23 446.4739 25113
5: 1952-11-24 436.8124 25112
---
25112: 2021-08-21 631.6011 5
25113: 2021-08-22 598.5684 4
25114: 2021-08-23 570.2574 3
25115: 2021-08-24 561.8330 2
25116: 2021-08-25 527.9720 1
################################################################################
# Goal: Generate a 1-period lag for a single record in a data.table (temp == 1)
################################################################################
# Shift with i
data[temp %in% c(1), newval := data.table::shift(x = .SD, n = 1, fill = NA, type = 'lag'), .SDcols = "Target"]
DateTime Target temp newval
1: 1952-11-20 511.1355 25116 NA
2: 1952-11-21 497.5900 25115 NA
3: 1952-11-22 467.2040 25114 NA
4: 1952-11-23 446.4739 25113 NA
5: 1952-11-24 436.8124 25112 NA
---
25112: 2021-08-21 631.6011 5 NA
25113: 2021-08-22 598.5684 4 NA
25114: 2021-08-23 570.2574 3 NA
25115: 2021-08-24 561.8330 2 NA
25116: 2021-08-25 527.9720 1 NA
data[temp %in% c(1,2), newval := data.table::shift(x = .SD, n = 1, fill = NA, type = 'lag'), .SDcols = "Target"]
DateTime Target temp newval
1: 1952-11-20 511.1355 25116 NA
2: 1952-11-21 497.5900 25115 NA
3: 1952-11-22 467.2040 25114 NA
4: 1952-11-23 446.4739 25113 NA
5: 1952-11-24 436.8124 25112 NA
---
25112: 2021-08-21 631.6011 5 NA
25113: 2021-08-22 598.5684 4 NA
25114: 2021-08-23 570.2574 3 NA
25115: 2021-08-24 561.8330 2 NA
25116: 2021-08-25 527.9720 1 561.833
################################################################################
# Goal: Generate a 2-period moving average for a single record in a data.table (temp == 1)
################################################################################
# Create fake data
N = 25116
data <- data.table::data.table(
DateTime = as.Date(Sys.time()),
Target = stats::filter(
rnorm(N, mean = 50, sd = 20),
filter=rep(1,10),
circular=TRUE))
data[, temp := seq(1:N)][, DateTime := DateTime - temp]
data <- data[order(DateTime)]
# frollmean with i
data[temp %in% c(1), newval := data.table::frollmean(x = .SD, n = 2), .SDcols = "Target"]
DateTime Target temp newval
1: 1952-11-20 524.4159 25116 NA
2: 1952-11-21 497.6071 25115 NA
3: 1952-11-22 527.2184 25114 NA
4: 1952-11-23 486.7455 25113 NA
5: 1952-11-24 488.6396 25112 NA
---
25112: 2021-08-21 474.2944 5 NA
25113: 2021-08-22 511.5723 4 NA
25114: 2021-08-23 535.1824 3 NA
25115: 2021-08-24 536.3908 2 NA
25116: 2021-08-25 536.3070 1 NA
data[temp %in% c(1,2), newval := data.table::frollmean(x = .SD, n = 2), .SDcols = "Target"]
DateTime Target temp newval
1: 1952-11-20 524.4159 25116 NA
2: 1952-11-21 497.6071 25115 NA
3: 1952-11-22 527.2184 25114 NA
4: 1952-11-23 486.7455 25113 NA
5: 1952-11-24 488.6396 25112 NA
---
25112: 2021-08-21 474.2944 5 NA
25113: 2021-08-22 511.5723 4 NA
25114: 2021-08-23 535.1824 3 NA
25115: 2021-08-24 536.3908 2 NA
25116: 2021-08-25 536.3070 1 536.3489
@ywhcuhk if you are still interested in the feature you were asking for, please do provide a minimal example (in a new issue in this repo). To be fair, I still don't know what feature you are precisely requesting; maybe #3241? I am tidying up this thread by moving requests into the first post and don't know how to handle yours.
rollcor, rollcov, rollrank, rollunqn and rolllm are out of scope as of the current moment. All can work using frollapply (not in the master branch, but in PRs), just not super fast. We could consider adding them to the scope in future. For the current moment, the following set: sum, mean, prod, min, max, sd, var, median feels fine and complete to me.
@jangorecki just following up here based on your comment in {roll}. I was happy to see that frollmedian and friends will be available in {data.table}! What is the status of frollmedian - do you have a rough ETA? I can see that the PR has not been worked on since January and currently fails checks.
No ETA (it requires multiple other branches to be merged first). I recommend using the rollmedian branch directly. It was made at a very stable point in master (cascading through the other rolling-related branches). I know it is being used in production.
Sounds good, I'll try that. Which rolling functions are available on that branch? Just frollmedian or also others? (I'm doing some benchmarking, so I just want to make sure I get as many of your implementations as possible) 😊
Others as well; rollmedian is the most recent of all the rolling branches, so it includes the rest as well. There is also a rewritten frollapply to apply any function, which is multi-threaded and memory optimized.
@roaldarbol if you're keen, the blocker for merging the existing PRs is lack of reviewer+author bandwidth. The relevant PRs are under label:froll, except the frollmaxN splits; ideally we'd have a chain of PRs, like for cbindlist()/mergelist(), that is easily digested by a reviewer.
To gather requirements in a single place and refresh ~4-year-old discussions, creating this issue to cover the rolling functions feature (also known as rolling aggregates, sliding window, or moving average/moving aggregates).
rolling functions features:
give.names argument, same as shift has (#5441)
by.column=FALSE (issue #4887, PR #5575)