IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0
38 stars 103 forks source link

Bug in Start of Rains - First Year getting "NA" #9130

Closed lilyclements closed 2 weeks ago

lilyclements commented 2 months ago

@rdstern I wanted to check something on the start of rains dialog which I think is a bug:

(1) On the start of rains dialog, the first year is always giving NA (even if there is a start)

We are running a rolling sum of rainfall from the right. So, if am doing a rolling sum of n = 2 values, I have

9, 10, 3, 2

This becomes

NA, 19, 13, 5

This is okay except for that NA. Because, we then say that start rain is:

From our second argument there, this means that our first year will always take that NA even if there is a start day

Is this what we want or expect? If so, then I'll stop here. But, I don't think we want the first year to always be NA!

Question 1: Why are we saying that if the first rolling sum is NA, then the start of rains is NA? (As someone who wrote this code seven years ago, this could have just been a misunderstanding at the time on my part, or I could now be missing something major). It seems this is where both of our problems are coming from. Instead, don't we just want to look at:

This is just removing a simple line from the filter in the R Code. (solution (a))

An alternative option (b) is: If we set fill to be a value (e.g., 0), then it will set the first n - 1 values as 0.

> x <- c(3, 4, NA, 1, 2, 3)
> RcppRoll::roll_sumr(x, n = 2, fill = NA, na.rm = FALSE)
# [1] NA  7 NA NA  3  5

> RcppRoll::roll_sumr(x, n = 2, fill = 0, na.rm = FALSE)
# [1]  0  7 NA NA  3  5

This helps, but, I would like to understand why we "care" about NAs in the rolling sum anyway

(2) If the last n days of the previous year is missing, then the next year has no start date (NA)

(e.g. if the rolling sum is n = 2, then, the 366th doy on the last year being NA, so the first day on the following year is NA since it's doing a rolling sum with an NA and a value, and so it gives NA)

This is fixed by solution (a) above (but not solution (b)). But, this also leads to a side-question to potentially fix/look into even if we do do solution (a) above:

If there is a missing value in the rainfall, the rolling sum set is NA. Do we want this? Presumably, if the rolling sum of the rainfall exceeds a threshold despite an NA in it, then we are happy to ignore the NA? Because, the NA isn't going to be negative rainfall -- if we are looking for the rolling sum of over 50mm of rain in 3 days, and we have

DAY 1: 1mm DAY 2: 48mm DAY 3: NA DAY 4: 20mm

Then currently that would give: NA, NA, NA, NA. But, on day 4 we have exceeded 50mm of rainfall so we don't care what the NA value is?

x <- c(1, 48, NA, 20)

RcppRoll::roll_sumr(x, n = 3, fill = NA, na.rm = FALSE)
# [1] NA NA NA NA

RcppRoll::roll_sumr(x, n = 3, fill = NA, na.rm = TRUE)
# [1] NA NA 49 68

So: Side Question: If there is a missing value in the rainfall, the rolling sum set is NA. Do we want this?

rdstern commented 2 months ago

@lilyclements I agree that the rolling sum is missing (for a given year/season) if it finds a missing rainfall value. But I hope the first year isn't always missing. It is missing in the first year if day 1 is also the earliest starting day. Then the rolling sum needs to start at the end of the previous year - which isn't there.

lilyclements commented 1 month ago

@rdstern makes sense - and you're right that its only missing if the starting day is day one. Is this what we want? If you're happy with how this is all working then I'll close the issue. Thanks!