IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0
38 stars 103 forks source link

Small (or large) change in the start and end of the rains dialogs #9117

Open rdstern opened 2 months ago

rdstern commented 2 months ago

Here is the current start of the rains dialog:

image

a) @lilyclements has solved the problem of the Occurrence control, at the bottom right. It would be wonderful if that can now be implemented? (It is on both the start and the end dialogs.) b) Meantime a trivial edit is to make the default to have the Date checkbox checked. So the default is to produce 2 variables. (When the Occurrence is done, then the default should be to have that checked as well. But start with the Date checkbox.

The reason is that the new Export to Google buckets exports the dates when they exist. So let's make it easy to produce them.

lilyclements commented 2 months ago

@rdstern this should be a small change. To clarify, once the .drop changes are merged in (PR #8794) all that needs to be done to achieve (a) is to add param_list = list(drop = FALSE) to the data_book$run_instat_calculation function.

That is,

data_book$run_instat_calculation(calc=start_of_rains_combined)

becomes

data_book$run_instat_calculation(calc=start_of_rains_combined, param_list = list(drop = FALSE))

Do we want this to always be the case to have drop = FALSE regardless of if occurrence is checked?

rdstern commented 2 months ago

@lilyclements that sounds great. @N-thony is that now a task you could do or could delegate.

I am not sure I fully understand your question on Drop = FALSE? If you include occurrence then it should now be TRUE if there is a start date, FALSE if there was no start date in the period and missing if there were missing values in the data. The day number and date will be missing for the 2 situations of FALSE and NA of occurrence. I suggest we need to keep it that way, because (though confusing) we will often have those years included anyway for other definitions calculated. So I think I am saying yes, we always have drop = FALSE!

lilyclements commented 2 months ago

@rdstern apologies for not being clear. I meant the parameter to drop. Do we always want to include the years even with no start date, regardless if occurrence is checked?

(drop = FALSE being the parameter option to not drop unused levels)

rdstern commented 2 months ago

@lilyclements good, I think I had understood and yes we keep all years, even those with no start date regardless. And one reason is that we may already have those years in the summary data frame, because of other events.

MeSophie commented 2 months ago

@lilyclements Sorry I added param_list = list(drop = FALSE) parameter in data_book$run_instat_calculation and I obtained this error using dodoma data.

image

# Dialog: Start of Rains

roll_sum_rain <- instat_calculation$new(type="calculation", function_exp="RcppRoll::roll_sumr(x=rain, n=2, fill=NA, na.rm=FALSE)", result_name="roll_sum_rain", calculated_from=list("dodoma"="rain"))
conditions_filter <- instat_calculation$new(type="filter", function_exp="((rain >= 0.85) & roll_sum_rain > 20) | is.na(x=rain) | is.na(x=roll_sum_rain)", sub_calculations=list(roll_sum_rain))
grouping_by_year <- instat_calculation$new(type="by", calculated_from=list("dodoma"="year"))
doy_filter <- instat_calculation$new(type="filter", function_exp="doy_366 >= 1 & doy_366 <= 366", calculated_from=calc_from_convert(x=list(dodoma="doy_366")))
start_of_rains_doy <- instat_calculation$new(type="summary", function_exp="ifelse(test=is.na(x=dplyr::first(x=rain)) | is.na(x=dplyr::first(x=roll_sum_rain)), yes=NA, no=dplyr::first(x=doy_366, default=NA))", result_name="start_rain", save=2)
start_of_rains_combined <- instat_calculation$new(type="combination", manipulations=list(conditions_filter, grouping_by_year, doy_filter), sub_calculation=list(start_of_rains_doy))
data_book$run_instat_calculation(calc=start_of_rains_combined, display=FALSE, param_list=list(drop=FALSE))
rm(list=c("start_of_rains_combined", "conditions_filter", "roll_sum_rain", "grouping_by_year", "doy_filter", "start_of_rains_doy"))
MeSophie commented 2 months ago

@lilyclements I Added the param_list = list(drop = TRUE) parameter in the code above and the code runs without error. Now my question is am I correct to set the default value to be true?

DataBook$set("public", "run_instat_calculation", function(calc, display = TRUE, param_list = list(drop = TRUE)) {
  out <- self$apply_instat_calculation(calc)
  if(display) return(out$data)
}
)
lilyclements commented 2 months ago

@MeSophie the error you're getting says "unused argument param_list". Have you merged in changes from PR #8794? They're needed for this change to work

MeSophie commented 1 month ago

@MeSophie the error you're getting says "unused argument param_list". Have you merged in changes from PR #8794? They're needed for this change to work

Okay thank you @lilyclements.

rdstern commented 1 month ago

@lilyclements and @MeSophie we also have the logical columns in the End of the rains/season and then in the length.

But I have now realied they are a bit different.

a) If there is no start of the rains (flag is FALSE), then we don't have a season at all. So I am happy with that. b) The end of the rains is the last occasion when there was 10mm on one day in the specified period, say FEbruary to April. We can easily make it that some year this doesn't occur, e.g. last time 50mm or more in 1 April to 30 April. But only if we are silly in the way we use the command? So I wonder about not including the logical for the end of the rains? Remember I am just thinking aloud here!) Or we do have the logical to confirm it never happens. So, if we get a FALSE for the end of the rains, then we need to examine the data carefully for those years. c) The end of the season usually starts from the date of the end of the rains. We need the logical for the end of the season. It is possible that we run off the end of the year, and the season has not finished. However - unlike the start - that means that the season is long and that's a good thing for the length. It means the end is later and the season is therefore long.

So, I am happy with the same calculation for the logical for the end of the season, as the start, namely TRUE, FALSE or NA. But if it is FALSE then let's make the day number and date the last day, and not missing?

Now what does this mean we want for the length of the season?

a) If either flag is NA then the length is NA.
b) If the start is FALSE and the end (season) is TRUE then the length flag is FALSE. The length value is missing, which is fine. c) If the start is TRUE and the end is FALSE then the length is not missing and the length is at least that value (in days). Should we make the Length flag still FALSE, so we need to use both variables? Or why not have the variable as categorical instead? It then has 4 options rather than 3, namely NA, or TRUE or NONE or MORE. Preferably better names for the levels?

lilyclements commented 1 month ago

@rdstern I agree, and I like what you're proposing a lot. Just to check:

End of Season If end_season_status is FALSE (i.e., we have data but nothing fits the criteria for the end of season), then set the end_season_doy and end_season_date as the last day. Keep end_season_status as FALSE.

End of Rains I think we should have a logical still to confirm it never happens - it might be useful for the user, but perhaps I am mistaken. No changes needed here?

Length of the Season:

Start Status End Status Season (Status)
NA [anything] NA (NA)
[anything] NA NA (NA)
F T NA (NONE)
T F 366 - start_rains (MORE)
T T end_rains - start_rains (TRUE)

@rdstern what if it is FALSE, FALSE for both start and end status?

If I've understood correctly, then I can make the appropriate changes in the R Code.

For "Length of the Season" (assuming the change to the end season is set to be max(doy) if the status is FALSE).

# Currently we run:
length_of_season <- instat_calculation$new(type="calculation", function_exp="end_rain - start_rain", result_name="length_value", calculated_from=list("dodoma_by_year"="start_rain","dodoma_by_year"="end_rain"), save=2)
start_end_status <- instat_calculation$new(type="calculation", function_exp="dplyr::if_else(is.na(start_status | end_status), NA, start_status & end_status)", result_name="length_status", calculated_from=list("dodoma_by_year"="start_status","dodoma_by_year"="end_status"), save=2)
length_rains_combined <- instat_calculation$new(type="combination", sub_calculation=list(length_of_season, start_end_status))
data_book$run_instat_calculation(calc=length_rains_combined, display=FALSE)
rm(list=c("length_rains_combined", "length_of_season", "start_end_status"))

# I suggest we change `start_end_status` to be:

start_end_status <- instat_calculation$new(type="calculation",
                                           function_exp="dplyr::case_when(
                                           is.na(start_status) | is.na(end_status) ~ NA_character_,
                                           start_status == end_status ~ as.character(start_status),
                                           start_status == FALSE & end_status == TRUE ~ 'NONE',
                                           start_status == TRUE & end_status == FALSE ~ 'MORE')",
                                           calculated_from=list("length_test_data"="start_status",
                                                                "length_test_data"="end_status"),
                                           result_name = "length_status",
                                           save=2)
rdstern commented 1 month ago

@lilyclements I agree with all that.

We also really should be writing this stuff up! There is a lot here! I suggest there could be a small series of papers as well as our climatic guide. Maybe we should discuss with the nice PICSA people too!!!

MeSophie commented 1 month ago

image

@lilyclements Could you please give more explanation about this part? Is there a code change to be made here? Thank you.

lilyclements commented 1 month ago

If end_season_status is FALSE (i.e., we have data but nothing fits the criteria for the end of season), then set the end_season_doy and end_season_date as the last day. Keep end_season_status as FALSE.

@rdstern to check - by "last day", do we mean the last day in the DOY variable (usually, 366, but whichever is the "largest" value in there, this would work for shifted doy's too presumably), or do we mean the last day in the filtered set of dates which the user is interested in? Thanks!

rdstern commented 1 month ago

@lilyclements and @MeSophie in the end of the rains and end of the season we have to give the starting and ending day for the calculations. It is that ending day that we give. So, when Status is FALSE the user knows the true ending day was later than that. We can't give 366, because it might have ended before then.

lilyclements commented 1 month ago

@rdstern I am getting confused. You mentioned above "if it is FALSE then let's make the day number and date the last day, and not missing?" Should we put here then the last day as the last day in the chosen day range?

rdstern commented 1 month ago

@lilyclements that's correct I think, For the start, it is missing. But for the end it is probably more useful to put the last day and date in the selected period. Then we can use it simply to give a length, and the presence of FALSE is enough to be able to say that the length is longer than that.

MeSophie commented 1 month ago

@lilyclements Please anything new about the End of rains dialog's code?

lilyclements commented 3 weeks ago

for the end it is probably more useful to put the last day and date in the selected period. Then we can use it simply to give a length, and the presence of FALSE is enough to be able to say that the length is longer than that.

Ok, so I'm going to just say what I've been trying.

1. To do with the day of year is very simple!

In this line, we were setting yes = NA. We now want to set yes = the value in ucrReceiverDOY (e.g., 51)

end_season <- instat_calculation$new(type="summary", function_exp="ifelse(test=is.na(x=dplyr::first(x=wb)), yes=51, no=dplyr::first(x=doy_366))", result_name="end_season", calculated_from=list("dodoma"="doy_366"), save=2)

2. For end_season_date is where it's more complex We haven't got this stored as a variable in our visual studio code, and it has been filtered out of our data set so doesn't exist by asking for the maximum of each year.

Option 1: One option is to "reconstruct" the date, and subbing that into here:

end_season_date <- instat_calculation$new(type="summary", function_exp="dplyr::if_else(condition=is.na(x=dplyr::first(x=wb)), true = RECONSTRUCTED BIT, false=dplyr::first(x=date))", result_name="end_season_date", save=2)

But then there's issues on how we get this reconstructed bit.

E.g., taking the first date and adding the last DOY to it:

as.Date(paste0(unique(year), '01', '01')) + 51 - 1

Issue here is that what is "Year"? What if there are shifted years?

E.g. 2, taking the finding the shifted year, adding that, and adding the last DOY to it:

as.Date(ISOdate(year, 1, 1)) + (data_book$get_variables_metadata("dodoma") %>% filter(Name == "s_doy") %>% pull(doy_start) + 51 - 1)

But then: What if they did doy_366 or doy_365? How can we fix that?

Option 2: Ideally instead of reconstructing, we go to the source. So: we don't want to filter out where wb <= X | is.na(rain) in the R code. We want to set that as a logical TRUE/FALSE variable, which we'll store as "wb_true". And then, we want to filter for each year where wb_true is TRUE, or if there are no instances of it that year, we want to filter to the final row of that year: if (any(wb_true)) wb_true == TRUE else row_number() == n() Then we just look for the first instance for our end DOY and end date of it. For our occurrence, we now just take the value of wb_true

In Tidyverse Format, we currently do this:

dat <- data_book$get_data_frame("dodoma")
dat %>%
  mutate(rain_min = ifelse(test=is.na(x=rain), yes=0, no=rain),
         wb_min = purrr::accumulate(.f= ~ pmin(pmax(..1 + ..2, 0), 1000), .x=tail(x=rain_min - 5, n=-1), .init=0),
         rain_max = ifelse(yes=100, test=is.na(x=rain), no=rain),
         wb_max = purrr::accumulate(.f= ~ pmin(pmax(..1 + ..2, 0), 1000), .x=tail(x=rain_max - 5, n=-1), .init=0),
         wb = ifelse(test=(wb_min != wb_max) | is.na(x=rain), yes=NA, no=wb_min)) %>%
  filter(doy_366 >= 41 & doy_366 <= 51) %>%
  group_by(year, .drop = FALSE) %>%
  filter(wb <= 1 | is.na(x=rain)) %>%
  summarise(end_season = ifelse(test=is.na(x=dplyr::first(x=wb)), yes=NA, no=dplyr::first(x=doy_366)),
            end_season_date = dplyr::if_else(condition=is.na(x=dplyr::first(x=wb)), true=as.Date(NA), false=dplyr::first(x=date)),
            occurrence = n() > 0)

We want to do this:

dat %>%
  mutate(rain_min = ifelse(test=is.na(x=rain), yes=0, no=rain),
         wb_min = purrr::accumulate(.f= ~ pmin(pmax(..1 + ..2, 0), 1000), .x=tail(x=rain_min - 5, n=-1), .init=0),
         rain_max = ifelse(yes=100, test=is.na(x=rain), no=rain),
         wb_max = purrr::accumulate(.f= ~ pmin(pmax(..1 + ..2, 0), 1000), .x=tail(x=rain_max - 5, n=-1), .init=0),
         wb = ifelse(test=(wb_min != wb_max) | is.na(x=rain), yes=NA, no=wb_min)) %>%
  mutate(x = (wb <= 1 | is.na(x=rain))) %>%
  filter(doy_366 >= 41 & doy_366 <= 51) %>%
  group_by(year, .drop = FALSE) %>%
  filter(
    case_when(n() > 0 & any(x) ~ x == TRUE,
              n() > 0 ~ row_number() == n(),
              TRUE ~ NA)
  ) %>%
  summarise(end_season = ifelse(test=is.na(x=dplyr::first(x=wb)), yes=NA, no=dplyr::first(x=doy_366)),
            end_season_date = dplyr::if_else(condition=is.na(x=dplyr::first(x=wb)), true=as.Date(NA), false=dplyr::first(x=date)),
            occurrence = dplyr::first(x))

So I've got it working in Tidyverse! But that won't work in our calculation system, and that's where I've been spending my day.

I can't understand why this is not translating into our calculation system, and I'm not sure who else knows the system well. @john-bagiliko I think you have done some translation work on Tidyverse to the Calculation system. Can you see where this is going wrong on translating that second tidyverse code above to this below.

I'm thinking it's that we filter out the dates already under doy_filter, but, I cannot reconstruct my R code above into the calculation system, and I cannot reconstruct my calculation system code below into Tidyverse!

rain_min <- instat_calculation$new(type="calculation", function_exp="ifelse(test=is.na(x=rain), yes=0, no=rain)", result_name="rain_min", calculated_from=list("dodoma"="rain"))
wb_min <- instat_calculation$new(type="calculation", function_exp="purrr::accumulate(.f= ~ pmin(pmax(..1 + ..2, 0), 1000), .x=tail(x=rain_min - 5, n=-1), .init=0)", result_name="wb_min", sub_calculations=list(rain_min))
rain_max <- instat_calculation$new(type="calculation", function_exp="ifelse(yes=100, test=is.na(x=rain), no=rain)", result_name="rain_max", calculated_from=list("dodoma"="rain"))
wb_max <- instat_calculation$new(type="calculation", function_exp="purrr::accumulate(.f= ~ pmin(pmax(..1 + ..2, 0), 1000), .x=tail(x=rain_max - 5, n=-1), .init=0)", result_name="wb_max", sub_calculations=list(rain_max))
wb <- instat_calculation$new(type="calculation", function_exp="ifelse(test=(wb_min != wb_max) | is.na(x=rain), yes=NA, no=wb_min)", result_name="wb", sub_calculations=list(wb_min, wb_max))
wb_filter_column <- instat_calculation$new(type="calculation", function_exp="(wb <= 1) | is.na(x=rain)", result_name="wb_filter_column", sub_calculations=list(wb), save = 2)
grouping_by_station_year <- instat_calculation$new(type="by", calculated_from=list("dodoma"="year"))
doy_filter <- instat_calculation$new(type="filter", function_exp="doy_366 >= 41 & doy_366 <= 51", calculated_from=calc_from_convert(x=list("dodoma"="doy_366")))
conditions_filter <- instat_calculation$new(type="filter", function_exp="dplyr::case_when(n() > 0 & any(wb_filter_column) ~ wb_filter_column == TRUE, n() > 0 ~ row_number() == n(), TRUE ~ NA)", sub_calculations=list(wb_filter_column))
end_season <- instat_calculation$new(type="summary", function_exp="ifelse(test=is.na(x=dplyr::first(x=wb)), yes=NA, no=dplyr::first(x=doy_366))", result_name="end_season", calculated_from=list("dodoma"="doy_366"), save=2)
end_season_date <- instat_calculation$new(type="summary", function_exp="dplyr::if_else(condition=is.na(x=dplyr::first(x=wb)), true=as.Date(NA), false=dplyr::first(x=date))", result_name="end_season_date", save=2)
end_season_status <- instat_calculation$new(type="summary", function_exp="dplyr::first(wb_filter_column)", result_name="end_season_status", save=2)
end_of_season_combined <- instat_calculation$new(type="combination", manipulations=list(conditions_filter, grouping_by_station_year, doy_filter), sub_calculations=list(end_season, end_season_date, end_season_status))
data_book$run_instat_calculation(calc=end_of_season_combined, display=FALSE, param_list = list(drop = FALSE))
lilyclements commented 1 week ago

After a discussion with @rdstern, we have agreed to have it just for the day of year column. This means that this is a really nice simple small change!

In this line, we were setting yes = NA. We now want to set yes = the value in ucrReceiverDOY (e.g., 51)

end_season <- instat_calculation$new(type="summary", function_exp="ifelse(test=is.na(x=dplyr::first(x=wb)), yes=51, no=dplyr::first(x=doy_366))", result_name="end_season", calculated_from=list("dodoma"="doy_366"), save=2)

@MeSophie does this make sense?

rdstern commented 1 week ago

@lilyclements and @MeSophie would the end of the season be easy if you gave it as NA when it was censored - as you do for the length. Then that would be good enough for now. We could add the extra column after the date in another pull request, and I suggest that would be easy to do as a follow-on calculation - or we could use that for the user.

lilyclements commented 1 week ago

@rdstern great.

@MeSophie I'm outlining here the changes to make. We want to do two things in this R code, all very straight forward:

1. Change end_season to now take the final doy given.

Here, we just set yes=LAST DOY instead of yes=NA for end_season:

end_season <- instat_calculation$new(type="summary", function_exp="ifelse(test=is.na(x=dplyr::first(x=wb)), yes=LAST DOY, no=dplyr::first(x=doy_366))", result_name="end_season", calculated_from=list("dodoma"="doy_366"), save=2)

2. For both end of rains and end of season we want to add in stuff to make year as a factor to have the .drop bits working.

We do the same here as we did in start of rains, that is:

a) Before our instat_calculation codes we run

year_type <- data_book$get_column_data_types(data_name="dodoma", columns="year")
data_book$convert_column_to_type(data_name="dodoma", col_names="year", to_type="factor")

b) We add in param_list=list(drop=FALSE) into our data_book$run_instat_calculation code:

data_book$run_instat_calculation(calc = end_of_rains_combined, display=FALSE, param_list=list(drop=FALSE))

c) Then after the instat_calculation codes, we add in stuff to convert year back again:

linked_data_name <- data_book$get_linked_to_data_name("dodoma", link_cols=c("year"))
# or year and station if we have station.

data_book$convert_column_to_type(data_name="dodoma", col_names="year", to_type=year_type)
data_book$convert_column_to_type(data_name=linked_data_name, col_names="year", to_type=year_type)

Example of this with full R Code for dodoma We can take the defaults for End of Season, but change to look at:

Then we want to get the following code:

# stuff at the start
year_type <- data_book$get_column_data_types(data_name="dodoma", columns="year")
data_book$convert_column_to_type(data_name="dodoma", col_names="year", to_type="factor")

# Dialog: End of Rains/Season
rain_min <- instat_calculation$new(type="calculation", function_exp="ifelse(test=is.na(x=rain), yes=0, no=rain)", result_name="rain_min", calculated_from=list("dodoma"="rain"))
wb_min <- instat_calculation$new(type="calculation", function_exp="purrr::accumulate(.f= ~ pmin(pmax(..1 + ..2, 0), 100), .x=tail(x=rain_min - 5, n=-1), .init=0)", result_name="wb_min", sub_calculations=list(rain_min))
rain_max <- instat_calculation$new(type="calculation", function_exp="ifelse(yes=100, test=is.na(x=rain), no=rain)", result_name="rain_max", calculated_from=list("dodoma"="rain"))
wb_max <- instat_calculation$new(type="calculation", function_exp="purrr::accumulate(.f= ~ pmin(pmax(..1 + ..2, 0), 100), .x=tail(x=rain_max - 5, n=-1), .init=0)", result_name="wb_max", sub_calculations=list(rain_max))
wb <- instat_calculation$new(type="calculation", function_exp="ifelse(test=(wb_min != wb_max) | is.na(x=rain), yes=NA, no=wb_min)", result_name="wb", sub_calculations=list(wb_min, wb_max))
conditions_filter <- instat_calculation$new(type="filter", function_exp="(wb <= 0.5) | is.na(x=rain)", sub_calculations=list(wb))
grouping_by_station_year <- instat_calculation$new(type="by", calculated_from=list("dodoma"="year"))
doy_filter <- instat_calculation$new(type="filter", function_exp="doy_366 >= 6 & doy_366 <= 10", calculated_from=calc_from_convert(x=list(dodoma="doy_366")))
end_season <- instat_calculation$new(type="summary", function_exp="ifelse(test=is.na(x=dplyr::first(x=wb)), yes=10, no=dplyr::first(x=doy_366))", result_name="end_season", calculated_from=list("dodoma"="doy_366"), save=2)
end_season_date <- instat_calculation$new(type="summary", function_exp="dplyr::if_else(condition=is.na(x=dplyr::first(x=wb)), true=as.Date(NA), false=dplyr::first(x=date))", result_name="end_season_date", save=2)
end_season_status <- instat_calculation$new(type="summary", function_exp="n() > 0", result_name="end_season_status", save=2)
end_of_season_combined <- instat_calculation$new(type="combination", manipulations=list(conditions_filter, grouping_by_station_year, doy_filter), sub_calculations=list(end_season, end_season_date, end_season_status))
data_book$run_instat_calculation(calc=end_of_season_combined, display=FALSE, param_list=list(drop=FALSE))

# stuff at the end
linked_data_name <- data_book$get_linked_to_data_name("dodoma", link_cols=c("year"))

data_book$convert_column_to_type(data_name="dodoma", col_names="year", to_type=year_type)
data_book$convert_column_to_type(data_name=linked_data_name, col_names="year", to_type=year_type)
rdstern commented 1 week ago

@lilyclements and @MeSophie I suggest the column with the censored values replaced by the last day is additional to the variable with NA in that position. And it is after the status variable and is only given if there are some years where it is needed. (Unless that makes it difficult for the export to google buckets.)

Here is an example from testing the length:

image

So I'm suggesting it would add a further variable at the end of those shown. I call it the "filled-in" column. Now, in R-Instat, it is quite easy to produce, but I assume the filled-in column is what is needed for export (together with the status. Then the filled in column is used in the graphs, and there might be a small up-arrow, for each of those years, at the top of the graph that indicated it is more than that length (or end).

We will also need to add this to the PICSA rainfall graphs..

MeSophie commented 1 week ago

@lilyclements Please I don't understand well this part image Does the Last Doy always come from the ucrReceiverDoy? If so, the code will become yes=doy_366. If not, where do we take the number received by yes? Following the full example of code you gave yes is taking the last doy define in the day range (I hope I have understood correctly).

rdstern commented 1 week ago

@MeSophie that's correct that it takes the last day in the day rainge. That's the last day when the end, could have happened. So, in those years where it didn;t happen in the tange you gave, it must have happened after that last day.

lilyclements commented 1 week ago

@MeSophie you have understood what I wrote correctly, but I think I misunderstood.

@rdstern, just to clarify, if the end of seasons did not happen in the day range given, do you want it to give two columns then: The end of season DOY, and the "filled in" column, where the "filled in" column is

a) The last day in the filter ("Last-DOY") and status as FALSE, or, b) The last day in the filter + 1 ("Last-DOY + 1") and the status as FALSE c) Or something else?

@MeSophie if we want to have two columns, then we modify the code to:

end_season <- instat_calculation$new(type="summary", function_exp="ifelse(test=is.na(x=dplyr::first(x=wb)), yes=NA, no=dplyr::first(x=doy_366))", result_name="end_season", calculated_from=list("dodoma"="doy_366"), save=2)

end_season_filled <- instat_calculation$new(type="summary", function_exp="ifelse(test=is.na(x=dplyr::first(x=wb)), yes= LAST_DOY, no=dplyr::first(x=doy_366))", result_name="end_season_filled", calculated_from=list("dodoma"="doy_366"), save=2)

[...]

end_of_season_combined <- instat_calculation$new(type="combination", manipulations=list(conditions_filter, grouping_by_station_year, doy_filter), sub_calculations=list(end_season, end_season_filled, end_season_date, end_season_status))

@rdstern does this mean we would always give this "filled in" column now? How about instead having a checkbox to say to "Infill missing DOYs with last DOY" (/something more eloquent!)

rdstern commented 1 week ago

@MeSophie and @lilyclements I suggest

a) We always give the not-filled-in column - as we do now in the Length dialog. b) We fill in with the Last-DOY when needed. c) We only add the filled-in column when there is something to fill in. I suggest we then add it anyway, and after the status variable.

lilyclements commented 1 week ago

@rdstern great. In which case, @MeSophie we want to modify the code to have two columns. This would be:

end_season <- instat_calculation$new(type="summary", function_exp="ifelse(test=is.na(x=dplyr::first(x=wb)), yes=NA, no=dplyr::first(x=doy_366))", result_name="end_season", calculated_from=list("dodoma"="doy_366"), save=2)

end_season_filled <- instat_calculation$new(type="summary", function_exp="ifelse(test=is.na(x=dplyr::first(x=wb)), yes= LAST_DOY, no=dplyr::first(x=doy_366))", result_name="end_season_filled", calculated_from=list("dodoma"="doy_366"), save=2)

[...]

end_of_season_combined <- instat_calculation$new(type="combination", manipulations=list(conditions_filter, grouping_by_station_year, doy_filter), sub_calculations=list(end_season, end_season_date, end_season_status, end_season_filled))