EdwinTh / padr

Padding of missing records in time series
https://edwinth.github.io/padr/

Making custom start_val and end_val for pad on grouped data #86

Closed TriageDr closed 2 years ago

TriageDr commented 2 years ago

Hello,

I really like the tool but am having some trouble with grouped data.

    df %>%
      group_by(id, lab_measurement) %>%
      pad(
        'day',
        start_val = as.POSIXlt('2020-04-04'), # ? Can I pass a variable from another data-frame here?
        end_val = as.POSIXlt('2020-05-11')    # ? Can I pass a variable from another data-frame here?
      )

I have lab measurements for patients that differ in number and frequency.

For example, Lab A has 24 measurements and Lab B has 5 measurements.

I would like to have a normalized output of 24 measurements for Lab A and 24 measurements for Lab B (5 real, 19 NAs).

When using grouped data, the pad function finds the date floor for each lab, instead of the "global" date floor for all measurements (makes sense). In order to get around this, the pad() function requires you to manually input the string for the "floor" (start_val) and "ceiling" (end_val).

I would like to simply pass those dates from a separate dataframe, but nothing I have tried appears to work.

Apologies in advance for the poor pseudo-code.

EdwinTh commented 2 years ago

I am not sure I get your question right, but start_val and end_val each expect a single value of class 'Date', 'POSIXlt' or 'POSIXct'. You can fetch those from another dataframe, as long as you make sure the value is in the form the function expects. If you have a datetime variable called dt in dataframe x, then x$dt[1] would work.
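As a rough sketch of what that could look like: the data frames `labs` and `bounds`, their column names, and all values below are made up for illustration and are not taken from the question. The only point is that `bounds$start[1]` and `bounds$end[1]` are single `Date` values, so they can be handed to `start_val` and `end_val` directly.

```r
library(dplyr)
library(padr)

# Hypothetical lab data: lab A is measured daily, lab B only twice.
labs <- tibble(
  id = 1,
  lab_measurement = c("A", "A", "A", "A", "A", "B", "B"),
  date = as.Date(c(
    "2020-04-04", "2020-04-05", "2020-04-06", "2020-04-07", "2020-04-08",
    "2020-04-05", "2020-04-07"
  )),
  value = c(1.1, 1.2, 1.3, 1.4, 1.5, 7.0, 7.5)
)

# Hypothetical second data frame that holds the global window.
bounds <- tibble(
  start = as.Date("2020-04-04"),
  end = as.Date("2020-04-08")
)

# The same start_val and end_val are applied to every group.
padded <- labs %>%
  group_by(id, lab_measurement) %>%
  pad("day", start_val = bounds$start[1], end_val = bounds$end[1]) %>%
  ungroup()

# Number of rows per lab after padding.
count(padded, lab_measurement)
```

Under these assumptions both labs end up covering the same five days, with `value` set to NA on the days that were padded in.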

Please note that you can only set start_val and end_val universally for all groups. If you want to use a custom value for each group, you will need to write a small wrapper yourself, for instance with the purrr package.
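One possible shape for such a wrapper is sketched below. It is only an illustration: `labs`, `windows`, and the helper `pad_one()` are hypothetical names, and the dates are invented. The idea is to pad each group's subset separately with its own window and then bind the pieces back together.

```r
library(dplyr)
library(purrr)
library(padr)

# Hypothetical measurements for two labs.
labs <- tibble(
  lab_measurement = c("A", "A", "B", "B"),
  date = as.Date(c("2020-04-04", "2020-04-06", "2020-04-10", "2020-04-11")),
  value = c(1.1, 1.3, 7.0, 7.5)
)

# Hypothetical lookup table with one custom window per lab.
windows <- tibble(
  lab_measurement = c("A", "B"),
  start = as.Date(c("2020-04-03", "2020-04-09")),
  end = as.Date(c("2020-04-06", "2020-04-12"))
)

# Pad the subset for the i-th row of `windows` with that row's window.
# `group` keeps lab_measurement filled in on the padded rows.
pad_one <- function(i) {
  labs %>%
    filter(lab_measurement == windows$lab_measurement[i]) %>%
    pad(
      "day",
      start_val = windows$start[i],
      end_val = windows$end[i],
      group = "lab_measurement"
    )
}

# Iterate over row indices, so the Date values are picked out with `[i]`.
padded <- bind_rows(map(seq_len(nrow(windows)), pad_one))
```

Iterating over row indices rather than over the date columns themselves is a deliberate choice here: indexing with `[i]` keeps the `Date` class intact, which is what `start_val` and `end_val` need.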

TriageDr commented 2 years ago

I really appreciate your reply -- I think I've resolved this by using a for-loop to iterate through the custom floor dates in a similar manner to x$dt[i].

I was also able to achieve this with the tidyr::complete function.

    complete(
      date_time_start = seq(min(date_time_start), max(date_time_start), by = "day"),
      nesting(patient_id, lab_measurement),
      fill = list(NA)
    )
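For anyone landing here later, a self-contained version of the complete() approach might look roughly like this. The data frame `labs` and its values are invented for the example; only the column names follow the snippet above. The `fill` argument is left out because complete() already fills the new rows with NA by default.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: lab A has three measurements, lab B only one.
labs <- tibble(
  patient_id = 1,
  lab_measurement = c("A", "A", "A", "B"),
  date_time_start = as.Date(c(
    "2020-04-04", "2020-04-05", "2020-04-06", "2020-04-05"
  )),
  value = c(1.1, 1.2, 1.3, 7.0)
)

# The data is not grouped, so min() and max() span all labs and give
# the global date range. nesting() limits the patient/lab combinations
# to those that actually occur in the data.
completed <- labs %>%
  complete(
    date_time_start = seq(min(date_time_start), max(date_time_start), by = "day"),
    nesting(patient_id, lab_measurement)
  )
```

Each patient/lab combination then has one row per day in the global range, with `value` set to NA where there was no measurement.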

Really wonderful package and thanks again for the response!