Closed liuyanguu closed 1 month ago
Hi @liuyanguu, thanks for bringing this up.
I'm not able to reproduce the problem you describe though, or perhaps misunderstanding you. My understanding is that you are not able to impute if there is only one non-NA value.
For Impute()
we have the three options as I think you know: "latest", "constant", and "linear". The latter two use stats::approx()
. The first two are pretty similar (with some small differences described in the documentation), and just copy the latest or earliest values to missing points. The "linear" option however performs a linear interpolation. Logically, for the first two, we only need one non-NA point, whereas for "linear" we would need at least two, since you can't do linear regression with one point.
So following your example:
dfp <- ASEM_iData_p
# remove all points for GBR/LPI except 2019
dfp$LPI[dfp$uCode == "GBR" & dfp$Time %in% c(2018, 2020:2022)] <- NA
# make purse with fresh panel data
ASEMp <- new_coin(dfp, ASEM_iMeta, split_to = "all", quietly = TRUE)
# extract and plot time series for GBR for indicator LPI
df_plot <- get_data(ASEMp, dset = "Raw", iCodes = "LPI", uCodes = "GBR")
plot(df_plot$Time, df_plot$LPI, type = "b", xlab = "Year", ylab = "LPI for GBR")
# confirms only one point in GBR/LPI time series...
# impute- -----
# imputation with "latest"
ASEMp <- Impute(ASEMp, dset = "Raw", f_i = "impute_panel", f_i_para = list(imp_type = "latest"))
df_plot <- get_data(ASEMp, dset = "Imputed", iCodes = "LPI", uCodes = "GBR")
plot(df_plot$Time, df_plot$LPI, type = "b", xlab = "Year", ylab = "LPI for GBR")
I get:
# imputation with "constant"
ASEMp <- Impute(ASEMp, dset = "Raw", f_i = "impute_panel", f_i_para = list(imp_type = "constant"))
df_plot <- get_data(ASEMp, dset = "Imputed", iCodes = "LPI", uCodes = "GBR")
plot(df_plot$Time, df_plot$LPI, type = "b", xlab = "Year", ylab = "LPI for GBR")
I get:
# imputation with "linear"
ASEMp <- Impute(ASEMp, dset = "Raw", f_i = "impute_panel", f_i_para = list(imp_type = "linear"))
df_plot <- get_data(ASEMp, dset = "Imputed", iCodes = "LPI", uCodes = "GBR")
plot(df_plot$Time, df_plot$LPI, type = "b", xlab = "Year", ylab = "LPI for GBR")
Here I get an error, but this is what I would expect since we cannot do linear interpolation with 1 point. Not sure if you were expecting to be able to impute with the linear method on one point? If so I would just say to switch to "constant", or else if I misunderstood, let me know.
For the last question, the function you are looking for is get_dset()
I think. On the example above:
dfi <- get_dset(ASEMp, dset = "Imputed")
If you want the group variables etc, use the also_get
argument.
Sorry - actually I think I might have understood your meaning - you don't want the imputation to fail as a result of one or more time series that may only have one observations - whereas there may be others where the imputation is possible. I see your point. Let me get back to you.
Yes, that's what I meant! :) And in such one-observation case, I would suggest just making it constant.
Sorry - actually I think I might have understood your meaning - you don't want the imputation to fail as a result of one or more time series that may only have one observations - whereas there may be others where the imputation is possible. I see your point. Let me get back to you.
OK so I decided to include a new imp_type which reverts to the "constant" method for one-observation series. So you should now run:
ASEMp <- Impute(ASEMp, dset = "Raw", f_i = "impute_panel", f_i_para = list(imp_type = "linear-constant"))
Thank you so much Will! But I get a very strange situation: the new code does work with the example above and everything looks good. But it just doesn't work with my data. It gives the error directly even without those messages:
Error in stats::approx(x, y, xout = x, rule = 2, method = imp_type) :
need at least two non-NA values to interpolate
If imp_typ = constant
it works normally, but doesn't work with linear
or linear-constant
I tried many ways and cannot figure out why ... I have attached the purse rds, could you kindly give a try?
LCI_raw.zip
LCI_raw <- readRDS("LCI_raw.rds")
LCI_imp <- Impute(LCI_raw, dset = "Raw", f_i = "impute_panel", f_i_para = list(imp_type = "linear-constant"))
Many thanks!!
Hi, yes my bad, I made a little mistake in the code. I have now fixed this - please try the latest version and it should work.
I confirm this is fixed. Thank you so much, @bluefoxr, for accommodating our special request! Much appreciated.
Thank you so much, Will, for the great package. We (the UNICEF data team) are developing a new index and want to transfer it into the
COINr
framework.I noticed a minor bug in the
stats::approx
part, asstats::approx
cannot run if there is only one non-NA value. https://github.com/bluefoxr/COINr/blob/0d45107b5358a7dbfa7c52a1921ba37035f9bba0/R/impute.R#L1068An easy fix is like below. Note that I impute constant value using the only non-NA value if this is the case:
Please let me know if you would like me to create a pull request --- or maybe you could just easily fix it.
Here is a quick reproducible example:
After fixing, the output will be like:
In the end, I have a quick question: is there a function that can convert the purse back to the panel data? Basically reverse the following process:
Because now I have added "Imputed" dataset to every coin, wonder if there is a quick way to reconstruct the panel data using the imputed datasets? I wonder after creating the Imputed dataset,