bluefoxr / COINr

COINr
https://bluefoxr.github.io/COINr/

Treat data using indiv_specs fails to treat the specified indicator #53

Closed: Petrina95 closed this issue 6 months ago

Petrina95 commented 1 year ago

Hello!

I have a set of indicators that have outliers and need to be treated. Some of them fail the skew and kurtosis test, so they are treated by the default Treat() function.
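To make the default check concrete, here is a minimal base-R sketch of a skew/kurtosis pass rule under which an indicator is flagged only when both limits are exceeded, which is the behaviour described in this thread. The function name skew_kurt_pass, the moment-based estimators and the thresholds of 2 (absolute skewness) and 3.5 (excess kurtosis) are assumptions for illustration; COINr's own estimators and defaults may differ.

```r
# Hypothetical skew/kurtosis "pass" rule, for illustration only (not COINr source).
# Assumptions: moment-based skewness and excess kurtosis; thresholds 2 and 3.5.
skew_kurt_pass <- function(x, skew_thresh = 2, kurt_thresh = 3.5) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))            # population standard deviation
  if (s == 0) return(TRUE)              # constant data: nothing to flag
  skew <- mean((x - m)^3) / s^3         # moment-based skewness
  kurt <- mean((x - m)^4) / s^4 - 3     # moment-based excess kurtosis
  # the indicator fails only when BOTH limits are exceeded
  !(abs(skew) > skew_thresh && kurt > kurt_thresh)
}

skew_kurt_pass(c(rep(1, 19), 100))       # FALSE: skew ~4.1, excess kurtosis ~15.1
skew_kurt_pass(c(-100, rep(0, 18), 100)) # TRUE: skew 0, excess kurtosis 7
```

The second call mirrors the situation of the two untreated indicators: kurtosis is well over the limit, but because skewness is not, the rule lets the indicator pass.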

However, I have two more indicators that visually have outliers, but their skew does not exceed the limit (only their kurtosis does), so the default function does not treat them. So I created indiv_specs as follows:

indiv_specs <- list(
  IND_13 = list(
    f1_para = list(winmax = 1, force_win = TRUE)
  )
)

where I specify that I want one point of indicator 13 (IND_13) to be winsorised, without checking the skew and kurtosis limits.
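For intuition about what winmax = 1 with force_win = TRUE asks for, here is a hypothetical base-R sketch that always replaces the winmax largest values with the next-largest value, without looking at skew or kurtosis. The function force_winsorise_high is a made-up name for illustration only: COINr's own winsorisation also decides which tail to treat, which this sketch does not attempt.

```r
# Hypothetical sketch of forced winsorisation of the upper tail (not COINr's Winsorise()).
# Replaces the `winmax` largest non-NA values with the largest value left untouched.
force_winsorise_high <- function(x, winmax = 1) {
  ord <- order(x, decreasing = TRUE, na.last = NA)  # non-NA indices, largest first
  if (length(ord) <= winmax) return(x)              # too few points to winsorise
  cap <- x[ord[winmax + 1]]                         # largest value NOT winsorised
  x[ord[seq_len(winmax)]] <- cap                    # cap the top `winmax` values
  x
}

force_winsorise_high(c(3, 1, 50, 2), winmax = 1)     # 3 1 3 2
force_winsorise_high(c(3, 1, 50, 40, 2), winmax = 2) # 3 1 3 3 2
```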

However, when I apply the Treat() function, passing indiv_specs to the indiv_specs argument, indicator 13 remains untreated:

CS_coinr <- Treat(CS_coinr, dset = "Raw", indiv_specs = indiv_specs)

Is there an issue with my code, or is it a bug?

*I also tried the treat() function from COINr6 on indicator 13, and the treatment worked.

Thank you in advance, Petrina

bluefoxr commented 1 year ago

Hi Petrina, sorry it has taken me a long time to look at this. I believe this is a bug, but I am in a very busy period. When I get a chance, I will fix it.

Petrina95 commented 1 year ago

Hello Will,

Thank you very much for the response and for your work in general! No worries, I have found a workaround: I use the treat() function from COINr6 and then convert the COIN to a coin with the COINr::COIN_to_coin() function.

Thank you once again, Petrina

bluefoxr commented 1 year ago

Ok great, thanks anyway for flagging this. I will leave the issue open, as it needs fixing.

bluefoxr commented 6 months ago

Hi, sorry for taking a while to look at this. I checked the underlying behaviour of the Winsorise() function, and the force_win argument seems to work. Can you send me a reproducible example so I can reproduce the problem? I would need:

You can attach the coin here or send to my email william.becker@bluefoxdata.eu. Or if you prefer not to share your data please reproduce the error using the example coin generated from build_example_coin(). Thanks.

Ptroulitaki2023 commented 6 months ago

Hello again! For privacy reasons I will use the example data. I ran the following commands:

example_coin <- build_example_coin(up_to = "new_coin")
indiv_specs <- list(
  LPI = list(
    f1 = "winsorise",
    f1_para = list(winmax = 2,
                   skew_thresh = NA,
                   kurt_thresh = NA,
                   force_win = TRUE)
  )
)

example_coin <- COINr::Treat(example_coin, dset = "Raw", global_specs = "none", indiv_specs = indiv_specs)

Looking at example_coin$Analysis$Treated$Dets_Table, I get the following:

[image: Dets_Table output]

From this I understand that the indicator I specified, LPI, was not flagged for winsorisation, while other indicators were. I also checked example_coin$Analysis$Treated$Treated_Points and got the following:

[image: Treated_Points output]

From this I again understand that LPI was not treated, whereas Goods, for example (among others), was, even though I did not specify it. I also checked the actual data, in case the winsorisation was happening and it was only a logging issue in example_coin$Analysis$Treated, but again the data for LPI had not changed:

[image: LPI data after treatment]

Something similar is happening with my data.

I am also attaching the coin in Excel format (example-coin.xlsx). Please let me know if another format is more convenient.

Thank you very much, Petrina

bluefoxr commented 6 months ago

Hi, I had a look at this this morning. Unfortunately the Treat() function in COINr is complex, because it tries to accommodate different treatment and outlier detection functions, so problems can occur.

In your case, however, the function is actually working as intended. The first check in the function is the "pass" function: only if this fails does it move on to "f1". Since the pass function is still checking that skew and kurtosis are within the thresholds, LPI passes in your example and never gets to f1 at all. You have to ensure that (a) LPI does not pass the pass function, (b) it is forced to winsorise when it gets to f1, AND (c) nothing happens to it in f2.
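The sequence described above can be sketched as a simplified model in base R. This is a hypothetical illustration, not COINr's source: treat_one, always_pass and cap_max are made-up names, and the sketch assumes that f_pass is checked again after f1 and that f2 receives f1's output. It shows why an indicator that passes the first check is returned untouched, and why an always-failing pass function plus an identity f2 lets the f1 treatment through.

```r
# Hypothetical, simplified model of the f_pass -> f1 -> f2 sequence (not COINr source).
# Assumptions: f_pass is re-checked after f1, and f2 is applied to f1's output.
treat_one <- function(x, f_pass, f1, f2) {
  if (f_pass(x)) {
    return(list(x = x, stage = "passed"))  # f1 is never reached
  }
  x1 <- f1(x)
  if (f_pass(x1)) {
    return(list(x = x1, stage = "f1"))
  }
  list(x = f2(x1), stage = "f2")
}

x <- c(2, 3, 1, 4, 40)

always_pass  <- function(x) TRUE   # stands in for a skew/kurtosis check that LPI passes
f_dont_pass  <- function(x) FALSE  # always fails, so f1 is always applied
f_dont_treat <- function(x) x      # identity, so f2 changes nothing
# toy f1: cap the single largest value at the second-largest value
cap_max <- function(x) {
  x[which.max(x)] <- sort(x, decreasing = TRUE)[2]
  x
}

res_default <- treat_one(x, always_pass, cap_max, f_dont_treat)
res_forced  <- treat_one(x, f_dont_pass, cap_max, f_dont_treat)
res_default$stage  # "passed": x returned unchanged
res_forced$x       # 2 3 1 4 4: the largest value was capped
```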

So a way to do this is as follows:

example_coin <- build_example_coin(up_to = "new_coin")

# pass function that always fails, so the indicator always goes on to f1
f_dont_pass <- function(x){FALSE}
# identity function, so that nothing happens to the indicator in f2
f_dont_treat <- function(x){x}

indiv_specs <- list(
  LPI = list(
    f1 = "winsorise",
    f1_para = list(winmax = 2,
                   skew_thresh = NA,
                   kurt_thresh = NA,
                   force_win = TRUE),
    f_pass = "f_dont_pass",
    f_pass_para = NULL,
    f2 = "f_dont_treat",
    f2_para = NULL
  )
)

example_coin <- COINr::Treat(example_coin, dset = "Raw", global_specs = "none", indiv_specs = indiv_specs)

This is a bit hacky, and ideally the Treat() function would be reformulated to make this kind of thing easier. Unfortunately I won't have time to do this any time soon, so see if you can use this workaround for now. Let me know if you find any other issues.

bluefoxr commented 6 months ago

P.S. I spotted a bug while doing this, so you will need to install the development version of COINr for this to work:

devtools::install_github("bluefoxr/COINr")

Ptroulitaki2023 commented 6 months ago

Hi! Thank you very much for the response.

I was working on it in parallel and I was able to find a workaround.

I should have mentioned that I use the IQR criterion for outliers. What I wanted to do, after identifying the outliers with this criterion, was to force winsorisation of specific indicators. What I actually did is very similar to your approach, and there is already an example of this in the documentation:

library(performance)  # assumed source of check_outliers(); not loaded in the original snippet

outlier_pass <- function(x){
  # return FALSE if any outliers (IQR criterion, threshold 1.5)
  !any(check_outliers(na.exclude(x), method = "iqr", threshold = 1.5))
}

indiv_specs <- list(
  IND_1 = list(
    f1_para = list(winmax = 1,
                   force_win = TRUE),
    # custom pass function; f_pass_para is set to NULL to avoid
    # passing default parameters to the new function
    f_pass = "outlier_pass",
    f_pass_para = NULL
  ),
  IND_2 = list(
    f1_para = list(winmax = 3,
                   na.rm = TRUE,
                   force_win = TRUE),
    f_pass = "outlier_pass",
    f_pass_para = NULL
  ),
  # no treatment at all for IND_3
  IND_3 = "none"
)

# now call Treat(), passing the individual specifications
coin <- Treat(coin, dset = "Raw",
              global_specs = "none",
              indiv_specs = indiv_specs)
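If you would rather not depend on an external check_outliers(), the IQR criterion itself is simple to write in base R. This is a hypothetical sketch (iqr_outliers and outlier_pass_base are made-up names) that flags values outside [Q1 - k*IQR, Q3 + k*IQR] using R's default quantile() type 7; other implementations may compute the quartiles slightly differently, so the flagged points may not match exactly.

```r
# Hypothetical base-R IQR outlier check (not the performance package's implementation).
# Flags values below Q1 - k*IQR or above Q3 + k*IQR; NAs are never flagged.
iqr_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)  # type 7 quartiles
  fence <- k * (q[2] - q[1])
  !is.na(x) & (x < q[1] - fence | x > q[2] + fence)
}

# pass function in the same spirit as outlier_pass: FALSE if any outliers
outlier_pass_base <- function(x) !any(iqr_outliers(x))

iqr_outliers(c(1:10, 100))  # only the last value (100) is flagged
outlier_pass_base(1:10)     # TRUE: no outliers
```

outlier_pass_base has the same one-argument form as outlier_pass, so it could in principle be swapped in as the f_pass function.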

I think the logic here is the one you describe. However, I would expect that with global_specs = "none" no winsorisation takes place. In this example, IND_3 is still flagged by the skew and kurtosis check, and the only way to avoid treating it was to add it to indiv_specs as "none", as shown above.

Thank you once again for your help. I will also re-install COINr to pick up the fix! Petrina