ericstrobl / RCIT

The Randomized Conditional Independence Test (RCIT) and the Randomized conditional Correlation Test (RCoT)

RCIT/KCIT for causal discovery with mixed data #1

Open MaxKerney opened 4 years ago

MaxKerney commented 4 years ago

Hi,

I've been told that KCIT (and therefore I presume RCIT/RCoT) can be used with mixed continuous and discrete data. However, playing around with the package this doesn't seem to work. Is there something I need to adjust to make the tests work with mixed data?

Also, if using mixed data is possible, how could KCIT or RCIT then be used with an algorithm like FCI for causal discovery? The code showing how this was done for the causal discovery experiments in your paper doesn't seem to be available in this repo. Could KCIT or RCIT be used as a CI test in the FCI function from pcalg, for instance?

Many thanks.

ericstrobl commented 4 years ago

> Is there something I need to adjust to make the tests work with mixed data?

You should try to binarize the discrete data. So if a discrete variable takes on k values in the dataset, then you substitute that variable with k-1 binary variables. The reason for doing this is to simplify the functional relationships.
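A minimal sketch of that binarization in base R, assuming the discrete variable is stored as a factor. The data frame `dat` and its columns are made up for illustration and are not from the package:

```r
# Hypothetical example data: one continuous and one 3-level discrete variable.
dat <- data.frame(
  x = c(0.5, 1.2, -0.3, 2.1, 0.0, 1.7),
  g = factor(c("a", "b", "c", "a", "b", "c"))
)

# model.matrix() expands the factor into treatment-coded indicator columns.
# Dropping the intercept column leaves k - 1 = 2 binary columns for g,
# next to the untouched continuous column x.
binarized <- model.matrix(~ x + g, data = dat)[, -1, drop = FALSE]

print(colnames(binarized))
print(dim(binarized))
```

The result is already a numeric matrix, so its columns can be indexed directly when building the inputs for the test.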

> Also, if using mixed data is possible, how could KCIT or RCIT then be used with an algorithm like FCI for causal discovery? ...Could KCIT or RCIT be used as a CI test in the FCI function from pcalg, for instance?

Yes, but you need to write a wrapper function like the following, where suffStat is a list containing the data:

RCIT_wrap <- function(x_index, y_index, z_index, suffStat) {
    out = RCIT(suffStat$data[,x_index], suffStat$data[,y_index], suffStat$data[,z_index])
    return(out$p)
}

MaxKerney commented 4 years ago

Thanks! Unfortunately, I'm running into some errors with that. First, the error

Error in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) : is.atomic(x) is not TRUE

(which appeared as Error: (list) object cannot be coerced to type 'double' when using my actual data) suggested that suffStat$data[,x_index] etc. needed to be "unlisted", so I did that:

testdata <- select_if(mtcars, is.numeric)
RCIT_wrap <- function(x_index, y_index, z_index, suffStat) {
    out = RCIT(unlist(suffStat$data[,x_index]), unlist(suffStat$data[,y_index]), unlist(suffStat$data[,z_index]))
    return(out$p)
    }
suffStat <- list(data = testdata)
res <- pcalg::fci(suffStat, indepTest = RCIT_wrap,
                  alpha = 0.9999, labels = names(testdata))

But now I'm getting the error: Error in cbind(y, z) : number of rows of matrices must match (see arg 2) and I'm not sure how to resolve that.

Also, is there any way of analysing discrete variables without having to binarize them? When I spoke to Kun Zhang about it before he said something about needing to use the delta kernel or a Gaussian kernel with a very small kernel width for mixed data.
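For reference, a delta kernel on discrete values equals 1 when two values match and 0 otherwise, and a Gaussian kernel with a very small width approaches it when the distinct values are well separated. A rough base-R sketch of the two Gram matrices follows; the function names are hypothetical and not part of the RCIT package, and nothing here shows how to plug a kernel into RCIT or KCIT:

```r
# Delta kernel Gram matrix: K[i, j] = 1 if v[i] == v[j], else 0.
delta_kernel <- function(v) {
  outer(v, v, FUN = "==") * 1
}

# Gaussian kernel Gram matrix with width sigma, for comparison.
gaussian_kernel <- function(v, sigma) {
  exp(-outer(v, v, FUN = "-")^2 / (2 * sigma^2))
}

# Hypothetical integer-coded discrete variable.
v <- c(0, 1, 1, 2)
K_delta <- delta_kernel(v)
K_gauss <- gaussian_kernel(v, sigma = 0.05)

# With a width much smaller than the gap between distinct values,
# the two Gram matrices are numerically almost identical.
print(max(abs(K_delta - K_gauss)))
```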

Angela446-lgtm commented 2 years ago

Hello @MaxKerney, I am working on the same thing. I am using RCIT as a CI test in the FCI function from pcalg; however, I run into the same errors. Do you know the solution?

Thanks Angela

MaxKerney commented 2 years ago

Hi @Angela446-lgtm,

I'm afraid not, I ended up using a different causal discovery method instead (https://github.com/Biwei-Huang/Generalized-Score-Functions-for-Causal-Discovery)

Max

Angela446-lgtm commented 2 years ago

Ok! Thank you for your quick reply to my question.

ericstrobl commented 2 years ago

Angela,

Sorry for these errors. People have gotten this error in the past when they pass a data frame into RCIT as opposed to a matrix, or one of their variables has zero variance.
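A hedged sketch of how both checks could be applied before calling pcalg. The helpers `prepare_suffstat` and `make_ci_wrapper` are hypothetical names, not part of RCIT or pcalg. The sketch assumes the CI test takes `(x, y, z)` and returns a list with element `p`, as in the wrapper earlier in this thread, and that passing `z = NULL` is acceptable when the conditioning set is empty. Note that `unlist()` on several data-frame columns concatenates them into one vector longer than the number of rows, which is one plausible source of the `cbind(y, z)` error; indexing a matrix with `drop = FALSE` avoids that.

```r
# Convert a data frame to a numeric matrix and fail early on constant columns.
prepare_suffstat <- function(df) {
  m <- data.matrix(df)
  sds <- apply(m, 2, sd)
  zero <- which(sds == 0)
  if (length(zero) > 0) {
    stop("zero-variance columns: ", paste(colnames(m)[zero], collapse = ", "))
  }
  list(data = m)
}

# Build a pcalg-style test function(x, y, S, suffStat) around a CI test
# that takes (x, y, z) and returns a list with element p.
make_ci_wrapper <- function(ci_test) {
  function(x, y, S, suffStat) {
    d <- suffStat$data
    # drop = FALSE keeps z as an n-row matrix even with a single conditioning
    # column; an empty conditioning set is passed as NULL (assumption).
    z <- if (length(S) > 0) d[, S, drop = FALSE] else NULL
    ci_test(d[, x], d[, y], z)$p
  }
}

suffStat <- prepare_suffstat(mtcars)

# Usage, assuming the RCIT and pcalg packages are installed:
# RCIT_wrap <- make_ci_wrapper(RCIT::RCIT)
# res <- pcalg::fci(suffStat, indepTest = RCIT_wrap, alpha = 0.05,
#                   labels = colnames(suffStat$data))
```

Taking the CI test as an argument keeps the wrapper independent of any one package, so the same function could wrap RCoT or another test with the same calling convention.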

If those two don't solve it, please send me your data and the code that causes the errors, and I should be able to resolve the issue. It will help others experiencing the same problem, since I can update the code accordingly.

Angela446-lgtm commented 2 years ago

Indeed, now I am passing a matrix and not a data frame, and everything works fine. Thank you.