bd2kccd / r-causal

R Wrapper for Tetrad Library

Is it possible to get the score of each training data? #91

Closed amber0309 closed 5 years ago

amber0309 commented 5 years ago

It seems the graph object returned by tetradrunner contains only the model score and the node scores. Is there a way to also get the score of each data point in R?

I am running continuous fGES with 'sem-bic-score'. Thanks in advance!

chirayukong commented 5 years ago

It should be possible, the same way as in py-causal. I'll let you know.

chirayukong commented 5 years ago

This code should do the job.

library(rcausal)
data("charity")    # Load the charity dataset
# Run the FGES search with the SEM BIC score
tetradrunner <- tetradrunner(algoId = 'fges', df = charity, scoreId = 'sem-bic',
    dataType = 'continuous', faithfulnessAssumed = TRUE, maxDegree = -1, verbose = TRUE)
tetradrunner$nodes    # Show the result's nodes
tetradrunner$edges    # Show the result's edges

graph <- tetradrunner$graph
graph$getAttribute('BIC')    # BIC score of the whole model

nodes <- graph$getNodes()
# Print the BIC score of each node (Java lists are 0-indexed)
for (i in seq_len(nodes$size()) - 1L) {
    node <- nodes$get(i)
    cat(node$getName(), ": ", node$getAttribute('BIC'), "\n")
}

PS: You need to get the latest code because I updated the jar file.

amber0309 commented 5 years ago

Thanks very much for your reply! I tried the code and here is the output of the last part (for loop).

TangibilityCondition : 120.2197
AmountDonated : -65.74022
Sympathy : -129.1736
Imaginability : -128.6085
Impact : -105.6724

The results above are the BIC scores of the nodes, but I am interested in the score of each sample point. Suppose there are 100 observations in the data frame: I would like to get the score of each observation under the learned graph, so there should be 100 scores, one per observation.

Is there any way to achieve this in R or Python? Thanks in advance!

chirayukong commented 5 years ago

@jdramsey could we calculate the BIC for each of the samples we observe?

jdramsey commented 5 years ago

@chirayukong No, sorry, BIC scores are for datasets, not individual samples. It's true that individual samples can have likelihoods, but the BIC score scores all samples together in aggregate. There's a (new) class called SemBicScorer to do the scoring.
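To illustrate the point that individual samples can have likelihoods, here is a minimal sketch in base R. It does not use Tetrad or r-causal: the simulated data, the single parent `x`, and the variable names are all assumptions made for illustration. It fits one node's regression on its parent, as in a linear Gaussian SEM, and evaluates the Gaussian log-likelihood of each observation's residual.

```r
# Hypothetical data: y depends linearly on one parent x (not the charity dataset).
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 2 * x + rnorm(n)

# Fit the node's regression on its parent, as in a linear Gaussian SEM.
fit <- lm(y ~ x)
res <- residuals(fit)
sigma2 <- mean(res^2)  # maximum-likelihood estimate of the residual variance

# One Gaussian log-likelihood value per observation.
loglik_i <- dnorm(res, mean = 0, sd = sqrt(sigma2), log = TRUE)

length(loglik_i)  # one value per row of the data
# The per-observation values sum to the aggregate log-likelihood of the fit.
all.equal(sum(loglik_i), as.numeric(logLik(fit)))
```

These per-observation values are log-likelihoods, not BIC scores: the BIC adds a complexity penalty that is only defined for the dataset as a whole.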

You'd think you could just use the SemBicScorer with a sample of one. The problem is that the SEM BIC score first calculates the variance of the residual. With just one sample, the residual will have a variance of zero. In the formulas, a BIC score where the residual has a variance of zero is infinite, which isn't helpful at all.
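The degenerate case can be reproduced with a small base R sketch. The function `bic_like` below is a hypothetical BIC-style score of the general form n * log(residual variance) + k * log(n). It is an assumption for illustration, not Tetrad's SemBicScorer, whose exact formula and sign convention may differ.

```r
# Illustrative BIC-style score for one node; NOT Tetrad's exact implementation.
# n * log(residual variance) is the data-fit term, k * log(n) the penalty.
bic_like <- function(res, k) {
  n <- length(res)
  s2 <- mean((res - mean(res))^2)  # residual variance (maximum-likelihood form)
  n * log(s2) + k * log(n)
}

set.seed(1)
res_many <- rnorm(100)  # hypothetical residuals from 100 observations
res_one <- res_many[1]  # a "dataset" containing a single observation

bic_like(res_many, k = 2)  # finite
bic_like(res_one, k = 2)   # variance is 0, so log(0) makes the score infinite
```

With a single residual the variance is exactly zero, so the logarithm diverges and the score carries no usable information, whatever sign convention is chosen.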

amber0309 commented 5 years ago

Thanks for your reply!