cmu-phil / tetrad

Repository for the Tetrad Project, www.phil.cmu.edu/tetrad.
GNU General Public License v2.0

Model fitness for instantiated models #22

Closed igormk closed 8 years ago

igormk commented 8 years ago

I have a data set of 20,000 records. My Bayes parametric model contains latent variables, so I used the EM Bayes Estimator to estimate its parameters. The problem is that the running time is very long: I waited a few hours before stopping the learning process. I found other software (GeNIe, https://dslpitt.org/genie/) that can estimate the parameters of a model from data, and with it I was able to estimate the parameters of my model in less time. I manually entered those parameter values into the "Instantiated model" component, but I could not find any functionality for estimating the model fitness (P-value), so I cannot tell how good my model is. Could you please tell me whether this type of functionality exists in Tetrad?
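As a rough illustration of what such a fitness check could look like outside Tetrad, here is a minimal Java sketch of the likelihood-ratio statistic G^2 = 2 * sum_i O_i * ln(O_i / (N * p_i)), which compares observed cell counts O_i with the cell probabilities p_i implied by a fully parameterized model over the measured variables. The class name, method name, counts, and probabilities are all hypothetical and are not part of Tetrad. Turning the statistic into a P-value needs a chi-square distribution and a choice of degrees of freedom, which depends on how the parameters were obtained; the sketch leaves both open.

```java
class GSquared {
    /**
     * Likelihood-ratio statistic G^2 = 2 * sum_i O_i * ln(O_i / (N * p_i)).
     * Cells with an observed count of zero contribute nothing to the sum.
     */
    public static double statistic(long[] observed, double[] modelProbs) {
        if (observed.length != modelProbs.length) {
            throw new IllegalArgumentException("counts and probabilities must align");
        }
        long n = 0;
        for (long o : observed) {
            n += o;
        }
        double g2 = 0.0;
        for (int i = 0; i < observed.length; i++) {
            if (observed[i] == 0) {
                continue; // 0 * ln(0) is taken as 0
            }
            double expected = n * modelProbs[i];
            g2 += 2.0 * observed[i] * Math.log(observed[i] / expected);
        }
        return g2;
    }

    public static void main(String[] args) {
        // Hypothetical counts and model probabilities, for illustration only.
        long[] observed = {30, 20, 25, 25};
        double[] modelProbs = {0.25, 0.25, 0.25, 0.25};
        System.out.println("G^2 = " + statistic(observed, modelProbs));
    }
}
```

A statistic near zero means the observed frequencies are close to what the model predicts; larger values indicate a worse fit.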

jdramsey commented 8 years ago

From igormk: Considering the Tetrad file showing the simple 4-variable (1 latent) model, I turned on logging and ran it. After 3-4 minutes I received output (sent as an attachment), but the "Executing..." dialog window was still active.

jdramsey commented 8 years ago

========LOGGING Estimator1

EM-Estimated Bayes IM:

Node: X1

L1
0 0.1459 0.8541
1 0.9686 0.0314
2 0.5949 0.4051

Node: X2

L1
0 0.6875 0.3125
1 0.0306 0.9694
2 0.9560 0.0440

Node: X3

L1
0 0.3247 0.6753
1 0.6725 0.3275
2 0.0488 0.9512

Node: L1

0.0823 0.1828 0.7349
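To make the logged tables concrete: assuming each row under "L1" corresponds to one value of the latent L1 (0, 1, 2) and the two columns are the child's two categories, the probability the model assigns to a record (X1, X2, X3) is found by summing L1 out, P(x1, x2, x3) = sum_l P(l) * P(x1 | l) * P(x2 | l) * P(x3 | l). The Java sketch below does this with the numbers from the log. The class and method names are illustrative and are not Tetrad API, and the reading of the table layout is an assumption.

```java
class LatentMarginal {
    // P(L1), copied from the "Node: L1" row of the log.
    static final double[] PRIOR_L1 = {0.0823, 0.1828, 0.7349};

    // CPTS[i][l][x] = P(X(i+1) = x | L1 = l), copied from the log.
    // Assumption: rows are L1 values, columns are the child's categories.
    static final double[][][] CPTS = {
        {{0.1459, 0.8541}, {0.9686, 0.0314}, {0.5949, 0.4051}}, // X1
        {{0.6875, 0.3125}, {0.0306, 0.9694}, {0.9560, 0.0440}}, // X2
        {{0.3247, 0.6753}, {0.6725, 0.3275}, {0.0488, 0.9512}}  // X3
    };

    /** Probability of one record over (X1, X2, X3), with L1 summed out. */
    public static double joint(int[] x) {
        double total = 0.0;
        for (int l = 0; l < PRIOR_L1.length; l++) {
            double term = PRIOR_L1[l];
            for (int i = 0; i < CPTS.length; i++) {
                term *= CPTS[i][l][x[i]];
            }
            total += term;
        }
        return total;
    }

    /** Log-likelihood of a set of records under the logged parameters. */
    public static double logLikelihood(int[][] records) {
        double ll = 0.0;
        for (int[] record : records) {
            ll += Math.log(joint(record));
        }
        return ll;
    }

    public static void main(String[] args) {
        // Print the implied distribution over the 8 observable cells.
        for (int a = 0; a < 2; a++) {
            for (int b = 0; b < 2; b++) {
                for (int c = 0; c < 2; c++) {
                    System.out.printf("P(X1=%d, X2=%d, X3=%d) = %.6f%n",
                            a, b, c, joint(new int[]{a, b, c}));
                }
            }
        }
    }
}
```

The eight cell probabilities printed by `main` should sum to 1, which is a quick sanity check on the table-layout assumption.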

jdramsey commented 8 years ago

I did not work on that class (except maybe to clean up the code a little). I recall David Danks telling me that it would only work for very small models. The fact that the execute button continues to run is interesting--I might be able to figure out why that is.

jdramsey commented 8 years ago

Hi Igor,

Do you by chance have a cyclic model? Also, could you tell me how many variables are in your model and how many parents and children each of them has? Also, could you tell me how many categories each variable has?

A cyclic model could cause the process to hang. So far I'm getting answers out of the EM Bayes Estimator with simple, non-cyclic, 5-variable binary models, but if you could be more specific I could adjust my simulation.

Thanks,

Joe

jdramsey commented 8 years ago

Never mind--I wasn't including latents. With a latent it hangs--OK I'll try to profile that.

jdramsey commented 8 years ago

I'm just trying to estimate a simple binary model of the form X<-(L)->Y. The hangup is the method that initializes tables in the ML Bayes IM. It was using a more sophisticated algorithm that was slowing it down. When I switch to a simpler algorithm, the process finishes.

So that raises the question, what randomization method should the method use, and why? I'll have to ponder that. I suppose the easy answer is that it should use the simple randomization because it's faster. But this reduces the effectiveness of searches. Hmm...

jdramsey commented 8 years ago

The answer is that, for the EM Bayes estimator, it doesn't matter; the initializations are just for show, since they are overwritten with updated values. I'm switching the initialization method used by the createUpdatedBayesIm method in RowSummingExactUpdater to MANUAL, to bypass this initialization, and reverting MlBayesIm.

That seems to do the trick.
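The reasoning above can be sketched in a few lines of Java. This is not Tetrad's code: `InitSketch`, its `Init` enum, and the "updated" values are hypothetical stand-ins. It also assumes that a MANUAL-style initialization simply leaves cells unset (NaN here), and it uses normalized uniform draws as a placeholder for whatever random initialization is actually used. The point is only that when every cell is overwritten afterwards, the final table is the same whichever initialization ran first, so the cheaper one is safe.

```java
import java.util.Arrays;
import java.util.Random;

class InitSketch {
    enum Init { MANUAL, RANDOM }

    /**
     * Builds a rows x cols probability table.
     * MANUAL leaves every cell as NaN (cheap); RANDOM fills each row
     * with uniform draws normalized to sum to 1 (placeholder scheme).
     */
    public static double[][] createTable(int rows, int cols, Init init, Random rng) {
        double[][] table = new double[rows][cols];
        for (double[] row : table) {
            if (init == Init.MANUAL) {
                Arrays.fill(row, Double.NaN);
            } else {
                double sum = 0.0;
                for (int c = 0; c < cols; c++) {
                    row[c] = rng.nextDouble();
                    sum += row[c];
                }
                for (int c = 0; c < cols; c++) {
                    row[c] /= sum;
                }
            }
        }
        return table;
    }

    /** Stand-in for the update step: every cell of the table is replaced. */
    public static void overwrite(double[][] table, double[][] updated) {
        for (int r = 0; r < table.length; r++) {
            System.arraycopy(updated[r], 0, table[r], 0, updated[r].length);
        }
    }

    public static void main(String[] args) {
        // Hypothetical updated values, for illustration only.
        double[][] updated = {{0.2, 0.8}, {0.7, 0.3}};

        double[][] manual = createTable(2, 2, Init.MANUAL, new Random(1));
        double[][] random = createTable(2, 2, Init.RANDOM, new Random(1));
        overwrite(manual, updated);
        overwrite(random, updated);

        // Both tables now hold the updated values, regardless of initialization.
        System.out.println(Arrays.deepEquals(manual, random));
    }
}
```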

I'll push this change up to the master branch. Not sure when it will get published but hopefully soon. (Sorry, still working out the details on that.)

Closing the issue.