cmu-phil / py-tetrad

Makes algorithms/code in Tetrad available in Python via JPype
MIT License
50 stars, 9 forks

Request: Include Matrix-Style Output Format in run_ica_lingd() Function for LiNG-D Algorithm #22

Closed: bluejw closed this issue 4 months ago

bluejw commented 4 months ago

When I run run_ica_lingd() on simulated datasets, the only output is a printed version of each adjacency matrix. That contains the necessary information, but without a structured output format it is hard to compute summary statistics across many numerical experiments or to do further downstream analysis. Would it be possible to extend run_ica_lingd() to also return the matrices in a matrix-style format usable from R or Python? Thanks!
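To illustrate the kind of cross-experiment summary a structured format makes easy, here is a minimal sketch. It assumes each run yields its B-hat matrix as a pandas DataFrame over the same variables; `summarize_bhats` and the `tol` threshold are hypothetical names for this example, not part of py-tetrad.

```python
import numpy as np
import pandas as pd


def summarize_bhats(bhats, tol=1e-12):
    """Summarize B-hat DataFrames that share the same variables (hypothetical helper).

    Returns two DataFrames laid out like the first matrix: the fraction of
    runs in which each entry is nonzero (|value| > tol), and the mean
    coefficient across runs.
    """
    first = bhats[0]
    # Reorder every matrix to the first one's row/column labels before stacking.
    stack = np.stack([
        b.reindex(index=first.index, columns=first.columns).to_numpy(dtype=float)
        for b in bhats
    ])
    nonzero_freq = (np.abs(stack) > tol).mean(axis=0)
    mean_coef = stack.mean(axis=0)
    return (pd.DataFrame(nonzero_freq, index=first.index, columns=first.columns),
            pd.DataFrame(mean_coef, index=first.index, columns=first.columns))
```

For example, `freq, mean = summarize_bhats(bhats)`, where `bhats` collects one matrix per simulated dataset, gives an edge-frequency table and an average-coefficient table in one call.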

jdramsey commented 4 months ago

I suppose I could do that :) What format exactly? How would you parse the output?

jdramsey commented 4 months ago

I suppose I could put the beta hat matrices in separate files, one per file, or would it be better to put them all in the same file and let you parse them? Not sure...
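The "one matrix per file" option could look roughly like the sketch below, assuming each beta hat matrix is a pandas DataFrame with variable names as labels. `write_bhats` and the `bhat_<n>.csv` naming scheme are hypothetical, chosen only for illustration.

```python
import os

import pandas as pd


def write_bhats(bhats, out_dir, prefix="bhat"):
    """Write each B-hat DataFrame to its own CSV file and return the paths.

    Row and column labels are written, so R can read a file back with
    read.csv(path, row.names = 1).
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, bhat in enumerate(bhats, start=1):
        path = os.path.join(out_dir, f"{prefix}_{i}.csv")
        bhat.to_csv(path)  # the index is written as the first column
        paths.append(path)
    return paths
```

Keeping the labels in the file means the variable names survive the trip into R without a separate lookup table.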

bluejw commented 4 months ago

Either way works, as long as I can get the beta hat matrices in a format my R code can read (e.g., written as a matrix or array) for further analysis :)

jdramsey commented 4 months ago

Oh, I see. Yeah, that should be possible.

jdramsey commented 4 months ago

Are you using rpy-tetrad? Or did you want to read the beta hat matrices in from files?

bluejw commented 4 months ago

Yes, I am using rpy-tetrad and I would like to read the beta hat matrices from files.

jdramsey commented 4 months ago

OK, I'm going to try to do this today. I was caught up in another project for the past week, sorry.

jdramsey commented 4 months ago

Is it OK if I return a list of matrices as data frames from rpy-tetrad?

I could write some code to write them to files, but maybe you would prefer to do that?

Give me half an hour.
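If all matrices should go into a single file instead, one option is a long ("tidy") table with one row per matrix entry. This is a hedged sketch assuming a list of labeled pandas DataFrames; `bhats_to_long` and its column names (`graph`, `effect`, `cause`, `coef`) are hypothetical.

```python
import pandas as pd


def bhats_to_long(bhats, drop_zeros=False):
    """Stack a list of B-hat DataFrames into one long-format table.

    Each output row is (graph, effect, cause, coef), where coef is the entry
    in row `effect`, column `cause` of the graph-th matrix (1-based).
    """
    frames = []
    for g, bhat in enumerate(bhats, start=1):
        long = (bhat.rename_axis("effect")   # name the row labels
                    .reset_index()           # turn them into a column
                    .melt(id_vars="effect", var_name="cause", value_name="coef"))
        long.insert(0, "graph", g)
        frames.append(long)
    out = pd.concat(frames, ignore_index=True)
    if drop_zeros:
        out = out[out["coef"] != 0].reset_index(drop=True)
    return out
```

Writing the result with `bhats_to_long(bhats).to_csv("bhats_long.csv", index=False)` produces one file that R's `read.csv` loads directly and that can be filtered or reshaped per graph.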

bluejw commented 4 months ago

Yes, it is OK to return a list of matrices as data frames from rpy-tetrad. Thanks!

jdramsey commented 4 months ago

OK, it should work from Python now; let me see if I can make it work from R...

jdramsey commented 4 months ago

Oh boy, I just realized I wrote code to return the bhat matrices as numpy arrays. I think that would be perfectly fine for you, but (as I said) I need to make sure that R will recognize them and that the variable names are available. I know that R, via rpy2, recognizes pandas data frames natively; maybe I should convert to a pandas data frame instead. Let me think.
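Converting a bare numpy array into a labeled pandas DataFrame takes only a few lines. The sketch below assumes a square B-hat array and a list of variable names in the same order; `label_bhat` is a hypothetical helper, not the code that was actually committed.

```python
import numpy as np
import pandas as pd


def label_bhat(bhat, names):
    """Wrap a square B-hat numpy array in a DataFrame labeled by variable name.

    Rows and columns both use `names`, so entry (i, j) reads as
    df.loc[names[i], names[j]].
    """
    bhat = np.asarray(bhat, dtype=float)
    if bhat.shape != (len(names), len(names)):
        raise ValueError(
            f"expected a {len(names)}x{len(names)} array, got {bhat.shape}")
    return pd.DataFrame(bhat, index=names, columns=names)
```

Going the other way is just `df.to_numpy()`, so the labeled form loses nothing compared with returning the raw array.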

jdramsey commented 4 months ago

I'm sorry this took longer than I planned (I'm a bad programmer today), but here is a test in Python that comes out perfectly. You can always get the plain array, without the variable names, from the data frame, so I think this is a good arrangement. Let me test LiNG-D next; I implemented it the same way, though in that case multiple graphs may be returned.

```
ICA-LiNGAM
Cycle found at node 0.
Graph Nodes:
Frequency;Attack;Chord;Velocity;Displacement;Pressure

Graph Edges:
1. Chord --> Attack
2. Chord --> Pressure
3. Chord --> Velocity

Graph Attributes:
BIC: -11485.394800

Graph Node Attributes:
Score: [Frequency: -12029.13137107714;Attack: -1810.9464082098943;Chord: 3591.373280558362;Velocity: -4126.523775371914;Displacement: 6509.985397382994;Pressure: -2394.472219057702]

bhat:
   Frequency  Attack     Chord  Velocity  Displacement  Pressure
0        0.0     0.0  0.000000       0.0           0.0       0.0
1        0.0     0.0  0.903148       0.0           0.0       0.0
2        0.0     0.0  0.000000       0.0           0.0       0.0
3        0.0     0.0  1.444809       0.0           0.0       0.0
4        0.0     0.0  0.000000       0.0           0.0       0.0
5        0.0     0.0  1.377580       0.0           0.0       0.0
```

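
In the output above, each nonzero bhat entry in row i, column j lines up with an edge from the column variable to the i-th variable: row 1 (Attack) has 0.903148 under Chord, matching `Chord --> Attack`, and likewise for Velocity and Pressure. Assuming that convention holds in general, a hypothetical helper like `bhat_edges` can recover an edge list from the data frame. Because the printed rows are indexed 0..5 rather than by name, it maps row i to the i-th column name.

```python
import pandas as pd


def bhat_edges(bhat, tol=1e-12):
    """List (cause, effect, coef) for each nonzero entry of a B-hat DataFrame.

    Assumes entry [i, j] is the coefficient of variable j in the equation for
    variable i (an edge j --> i). Rows are read by position and mapped to the
    i-th column name.
    """
    names = list(bhat.columns)
    values = bhat.to_numpy(dtype=float)  # plain array, labels dropped
    edges = []
    for i, effect in enumerate(names):
        for j, cause in enumerate(names):
            if abs(values[i, j]) > tol:
                edges.append((cause, effect, float(values[i, j])))
    return edges


# Rebuild the bhat printed above and list its edges.
names = ["Frequency", "Attack", "Chord", "Velocity", "Displacement", "Pressure"]
bhat = pd.DataFrame(0.0, index=range(6), columns=names)
bhat.loc[1, "Chord"] = 0.903148
bhat.loc[3, "Chord"] = 1.444809
bhat.loc[5, "Chord"] = 1.377580
for cause, effect, coef in bhat_edges(bhat):
    print(f"{cause} --> {effect}  ({coef:.6f})")
# Chord --> Attack  (0.903148)
# Chord --> Velocity  (1.444809)
# Chord --> Pressure  (1.377580)
```

The edges come out in row order rather than in the alphabetical order Tetrad prints, but the set is the same three edges.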
jdramsey commented 4 months ago

The LiNG-D test came out perfectly. OK, let me double-check to make sure this works in R.

jdramsey commented 4 months ago

Works! OK, I will do additional testing and commit this to the py-tetrad GitHub repository. The Tetrad jar was updated, as well as a few Python modules.

jdramsey commented 4 months ago

OK, it should work now! Do a `git pull` in the py-tetrad directory to get all the changes, and then, in R, follow the example in the (new) file `sample_r_code10.R`.

Let me know if you have problems.

jdramsey commented 4 months ago

Also, if this works for you, you can go ahead and close the issue.

bluejw commented 4 months ago

Thank you for the update! It solves my problem perfectly!

jdramsey commented 4 months ago

That's so strange; I just had that same problem on the laptop I'm using now. My workaround was to load the project in IntelliJ Ultimate, which I have on this laptop, and there it works. In PyCharm (also a JetBrains IDE, closely related to IntelliJ IDEA), however, it wouldn't work. It's very odd. I got the same error, and when I printed out the error message, it read:

FileNotFoundError: [Errno 2] JVM DLL not found: /Library/Java/JavaVirtualMachines/amazon-corretto-17.jdk/Contents/Home/lib/libjli.dylib

Of course, when I go to that location, the file does in fact exist, and it is evidently usable, since the project works in IntelliJ. Actually, let me double-check something.

Are you on a Mac? You may have told me before, sorry.
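One way to tell a genuinely missing library apart from stale session state is to check directly which JVM libraries exist under the JDK home. The sketch below is a hypothetical diagnostic, not part of py-tetrad; the candidate paths are the usual JDK 9+ layouts (`lib/libjli.dylib` matches the path in the error above). JPype's own `jpype.getDefaultJVMPath()` reports which library it would load, which can be compared against this list.

```python
import os


def check_jvm_libs(java_home=None):
    """Report which common JVM shared libraries exist under a JDK home.

    Falls back to the JAVA_HOME environment variable. Returns a dict mapping
    each candidate path to True (regular file present) or False.
    """
    java_home = java_home or os.environ.get("JAVA_HOME")
    if not java_home:
        raise RuntimeError("JAVA_HOME is not set and no path was given")
    candidates = [
        os.path.join("lib", "libjli.dylib"),            # macOS launcher library
        os.path.join("lib", "server", "libjvm.dylib"),  # macOS HotSpot VM
        os.path.join("lib", "server", "libjvm.so"),     # Linux HotSpot VM
    ]
    return {
        os.path.join(java_home, rel): os.path.isfile(os.path.join(java_home, rel))
        for rel in candidates
    }
```

For the JDK in the error message, that would be `check_jvm_libs("/Library/Java/JavaVirtualMachines/amazon-corretto-17.jdk/Contents/Home")`. If the file shows as present but loading still fails, the problem is more likely the process environment than the JDK install.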

bluejw commented 4 months ago

Yes, I am on a Mac. I just solved the issue by restarting my R session, but I am not sure why that worked.

jdramsey commented 4 months ago

Huh.

jdramsey commented 4 months ago

You're right. I confirmed that RStudio was using that JDK by deleting it and restarting R (it wouldn't work), then reinstalling it and restarting R (it works).