cmu-phil / tetrad

Repository for the Tetrad Project, www.phil.cmu.edu/tetrad.
GNU General Public License v2.0

Replicate results from Sanchez-Romero et al. (2019) with the FASK algorithm #1767

Closed JMUB closed 3 months ago

JMUB commented 3 months ago

I have been trying to replicate the results reported in the paper by Sanchez-Romero et al. (2019) "Estimating feedforward and feedback effective connections from fMRI time series: Assessments of statistical methods" https://direct.mit.edu/netn/article/3/2/274/2211/Estimating-feedforward-and-feedback-effective

In particular, I want to replicate the results obtained for the left-hemisphere resting-state fMRI using the FASK algorithm (Fig. 6 in the paper). I am using the data provided in the supplementary material available here: https://cmu.app.box.com/s/7jq6uucz3raceinnrkzf25tisy4jwe0l/folder/44148019136 .

The paper states that the results were obtained setting the following parameters:

However, the current FASK implementation has two additional parameters: faskDelta and twoCycleScreeningThreshold.

The paper does not mention the values for these two parameters. When I run FASK using the default values faskDelta=0 and twoCycleScreeningThreshold=0, the algorithm fails to identify the 2-cycles reported in the paper.

I noticed that the detection of cycles is sensitive to the value given to twoCycleScreeningThreshold. Unfortunately, the supplementary material of the paper makes no mention of this parameter.

Does anybody know the values of these parameters needed to replicate the results? Or does anybody have guidelines for setting them?

I am using the Tetrad Java executable, version 7.6.3. Below you can see the complete set of FASK parameters I am currently using.

Thanks in advance for any help.

FASK_parameters

jdramsey commented 3 months ago

Someone else had asked about this, and Ruben Sanchez and I had figured out the answer. Let me go back and find that answer and send it to you.

cg09 commented 3 months ago

Send it to me as well.



JMUB commented 3 months ago

I found related information in the causal-cmd repository, where somebody tried to replicate the results of the macaque data: https://github.com/bd2kccd/causal-cmd/issues/69#issuecomment-1439165483

There it is recommended to set twoCycleScreeningThreshold to 0. It is also mentioned that this parameter was not used in the original paper implementation.

When I set twoCycleScreeningThreshold = 0, FASK fails to detect the 2-cycles in the resting-state data. I only obtain edge frequencies and 2-cycles similar to those reported in the paper by setting twoCycleScreeningThreshold = 0.05. This value was just a lucky guess.

Is there a set of parameters that yields an exact replication of the results?

jdramsey commented 3 months ago

Oh, I'm so sorry! This slipped my mind in all the end-of-semester goings-on.

Let me work on this a bit over the next several days. Please don't let me forget. The issue is that the Tetrad suite has developed quite a bit since that paper. But I should be able to look back in the repository to that point in time and see where the relevant code has diverged.

jdramsey commented 3 months ago

OK, I'm going to step away from my current project for a few hours and do this. I think I know how to go about it. We wrote a tech report analyzing the Sachs data here:

https://arxiv.org/abs/1805.03108

I need to figure out how to reproduce those results, and then I think I'll have answered your question.

Sorry, it's been a very long time since I've worked on FASK; I realize it's my algorithm, but I need to get my head back in that game.

jdramsey commented 3 months ago

I fooled around with this for a while this afternoon and couldn't figure it out, so I went back into the repository to find the version of FASK that we used for the Sachs report and added it to my branch as "FASK-Orig" as an experimental algorithm. Sure enough, the analysis is replicated.

This is the graph I just got:

image

This is the published graph:

image

Take a look at these graphs and see if they look identical to you. What I did was look at the date of publication of the Sachs report and go into the repository to find the version of FASK from that date.

I have to go home for dinner soon, but I need to look at the date of publication of the feedback paper you were referencing to see if that version of FASK is different from this. I don't think it is.

I have some work to do to fix the current FASK, which I'll probably do tomorrow. It's gotten out of sync a little bit in a way that's not obvious to my "naked eye," and the old one is better for Sachs.

If this works out, what I can do is send you a snapshot build of Tetrad with the repairs to try, though if I make the fixes, I'd rather just include them in the next published version.

jdramsey commented 3 months ago

Hmm... there is actually one edge difference between the two graphs, which I'm not going to worry about just now.
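A difference like this can be pinned down mechanically rather than by eye. The following is only a minimal sketch, assuming each graph has been exported as a list of directed (cause, effect) pairs; the function name `compare_graphs` and the toy variable names are made up for illustration and are not part of Tetrad, and the toy graphs are not the Sachs graphs.

```python
def compare_graphs(graph_a, graph_b):
    """Compare two graphs given as collections of directed (cause, effect) edges."""
    a, b = set(graph_a), set(graph_b)
    # Adjacencies ignore direction: X->Y and Y->X both connect the pair {X, Y}.
    adj_a = {frozenset(edge) for edge in a}
    adj_b = {frozenset(edge) for edge in b}

    def orientation(edges, pair):
        # All directed edges in `edges` that connect the two variables in `pair`.
        return {edge for edge in edges if frozenset(edge) == pair}

    # Pairs adjacent in both graphs whose directed edges are not identical.
    reoriented = {
        pair for pair in adj_a & adj_b
        if orientation(a, pair) != orientation(b, pair)
    }
    return {
        "only_in_a": adj_a - adj_b,
        "only_in_b": adj_b - adj_a,
        "oriented_differently": reoriented,
    }


# Toy graphs with hypothetical variable names.
graph_a = [("A", "B"), ("B", "C"), ("C", "D")]
graph_b = [("A", "B"), ("B", "C"), ("D", "C"), ("A", "D")]

diff = compare_graphs(graph_a, graph_b)
for label, pairs in diff.items():
    print(label, sorted(sorted(pair) for pair in pairs))
```

Separating adjacency differences from orientation differences keeps a reversed edge from being counted as two unrelated mismatches.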

jdramsey commented 3 months ago

Looking at the dates, the Sachs report is from May 2018, while the Feedback report is from February 2019. We worked on the Sachs report until we published it on arXiv, whereas the Feedback paper had to undergo a lengthy review, so I believe the versions of FASK are identical. Let me dive back into the repository. The next commit on FASK after May 2018 was April 2019, so we have a winner: this version!

OK, so that's settled. Alright, either overnight or tomorrow I will reconcile the versions.

jdramsey commented 3 months ago

OK, I've cleaned it up. I removed some unnecessary options; the current parameters that work for Sachs are these:

image

I'll review the Feedback paper (it's been a while) and see if these parameters work. I believe they will.

I need to clean up some other stuff, but I can send you a version of Tetrad with these changes. As I said, I'll aim to include them in the next release.

cg09 commented 3 months ago

Joe,

I think the guy was interested in reproducing some of Ruben's studies, not Sachs.


jdramsey commented 3 months ago

There is a method to my madness, Clark. Sachs is an example from the same period that I know the specific answer to. Once this one works, the results in that paper should be reproducible, as they use the same algorithm. I retrieved the specific code from the repository for FASK that was used in both of those papers.

I've taken advantage of the fact that we now have better programming tools (with AIs) that let me clean up the code. Let me elaborate on the issue with the parameters.

OK?


jdramsey commented 3 months ago

@JMUB, All right, the code is nicely cleaned up now. I noticed that Ruben had written instructions for running FASK that are compatible with the Feedback paper, which I will paste below.

If you want to reproduce the results, wait until the version I've cleaned up is available before running it, as I've pulled up the original code used when we wrote that paper. I noticed that the alpha was set to 0.1 in the instructions below, so I guessed that right. The FASK delta, however, was set to -0.3, not -0.1, so I guessed that one wrong.

I don't know your method for running the example in Tetrad. I can build a Tetrad snapshot for you later this morning and send it if you're using the Tetrad interface. I can set it up so you can use py-tetrad to program in Python or rpy-tetrad for R. What I can't do immediately is configure Causal Command for this--I don't handle that, so we'd have to wait until that person has time to do the update.

Here are Ruben's instructions, which I've updated slightly.

jdramsey commented 3 months ago

@JMUB I've made a snapshot build; the directory of artifacts is here:

https://s01.oss.sonatype.org/content/repositories/snapshots/io/github/cmu-phil/tetrad-gui/7.6.5-SNAPSHOT/

To launch the Tetrad interface with the above changes, you want this jar:

https://s01.oss.sonatype.org/content/repositories/snapshots/io/github/cmu-phil/tetrad-gui/7.6.5-SNAPSHOT/tetrad-gui-7.6.5-20240514.191209-1-launch.jar

I have also updated the current jar for py-tetrad if you want to run FASK from Python using JPype. If you already have py-tetrad checked out from GitHub, you simply need to do a git pull to get the new jar.

JMUB commented 3 months ago

@jdramsey Thanks for all the additional information and for investing the time to figure it all out!

> I removed the 2-cycle thresholding, as it was not used in the original code and was confusing. Rather, I insist that the 2-cycle test should be done, as described in the paper. It is slower, but it is what we originally published. (To verify this, I'll check the algorithm's definition in the Feedback paper.)

I tried the new snapshot you provided. The first thing I noticed is that twoCycleScreeningThreshold is still there.

I configured FASK with the parameters listed here: https://github.com/cmu-phil/tetrad/issues/1767#issuecomment-2116587286 and tried to replicate the resting-state results. As before, setting twoCycleScreeningThreshold=0 fails to detect the cycles. For example, the Sanchez-Romero paper reports a two-cycle frequency of 1 for X1-X2 (CA1-CA23DG) and X3-X4 (SUB-ERC) for the right-hemisphere resting-state data (see Table C32 in the paper's supplementary material). These cycles are only detected when I set twoCycleScreeningThreshold=0.05 (see image below).

example_Sanchez-Romero_resting_state_right_hemisphere

I am using resting state right hemisphere data provided here: https://cmu.app.box.com/s/7jq6uucz3raceinnrkzf25tisy4jwe0l/folder/44147778634
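The two-cycle frequency mentioned above can be tallied over the graphs from the individual concatenated datasets. This is a minimal sketch, assuming each result has been exported as a list of directed (cause, effect) pairs and that a two-cycle means both X->Y and Y->X are present; the function name and the two toy graphs are hypothetical and are not output from Tetrad.

```python
from collections import Counter


def two_cycle_frequencies(graphs):
    """Fraction of graphs in which each variable pair has edges in both directions."""
    counts = Counter()
    for edges in graphs:
        edge_set = set(edges)
        # A pair forms a two-cycle when both X->Y and Y->X are in the graph.
        pairs = {
            frozenset((x, y))
            for (x, y) in edge_set
            if x != y and (y, x) in edge_set
        }
        counts.update(pairs)
    total = len(graphs)
    return {tuple(sorted(pair)): count / total for pair, count in counts.items()}


# Two made-up graphs standing in for results from two datasets.
toy_graphs = [
    [("X1", "X2"), ("X2", "X1"), ("X3", "X4"), ("X4", "X3")],
    [("X1", "X2"), ("X2", "X1"), ("X3", "X4")],
]

frequencies = two_cycle_frequencies(toy_graphs)
for pair, freq in sorted(frequencies.items()):
    print(pair, freq)
```

A frequency of 1 for a pair then means the two-cycle appeared in every graph, which is the quantity to compare against the supplementary table.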

> I removed the 2-cycle thresholding, as it was not used in the original code and was confusing.

It seems that FASK is still using the twoCycleScreeningThreshold parameter.

> Here are Ruben's instructions, which I've updated slightly.

I am a bit confused and surprised by the fact that the parameters provided differ from the ones reported in the paper, in particular, the orientation alpha (0.1 vs. 0.05) and the penalty discount (1 vs. 2).

jdramsey commented 3 months ago

One second; you should be using FASK, not FASK-old now. The two-cycle screening shouldn't be there... can you check?


jdramsey commented 3 months ago

Also, where there's a discrepancy, I would use the parameter settings from the paper.

JMUB commented 3 months ago

In the algorithm menu I don't have FASK-old. Only FASK and FASK-PW are available.

jdramsey commented 3 months ago

These are the parameters that are showing for me. Are they different for you? If so, I need to think about how that happened...

image

jdramsey commented 3 months ago

Let me try the data you sent.

jdramsey commented 3 months ago

That's fine; to see FASK-Old, you'd need to turn on the experimental algorithm, in the File->Setting menu, but FASK should be showing the above. Maybe I gave you the wrong link. Hold on. It was late last night, I recall. One second...


jdramsey commented 3 months ago

Yes, I gave you the wrong link. Many apologies. It was very late last night. Here's the right link, which ends with -2-launch.jar:

https://s01.oss.sonatype.org/content/repositories/snapshots/io/github/cmu-phil/tetrad-gui/7.6.5-SNAPSHOT/tetrad-gui-7.6.5-20240517.042531-2-launch.jar

jdramsey commented 3 months ago

Also, for some odd reason, I can't follow your links. Let me see if I can figure out which example it is in Box...

JMUB commented 3 months ago

> Also, for some odd reason, I can't follow your links. Let me see if I can figure out which example it is in Box...

This is the link to the data:

https://cmu.app.box.com/s/7jq6uucz3raceinnrkzf25tisy4jwe0l/folder/44147778634

jdramsey commented 3 months ago

I guess I'm still confused, though I can find the Box folder. (I found it separately, where I expected it.) The reason I'm confused is that the "simple networks" don't contain a 7-variable example:

image

jdramsey commented 3 months ago

For instance, I picked a concatenation of data from network 5, the first one, and ran FASK on it using the -2 version I sent, with default parameters. Here is the true network:

image

Here is the FASK result, default parameters:

image

It is off by one edge. Of course:

JMUB commented 3 months ago

> I guess I'm still confused, though I can find the Box folder. (I found it separately, where I expected it.) The reason I'm confused is that the "simple networks" don't contain a 7-variable example

The fMRI resting-state data used in the paper contain 7 variables. Those are the results I am trying to replicate.

jdramsey commented 3 months ago

One second...

jdramsey commented 3 months ago

OK, I see. I ran FASK on the first concatenation in the left_mtl_concatenated_reduced directory and got this:

image

I used the -2 Tetrad I gave the link for just above (the one dated from last night). This shows two of the 2-cycles reported in the paper.

jdramsey commented 3 months ago

I'm not sure which parameter settings to use—are these right? I'm sorry I didn't dig through the paper just now for the parameters, so I'm using the ones that you said, I think... And to be honest, I didn't do these experiments, so I'm reconstructing them myself. But this is consistent.

image

JMUB commented 3 months ago

> I used the -2 Tetrad I gave the link for just above (the one dated from last night). This shows two of the 2-cycles reported in the paper.

Using this new snapshot yields consistent results for the resting state data. I still want to test which combination of alpha (0.1 or 0.05) and penalty discount (1 or 2) enables a closer replication.
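The comparison over the four combinations can be organized as a small grid. This is only a sketch under stated assumptions: `run_search` is a hypothetical stand-in for whatever actually runs FASK (the Tetrad interface, py-tetrad, or something else) and returns the set of detected two-cycles, and the `fake_outputs` values are made up purely to exercise the loop. They are not results from FASK.

```python
from itertools import product


def rank_settings(run_search, reported, alphas=(0.05, 0.1), penalty_discounts=(1, 2)):
    """Rank (alpha, penalty discount) pairs by mismatch with the reported two-cycles."""
    results = []
    for alpha, penalty in product(alphas, penalty_discounts):
        found = set(run_search(alpha=alpha, penalty_discount=penalty))
        missed = reported - found  # reported in the paper but not detected
        extra = found - reported   # detected but not reported in the paper
        results.append((len(missed) + len(extra), alpha, penalty))
    # Smallest mismatch first; ties are ordered by alpha, then penalty discount.
    return sorted(results)


# Two-cycles reported for the right-hemisphere data, as quoted earlier in this thread.
reported_two_cycles = {("X1", "X2"), ("X3", "X4")}

# Placeholder outputs, NOT real FASK results.
fake_outputs = {
    (0.05, 1): {("X1", "X2")},
    (0.05, 2): {("X1", "X2"), ("X3", "X4")},
    (0.1, 1): {("X1", "X2"), ("X3", "X4"), ("X5", "X6")},
    (0.1, 2): set(),
}


def fake_run(alpha, penalty_discount):
    # Hypothetical stub; replace with a call that actually runs the search.
    return fake_outputs[(alpha, penalty_discount)]


ranking = rank_settings(fake_run, reported_two_cycles)
for mismatch, alpha, penalty in ranking:
    print(f"alpha={alpha} penalty_discount={penalty} mismatch={mismatch}")
```

Counting missed and extra two-cycles is just one possible score; edge frequencies could be compared in the same loop.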

Again thank you so much for the support and the quick replies.

jdramsey commented 3 months ago

Absolutely!