Replicate results from Sanchez-Romero et al. (2019) with the FASK algorithm

JMUB commented 6 months ago

I have been trying to replicate the results reported in the paper by Sanchez-Romero et al. (2019) "Estimating feedforward and feedback effective connections from fMRI time series: Assessments of statistical methods" https://direct.mit.edu/netn/article/3/2/274/2211/Estimating-feedforward-and-feedback-effective

In particular, I want to replicate the results obtained for the left-hemisphere resting-state fMRI using the FASK algorithm (Fig. 6 in the paper). I am using the data provided in the supplementary material available here: https://cmu.app.box.com/s/7jq6uucz3raceinnrkzf25tisy4jwe0l/folder/44148019136 .

The paper states that the results were obtained setting the following parameters:

a penalty discount of c = 1
threshold for 2-cycle detection of α = 0.05

However, the current FASK implementation has two additional parameters: faskDelta and twoCycleScreeningThreshold.

The paper does not mention the values for these two parameters. When I run FASK using the default values faskDelta= 0 and twoCycleScreeningThreshold= 0 , the algorithm fails to identify the 2-cycles reported in the paper.

I noticed that the detection of cycles is sensitive to the value given to twoCycleScreeningThreshold. Unfortunately, the supplementary material of the paper makes no mention to this parameter.

Does anybody know the value for these parameters necessary to replicate the results? Or does anybody has guidelines to set these parameters?

I am using the Tetrad java executable version 7.6.3. Below you can see the complete FASK parameters in am currently using.

Thanks in advance for any help.

FASK_parameters

jdramsey commented 6 months ago

Someone else had asked about this, and Ruben Sanchez and I had figured out the answer. Let me go back and find that answer and send it to you.

cg09 commented 6 months ago

Send it to me as well.

On Wed, May 8, 2024 at 8:16 AM Joseph Ramsey @.***> wrote:

Someone else had asked about this, and Ruben Sanchez and I had figured out the answer. Let me go back and find that answer and send it to you.

— Reply to this email directly, view it on GitHub https://github.com/cmu-phil/tetrad/issues/1767#issuecomment-2100445381, or unsubscribe https://github.com/notifications/unsubscribe-auth/AD4Y3ONCGVVP5IQY2MNV7Y3ZBIJTVAVCNFSM6AAAAABHMZ45NKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMBQGQ2DKMZYGE . You are receiving this because you are subscribed to this thread.Message ID: @.***>

JMUB commented 6 months ago

I found related information in the causal-cmd repository, where somebody tried to replicate the results of the macaque data: https://github.com/bd2kccd/causal-cmd/issues/69#issuecomment-1439165483

There it is recommended to set twoCycleScreeningThreshold to 0. It is also mentioned that this parameter was not used in the original paper implementation.

When I set twoCycleScreeningThreshold = 0, FASK fails to detect the 2-cycles in the the resting state data. I only obtain similar edge frequencies and 2-cycles to those reported in the paper by setting twoCycleScreeningThreshold = 0.05. This value was just a lucky guess.

Is there a set of parameters that yield an exact replicate of the results?

jdramsey commented 6 months ago

Oh, I'm so sorry! This slipped my mind in all the end-of-semester goings-on.

Let me work on this a bit over the next several days. Please don't let me forget. The issue is that the Tetrad suite has developed quite a bit since that paper. But I should be able to look back in the repository to that point in time and see where the relevant code has diverged.

jdramsey commented 6 months ago

OK, I'm going to step away from my current project for a few hours and do this. I think I know how to go about it. We wrote a tech report analyzing the Sachs data here:

https://arxiv.org/abs/1805.03108

I need to figure out how to reproduce those results, and then I think I'll have answered your question.

Sorry, it's been a very long time since I've worked on FASK; I realize it's my algorithm, but I need to get my head back in that game.

jdramsey commented 6 months ago

I fooled around with this for a while this afternoon and couldn't figure it out, so I went back into the repository to find the version of FASK that we used for the Sachs report and added it to my branch as "FASK-Orig" as an experimental algorithm. Sure enough, the analysis is replicated.

This is the graph I just got:

This is the published:

Take a look at these graphs and see if they look identical to you. What I did was look at the date of publication of the Sachs report and go into the repository to find the version of FASK from that date.

I have to go home for dinner soon, but I need to look at the date of publication of the feedback paper you were referencing to see if that version of FASK is different from this. I don't think it is.

I have some work to do to fix the current FASK, which I'll probably do tomorrow. It's gotten out of sync a little bit in a way that's not obvious to my "naked eye," and the old one is better for Sachs.

If this works out, what I can do is send you a snapshot build of Tetrad with the repairs to try, though if I make the fixes, I'd rather just include them in the next published version.

jdramsey commented 6 months ago

Hmm... there is actually one edge difference between the two graphs, which I'm not going to worry about just now.

jdramsey commented 6 months ago

Looking at the dates, the Sachs report is from May 2018, while the Feedback report is from February 2019. We worked on the Sachs report until we published it on arXiv, whereas the Feedbacks paper had to undergo a lengthy review, so I believe the versions of FASK are identical. Let me dive back into the repository. The next commit on FASK after May 2018 was April 2019, so we have a winner, this version!

OK, so that's settled. Alright, either overnight or tomorrow I will reconcile the versions.

jdramsey commented 6 months ago

OK, I've cleaned it up. I removed some unnecessary options; the current parameters that work for Sachs are these:

I'll review the Feedback paper (it's been a while) and see if these parameters work. I believe they will.

I need to clean up some other stuff, but I can send you a version of Tetrad with these changes. As I said, I'll aim to include them in the next release.

cg09 commented 6 months ago

Joe,

I think the guy was interested in reproducing some of ruben's studies, not Sachs.

On Thu, May 16, 2024 at 8:53 PM Joseph Ramsey @.***> wrote:

OK, I've cleaned it up. I removed some unnecessary options; the current parameters that work for Sachs are these: image.png (view on web) https://github.com/cmu-phil/tetrad/assets/9853255/d27faefc-3d3b-46dc-8d5a-55311f8c77da

I'll review the Feedback paper (it's been a while) and see if these parameters work. I believe they will.

I need to clean up some other stuff, but I can send you a version of Tetrad with these changes. As I said, I'll aim to include them in the next release.

— Reply to this email directly, view it on GitHub https://github.com/cmu-phil/tetrad/issues/1767#issuecomment-2116439826, or unsubscribe https://github.com/notifications/unsubscribe-auth/AD4Y3OLGY3CEYYPHK5PTLHDZCVIH7AVCNFSM6AAAAABHMZ45NKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMJWGQZTSOBSGY . You are receiving this because you commented.Message ID: @.***>

jdramsey commented 6 months ago

There is a method to my madness, Clark. Sachs is an example from the same period that I know the specific answer to. Once this one works, the results in that paper should be reproducible, as they use the same algorithm. I retrieved the specific code from the repository for FASK that was used in both of those papers.

I've taken advantage of the fact that we now have better programming tools (with AIs) that let me clean up the code. Let me elaborate on the issue with the parameters.

The original code from that period used the FAS-Stable algorithm, so I have hard-coded that. The original code has an external graph option, which I've kept.
There was an "empirical" option, as you recall, for Hyvarinen and Smith's algorithms, but it was always supposed to be set to "on," so I've hard-coded that and removed the parameter from the interface. It was not used for the FASK rules.
I removed the 2-cycle thresholding, as it was not used in the original code and was confusing. Rather, I insist that the 2-cycle test should be done, as described in the paper. It is slower, but it is what we originally published. (To verify this, I'll check the algorithm's definition in the Feedback paper.)
The only difference is that an alpha level of 0.01 recovers the previous two cycles correctly instead of 1e-5. I don't know how to explain that yet. I'll look into it.

OK?

On Thu, May 16, 2024 at 10:02 PM cg09 @.***> wrote:

Joe,

I think the guy was interested in reproducing some of ruben's studies, not Sachs.

On Thu, May 16, 2024 at 8:53 PM Joseph Ramsey @.***> wrote:

OK, I've cleaned it up. I removed some unnecessary options; the current parameters that work for Sachs are these: image.png (view on web) < https://github.com/cmu-phil/tetrad/assets/9853255/d27faefc-3d3b-46dc-8d5a-55311f8c77da>

I'll review the Feedback paper (it's been a while) and see if these parameters work. I believe they will.

I need to clean up some other stuff, but I can send you a version of Tetrad with these changes. As I said, I'll aim to include them in the next release.

— Reply to this email directly, view it on GitHub https://github.com/cmu-phil/tetrad/issues/1767#issuecomment-2116439826,

or unsubscribe < https://github.com/notifications/unsubscribe-auth/AD4Y3OLGY3CEYYPHK5PTLHDZCVIH7AVCNFSM6AAAAABHMZ45NKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMJWGQZTSOBSGY>

. You are receiving this because you commented.Message ID: @.***>

— Reply to this email directly, view it on GitHub https://github.com/cmu-phil/tetrad/issues/1767#issuecomment-2116493666, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACLFSR3NK26WR3XAMGZLWKDZCVQKZAVCNFSM6AAAAABHMZ45NKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMJWGQ4TGNRWGY . You are receiving this because you commented.Message ID: @.***>

jdramsey commented 6 months ago

@JMUB, All right, the code is nicely cleaned up now. I noticed that Ruben had written instructions for running FASK that are compatible with the Feedback paper, which I will paste below.

If you want to reproduce the results, wait until the version I've cleaned up is available before running it, as I've pulled up the original code used when we wrote that paper. I noticed that the alpha was set to 0.1 in the below, so I guessed that right. Though the FASK delta was set to -0.3, not -0.1, I guessed that one wrong.

I don't know your method for running the example in Tetrad. I can build a Tetrad snapshot for you later this morning and send it if you're using the Tetrad interface. I can set it up so you can use py-tetrad to program in Python or rpy-tetrad for R. What I can't do immediately is configure Causal Command for this--I don't handle that, so we'd have to wait until that person has time to do the update.

Here are Ruben's instructions, which I've updated slightly.

public final class Fask
Implements the FASK (Fast Adjacency Skewness) algorithm, which makes decisions for adjacency and orientation using a combination of conditional independence testing, judgments of nonlinear adjacency, and pairwise orientation due to non-Gaussianity. The reference is this:
Sanchez-Romero, R., Ramsey, J. D., Zhang, K., Glymour, M. R., Huang, B., and Glymour, C. (2019). Estimating feedforward and feedback effective connections from fMRI time series: Assessments of statistical methods. Network Neuroscience, 3(2), 274-30
Some adjustments have been made in some ways from that version, and some additional pairwise options have been added from this reference:
Hyvärinen, A., and Smith, S. M. (2013). Pairwise likelihood ratios for estimation of non-Gaussian structural equation models. Journal of Machine Learning Research, 14(Jan), 111-152.
This method (and the Hyvarinen and Smith methods) make the assumption that the data are generated by a linear, non-Gaussian causal process and attempts to recover the causal graph for that process. They do not attempt to recover the parametrization of this graph; for this a separate estimation algorithm would be needed, such as linear regression regressing each node onto its parents. A further assumption is made, that there are no latent common causes of the algorithm. This is not a constraint on the pairwise orientation methods, since they orient with respect only to the two variables at the endpoints of an edge and so are happy with all other variables being considered latent with respect to that single edge. However, if the built-in adjacency search is used (FAS-Stable), the existence of latents will throw this method off.
As was shown in the Hyvarinen and Smith paper above, FASK works quite well even if the graph contains feedback loops in most configurations, including 2-cycles. 2-cycles can be detected fairly well if the FASK left-right rule is selected and the 2-cycle threshold set to about 0.1--more will be detected (or hallucinated) if the threshold is set higher. As shown in the Sanchez-Romero reference above, 2-cycle detection of the FASK algorithm using this rule is quite good.
Some edges may be undiscoverable by FAS-Stable; to recover more of these edges, a test related to the FASK left-right rule is used, and there is a threshold for this test. A good default for this threshold (the "skew edge threshold") is 0.3. For more of these edges, set this threshold to a lower number.
It is assumed that the data are arranged so each variable forms a column and that there are no missing values. The data matrix is assumed to be rectangular. To this end, the Tetrad DataSet class is used, which enforces this.
Note that orienting a DAG for a linear, non-Gaussian model using the Hyvarinen and Smith pairwise rules is alternatively known in the literature as Pairwise LiNGAM--see Hyvärinen, A., and Smith, S. M. (2013). Pairwise likelihood ratios for estimation of non-Gaussian structural equation models. Journal of Machine Learning Research, 14(Jan), 111-152. We include some of these methods here for comparison.
Parameters:
depth: -1. # control the size of the conditional set in the independence tests, setting this to a small integer may reduce the running time, but can also result in false positives. -1 means that it will check "all" possible sizes.
score: sem-bic-score
semBicRule: 1 # to set the Chickering Rule, used in the original Fask
penaltyDiscount: 2 # if using sem-bic as independence test (as in the paper). In the paper this is referred as c. Check step 1 and 10 in Algorithm 2 FAS stable.
skewEdgeThreshold: 0.3 # See description of Fask algorithm, and step 11 in Algorithm 1 FASK. Threshold to add edges that may have been non-inferred because there was a positive/ negative cycle that result in a non-zero observed relation.
faskLeftRightRule: 1 # this run FASK v1, the original FASK from the paper
faskDelta: -0.3 # See step 1 and 11 in Algorithm 4 (this is the value set in the paper)
orientationAlpha: 0.1 # this was referred in the paper as TwoCycle Alpha or just alpha, the lower it is, the lower the chance of inferring a two cycle. Check steps 17 to 28 in Algorithm 3: 2 Cycle Detection Rule.
structurePrior: 0 # prior on the number of parents. Not used in the paper implementation.
So a run of command line would look like this:
java -jar -Xmx10G causal-cmd-1.4.1-jar-with-dependencies. jar --delimiter tab --data-type continuous --dataset concat_BOLDfslfilter_60_FullMacaque. txt --prefix Fask_Test_MacaqueFull --algorithm fask --faskAdjacencyMethod 1 --depth -1 --test sem-bic-test --score sem-bic-score --semBicRule 1 --penaltyDiscount 2 --skewEdgeThreshold 0.3 --faskLeftRightRule 1 --faskDelta -0.3--orientationAlpha 0.1 -structurePrior 0
This class is configured to respect knowledge of forbidden and required edges, including knowledge of temporal tiers.
This code was cleaned up and rendered compatible with the original implementation on 5-16-2024. jdramsey

jdramsey commented 6 months ago

@JMUB I've made a snapshot build; the directory of artifacts is here:

https://s01.oss.sonatype.org/content/repositories/snapshots/io/github/cmu-phil/tetrad-gui/7.6.5-SNAPSHOT/

From this to launch the Tetrad interface with above changes you want this jar:

https://s01.oss.sonatype.org/content/repositories/snapshots/io/github/cmu-phil/tetrad-gui/7.6.5-SNAPSHOT/tetrad-gui-7.6.5-20240514.191209-1-launch.jar

I have also updated the current jar for py-tetrad if you want to run FASK from Python using JPype. If you already have py-tetrad checked out from GitHub, simply need to do a git pull to get the new jar.

JMUB commented 6 months ago

@jdramsey Thanks for all the additional information and for investing the time to figure it all out!

I removed the 2-cycle thresholding, as it was not used in the original code and was confusing. Rather, I insist that the 2-cycle test should be done, as described in the paper. It is slower, but it is what we originally published. (To verify this, I'll check the algorithm's definition in the Feedback paper.)

I tried the new snapshot you provided. The first thing I noticed is that twoCycleScreeningThreshold is still there.

I configured FASK with the parameters listed here: https://github.com/cmu-phil/tetrad/issues/1767#issuecomment-2116587286 and tried to replicate the resting state results. As before, setting twoCycleScreeningThreshold=0 fails to detect the cycles. For example, the Sanchez-Romero's paper reports a two-cycle frequency of 1 for X1-X2 (CA1-CA23DG) and X3-X4 (SUB-ERC) for the right-hemisphere resting state data (See TableC32 paper's in the supplementary material). These cycles are only detected when I set twoCycleScreeningThreshold=0.05 (see image below)

example_Sanchez-Romero_resting_state_right_hemisphere

I am using resting state right hemisphere data provided here: https://cmu.app.box.com/s/7jq6uucz3raceinnrkzf25tisy4jwe0l/folder/44147778634

I removed the 2-cycle thresholding, as it was not used in the original code and was confusing.

It seems that FASK is still using the twoCycleScreeningThreshold parameter.

Here are Ruben's instructions, which I've updated slightly.

I am a bit confused and surprised by the fact that the parameters provided differ from the ones reported in the paper, in particular, the orientation alpha (0.1 vs. 0.05) and the penalty discount (1 vs. 2).

jdramsey commented 6 months ago

One second; you should be using FASK, not FASK-old now. The two-cycle screening shouldn't be there... can you check?

On Fri, May 17, 2024 at 5:36 AM JMUB @.***> wrote:

@jdramsey https://github.com/jdramsey Thanks for all the additional information and for investing the time to figure it all out!

I removed the 2-cycle thresholding, as it was not used in the original code and was confusing. Rather, I insist that the 2-cycle test should be done, as described in the paper. It is slower, but it is what we originally published. (To verify this, I'll check the algorithm's definition in the Feedback paper.)

I tried the new snapshot you provided. The first thing I noticed is that twoCycleScreeningThreshold is still there.

I configured FASK with the parameters listed here: https://github.com/cmu-phil/tetrad/issues/1767#issuecomment-2116587286 http://url and tried to replicate the resting state results. As before, setting twoCycleScreeningThreshold=0 fails to detect the cycles. For example, the Sanchez-Romero's paper reports a two-cycle frequency of 1 for X1-X2 (CA1-CA23DG) and X3-X4 (SUB-ERC) for the right-hemisphere resting state data (See TableC32 paper's in the supplementary material). These cycles are only detected when I set twoCycleScreeningThreshold=0.05 (see image below)

example_Sanchez-Romero_resting_state_right_hemisphere.png (view on web) https://github.com/cmu-phil/tetrad/assets/38407925/53dcfa73-1d9b-487b-9ed6-02d85065acad

I am using resting state right hemisphere data provided here: https://cmu.app.box.com/s/7jq6uucz3raceinnrkzf25tisy4jwe0l/folder/44147778634 http://url

I removed the 2-cycle thresholding, as it was not used in the original code and was confusing.

It seems that FASK is still using the twoCycleScreeningThreshold parameter.

Here are Ruben's instructions, which I've updated slightly.

I am a bit confused and surprised by the fact that the parameters provided differ from the ones reported in the paper, in particular, the orientation alpha (0.1 vs. 0.05) and the penalty discount (1 vs. 2).

— Reply to this email directly, view it on GitHub https://github.com/cmu-phil/tetrad/issues/1767#issuecomment-2117148867, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACLFSR2VTT2TXB2TJ4BJJCTZCXFR3AVCNFSM6AAAAABHMZ45NKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMJXGE2DQOBWG4 . You are receiving this because you were mentioned.Message ID: @.***>

jdramsey commented 6 months ago

Also, where there's a discrepancy, I would use the parameter settings from the paper.

JMUB commented 6 months ago

In the algorithm menu I don't have FASK-old. Only FASK and FASK-PW are available.

jdramsey commented 6 months ago

These are the parameters that are showing for me. Is it different from you? If so I need to think how that happened...

jdramsey commented 6 months ago

Let me try the data you sent.

jdramsey commented 6 months ago

That's fine; to see FASK-Old, you'd need to turn on the experimental algorithm, in the File->Setting menu, but FASK should be showing the above. Maybe I gave you the wrong link. Hold on. It was late last night, I recall. One second...

On Fri, May 17, 2024 at 11:40 AM JMUB @.***> wrote:

In the algorithm menu I don't have FASK-old. Only FASK and FASK-PW are available.

— Reply to this email directly, view it on GitHub https://github.com/cmu-phil/tetrad/issues/1767#issuecomment-2117871214, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACLFSR2R32KXXFR5V3BYTWDZCYQHTAVCNFSM6AAAAABHMZ45NKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMJXHA3TCMRRGQ . You are receiving this because you were mentioned.Message ID: @.***>

jdramsey commented 6 months ago

Yes, I gave you the wrong link. Many apologies. it was very late last night. Here's the right link, that ends with -2-launch.jar:

https://s01.oss.sonatype.org/content/repositories/snapshots/io/github/cmu-phil/tetrad-gui/7.6.5-SNAPSHOT/tetrad-gui-7.6.5-20240517.042531-2-launch.jar

jdramsey commented 6 months ago

Also, for some odd reason, I can't follow your links. Let me see if I can figure out which example it is in Box...

JMUB commented 6 months ago

Also, for some odd reason, I can't follow your links. Let me see if I can figure out which example it is in Box...

This is the link to the data:

https://cmu.app.box.com/s/7jq6uucz3raceinnrkzf25tisy4jwe0l/folder/44147778634

jdramsey commented 6 months ago

I guess I'm still confused, though I can find the Box folder. (i found it separately, where I expected it.) The reason I'm confused is that the "simple networks" don't contain a 7-variable example:

jdramsey commented 6 months ago

For instance, I pick a concatenation of data from network 5, the first one, and run FASK on it using the -2 version I sent, default parameters. Here's it the true network:

Here is the FASK result, default parameters:

It is off by one edge. Of course:

The results reported are aggregate results over all of the concatenations.
As you say, the parameters reported in the paper may be slightly different.

JMUB commented 6 months ago

I guess I'm still confused, though I can find the Box folder. (i found it separately, where I expected it.) The reason I'm confused is that the "simple networks" don't contain a 7-variable example

The fMRI resting-state data used in the paper contain 7 variables. Those are the results I am trying to replicate.

jdramsey commented 6 months ago

One second...

jdramsey commented 6 months ago

OK, I see. OK, I did the first concatenation in the left_mtl_concatenated_reduced directory and get this:

I used the -2 Tetrad I gave the link for just above (the one dated from last night). This shows two of the 2-cycles reported in the paper.

jdramsey commented 6 months ago

I'm not sure which parameter settings to use—are these right? I'm sorry I didn't dig through the paper just now for the parameters, so I'm using the ones that you said, I think... And to be honest, I didn't do these experiments, so I'm reconstructing them myself. But this is consistent.

JMUB commented 6 months ago

I used the -2 Tetrad I gave the link for just above (the one dated from last night). This shows two of the 2-cycles reported in the paper.

Using this new snapshot yields consistent results for the resting state data. I still want to test which combination of alpha (0.1 or 0.05) and penalty discount (1 or 2) enables a closer replication.

Again thank you so much for the support and the quick replies.

jdramsey commented 6 months ago

Absolutely!

cmu-phil / tetrad

Replicate results from Sanchez-Romero et al. (2019) with the FASK algorithm #1767