donaldRwilliams / BGGM

Bayesian Gaussian Graphical Models
https://donaldrwilliams.github.io/BGGM/
GNU General Public License v2.0

Error in 'confirm' function related to naming #55

Closed mikaelr88 closed 3 years ago

mikaelr88 commented 4 years ago

I'm running into the error below:

#this is the code I'm running:
hyp_act_dep_neg <- c("alcohol--PHQ.8_1 > 0")
confirm_intra_inter <- confirm(Y = act_network_dep,
                               hypothesis = hyp_act_dep_neg,
                               iter = 50000,
                               type="mixed")
confirm_intra_inter
Error in parse_hypothesis(names_coef, hypothesis) : 
  Some of the parameters referred to in the 'hypothesis' do not correspond to parameter names of object 'x'.
  The following parameter names in the 'hypothesis' did not match any parameters in 'x': alcohol, PHQ81
  The parameters in object 'x' are named: onetwo, onethree, twothree, onefour, twofour, threefour, onefive, twofive, threefive, fourfive, onesix, twosix, threesix, foursix, fivesix, oneseven, twoseven, threeseven, fourseven, fiveseven, sixseven, oneeight, twoeight, threeeight, foureight, fiveeight, sixeight, seveneight, onenine, twonine, threenine, fournine, fivenine, sixnine, sevennine, eightnine, oneten, twoten, threeten, fourten, fiveten, sixten, seventen, eightten, nineten, oneeleven, twoeleven, threeeleven, foureleven, fiveeleven, sixeleven, seveneleven, eighteleven, nineeleven, teneleven, onetwelve, twotwelve, threetwelve, fourtwelve, fivetwelve, sixtwelve, seventwelve, eighttwelve, ninetwelve, tentwelve, eleventwelve, onethirteen, twothirteen, threethirteen, fou

It doesn't seem to have anything to do with the node names (which is what I thought at first - but I changed all the node names and no difference). It runs fine if I just use the 'onetwo' 'onethree' names eg:

hyp_act_dep_neg <- c("(onetwo--onethree, fiveeight--onetwo) > 0;
                     (onetwo--onethree, fiveeight--onetwo) = 0")

And I've run the syntax from the tutorial paper - also fine. I tried to find where parse_hypothesis lives to track down where the 'parameter names of object 'x'' are but I haven't been able to find it sorting through the code in github.

donaldRwilliams commented 4 years ago

The code below does not produce an error for me (note I used the CRAN version of the package).

Y <- BGGM::ptsd[,1:3]

colnames(Y)[1] <- "alcohol"

colnames(Y)[2] <- "PHQ.8_1"

hyp_act_dep_neg <- c("alcohol--PHQ.8_1 > 0")

fit <- confirm(Y, hypothesis = hyp_act_dep_neg,  type = "mixed", iter = 500)

But looking at the error, I noticed that the names it is searching for do not match: in the error message, the . and _ have been removed from PHQ.8_1. Honestly, I am not sure why that would be, as it does not happen for me.

mikaelr88 commented 4 years ago

Okay I ran some other dataframes with some but not all of the columns from the dataframe that's throwing the error and neither of them is throwing an error. So there must be something particularly strange about that one dataframe. I have no idea what it is honestly - sorry for the hassle.

donaldRwilliams commented 4 years ago

It is no hassle. When I have more time, I will try to find the error :-)

Thank you for opening the issue.

mikaelr88 commented 3 years ago

I appreciate it! I have a little more to add, which just highlights how idiosyncratic this error seems to be: I'm running 5 confirmatory analyses to get at 5 different hypotheses:

E1_h1_neg <- c("
(end_phone--knit,  end_tv--knit) > 0;
end_phone--knit > end_tv--knit > 0;
end_phone--knit = end_tv--knit > 0;
(end_phone--knit, end_tv--knit) = 0")
E1_h2_neg <- c("
(end_exercise--videogames, end_exercise--alcohol) > 0;
(end_exercise--videogames, end_exercise--alcohol) = 0")
E1_h1_pos <- c("
(end_phone--social_media, end_phone--home_exercise) > 0
end_phone--social_media > end_phone--home_exercise > 0;
end_phone--social_media = end_phone--home_exercise > 0;
(end_phone--social_media, end_phone--home_exercise) = 0")
E1_h2_pos <- c("
(end_drink--other_substances, end_drink--videogames) > 0;
(end_drink--other_substances, end_drink--videogames) = 0")
E1_h3_pos <- c("
(work_outside_home--volunteer, work_outside_home--knit) > 0;
(work_outside_home--volunteer, work_outside_home--knit) = 0")

confirm_E1_h1n <- BGGM::confirm(confirm_E1, E1_h1_neg, iter = 50000, type="mixed", seed=1)
confirm_E1_h2n <- BGGM::confirm(confirm_E1, E1_h2_neg, iter = 50000, type="mixed", seed=1)
confirm_E1_h1p <- BGGM::confirm(confirm_E1, E1_h1_pos, iter = 50000, type="mixed", seed=1)
confirm_E1_h2p <- BGGM::confirm(confirm_E1, E1_h2_pos, iter = 50000, type="mixed", seed=1)
confirm_E1_h3p <- BGGM::confirm(confirm_E1, E1_h3_pos, iter = 50000, type="mixed", seed=1)

The first two run fine, no errors! But the latter three throw the same error as the original post.

Happy to provide a link to the data if that would be helpful at all. I'm scratching my head at this and still can't figure out how to parse what the onetwo, onethree etc. are referring to, otherwise I'd just rename the columns and figure it out posthoc -- but I don't even know what to rename what to.
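For readers puzzling over the same names: judging only from the order in the error message (onetwo, onethree, twothree, onefour, ...), they look like spelled-out column index pairs from the upper triangle of the matrix, listed column by column. The following is a minimal base R sketch under that assumption. `upper_tri_pairs` is a hypothetical helper, not a BGGM function, and the column names are placeholders.

```r
# Enumerate the upper-triangle pairs of a p x p matrix in column-major order:
# (1,2), (1,3), (2,3), (1,4), ... This ordering is inferred from the error
# message in this thread, not taken from the BGGM source.
upper_tri_pairs <- function(col_names) {
  p <- length(col_names)
  idx <- which(upper.tri(diag(p)), arr.ind = TRUE)
  data.frame(
    row = idx[, "row"],
    col = idx[, "col"],
    relation = paste0(col_names[idx[, "row"]], "--", col_names[idx[, "col"]]),
    stringsAsFactors = FALSE
  )
}

# Placeholder column names; substitute colnames() of the real data.
pairs <- upper_tri_pairs(c("alcohol", "PHQ.8_1", "knit", "volunteer"))
print(pairs)
```

Under this reading, the first row (1, 2) would be the parameter reported as onetwo, the second row (1, 3) would be onethree, and so on.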

donaldRwilliams commented 3 years ago

Hi: I am excited to see confirm being used. I will look into this error, but not for a couple days (have some other things with looming deadlines).

The parsing of the hypotheses is quite a pain, given all the different things that are possible.

But for now you can try to name the columns by number. So say you have 20 variables, just name them 1:20. Then for the hypotheses use the respective numbers. So for the relation between column one and two, it would be 1--2, etc. So to test if the relation between column 1 and 2 is the same as 1 and 4, it would be 1--2 = 1--4.

No doubt using the names is much easier, but the numbers I think will work without a problem.

Please let me know if that works. If not, then I can take a look at the data.
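A rough sketch of the numbering workaround described above, using base R only. The toy data frame and its column names are placeholders, and the `lookup` vector is just one way to keep track of which number belongs to which variable.

```r
# Toy data standing in for the real data frame (placeholder column names).
set.seed(1)
Y <- data.frame(alcohol = rnorm(5), PHQ.8_1 = rnorm(5), knit = rnorm(5))

# Keep a lookup so numbered results can be mapped back to the original names.
lookup <- setNames(seq_along(colnames(Y)), colnames(Y))

# Copy of the data with the columns named "1", "2", "3", ...
Y_num <- Y
colnames(Y_num) <- as.character(seq_len(ncol(Y_num)))

# Relation between "alcohol" (column 1) and "knit" (column 3),
# written with the smaller column number first.
hyp <- paste0(lookup[["alcohol"]], "--", lookup[["knit"]], " > 0")
hyp

# Y_num and hyp would then be passed to confirm() in place of the
# named data and hypothesis.
```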

donaldRwilliams commented 3 years ago

Actually I think it would be good to share your data. Then I can reproduce your error exactly and then fix it more quickly :-)

mikaelr88 commented 3 years ago

Great, thank you! No major rush (we pre-registered the confirmation of some exploratory networks so we're just excited to see what's what).

Here's a link to a box folder with the data I referenced above: https://utexas.box.com/s/r3l8ayd67c3cycooi14y2dhjns1p1p0g

Also tried labeling them with integers (or renaming them into character vectors of the numbers), even if the columns are named 'one'...'thirty' I'm still getting errors.

donaldRwilliams commented 3 years ago

I think I fixed it. It was a minor issue having to do with removing spacing.

You will have to install it with the following:

remotes::install_github("donaldRwilliams/BGGM", ref = "issue-55")

Note also I think one of your hypotheses does not make sense the way it is written, i.e.,

E1_h1_pos <- c("
(end_phone--social_media, end_phone--home_exercise) > 0
end_phone--social_media > end_phone--home_exercise > 0;
end_phone--social_media = end_phone--home_exercise > 0;
(end_phone--social_media, end_phone--home_exercise) = 0")

Note the first hypothesis is

(end_phone--social_media, end_phone--home_exercise) > 0
end_phone--social_media > end_phone--home_exercise > 0;

which I think should throw an error because the same relation is used twice in the same hypothesis. I think it might just be missing the ";".

But in general you could not have the same relation more than once in a given hypothesis. But of course you can have the same relation in each of, say, 4 hypotheses (like in your other examples).

Please let me know if that works for you (and if so close the issue please) !

PS: I am also really excited to hear that you pre registered some hypotheses to test in networks.
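The rule about not repeating a relation within a single hypothesis can be checked before calling confirm. Below is a minimal base R sketch. `repeated_relations` is a hypothetical pre-check, not part of BGGM, and it assumes variable names consist only of letters, digits, `.` and `_`.

```r
# Split a hypothesis string on ";" and, for each hypothesis, return any
# relation (a--b) that appears more than once. Hypothetical helper.
repeated_relations <- function(hypothesis) {
  parts <- trimws(strsplit(hypothesis, ";", fixed = TRUE)[[1]])
  parts <- parts[nzchar(parts)]
  lapply(parts, function(h) {
    rel <- regmatches(h, gregexpr("[A-Za-z0-9._]+--[A-Za-z0-9._]+", h))[[1]]
    unique(rel[duplicated(rel)])
  })
}

# Two lines run together because the ";" after the first line is missing.
hyp_missing_semicolon <- "
(end_phone--social_media, end_phone--home_exercise) > 0
end_phone--social_media > end_phone--home_exercise > 0;
(end_phone--social_media, end_phone--home_exercise) = 0"

res <- repeated_relations(hyp_missing_semicolon)
print(res)
```

A non-empty entry in the result flags a hypothesis that uses the same relation twice, which in this example points at the missing ";".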

mikaelr88 commented 3 years ago

Unfortunately, it's throwing the exact same error (I had some issues with the remote install so it took me a little while to sort that, but the install seemed fine when I got it to work).

Thanks for catching that typo (it's meant to be 4 hypotheses).

-- I think your work on BGGM is so great, thank you for developing this package! It's such an awesome tool for developing and testing hypotheses in network analysis.

donaldRwilliams commented 3 years ago

Hmm. Not really sure how that would be, as it works for me. I can check again.

donaldRwilliams commented 3 years ago

I just ran E1_h1_pos

And here is the output:

Call:
BGGM::confirm(Y = confirm_E1, hypothesis = E1_h1_pos, type = "mixed", 
    iter = 50, seed = 1)
--- 
Hypotheses: 

H1: (end_phone--social_media,end_phone--home_exercise)>0
H2: end_phone--social_media>end_phone--home_exercise>0
H3: end_phone--social_media=end_phone--home_exercise>0
H4: (end_phone--social_media,end_phone--home_exercise)=0
H5: complement
--- 
Posterior prob: 

p(H1|data) = 0.313
p(H2|data) = 0.618
p(H3|data) = 0.027
p(H4|data) = 0.003
p(H5|data) = 0.038
--- 
Bayes factor matrix: 
      H1    H2     H3      H4     H5
H1 1.000 0.506 11.403  93.238  8.227
H2 1.977 1.000 22.538 184.293 16.260
H3 0.088 0.044  1.000   8.177  0.721
H4 0.011 0.005  0.122   1.000  0.088

donaldRwilliams commented 3 years ago

I just checked the others, and for some reason E1_h3_pos is the only one that does not run for me. I will figure out why and make the fix.

donaldRwilliams commented 3 years ago

I figured out the issue with that last hypothesis. Although it is somewhat convoluted, the order of the variables matters: each relation has to correspond to the upper triangle of the matrix. With numbers, say, 1--2 works but 2--1 would not. The same applies to the names, so the variable in the earlier column has to come first.

so change that hypothesis to

E1_h3_pos <- c("
               (volunteer--work_outside_home, knit--work_outside_home) > 0;
                 (volunteer--work_outside_home, knit--work_outside_home) = 0;
               ")

This is mentioned in the documentation

Details
The hypotheses can be written either with the respective column names or numbers. For example, 1--2 denotes the relation between the variables in column 1 and 2. Note that these must correspond to the upper triangular elements of the correlation matrix. This is accomplished by ensuring that the first number is smaller than the second number. This also applies when using column names (i.e., in reference to the column number).

I will try to make that clearer. But I ran the above hypothesis and it works.
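As an illustration of the ordering rule, here is a small base R sketch that rewrites every `a--b` term so that the variable in the earlier column comes first. `order_relations` is a hypothetical helper, not a BGGM function. The column order in `cols` is assumed for the example, and variable names are assumed to contain only letters, digits, `.` and `_`.

```r
# Rewrite each "a--b" term so the variable with the smaller column index
# comes first (upper-triangle order). Hypothetical helper, base R only.
order_relations <- function(hypothesis, col_names) {
  pattern <- "[A-Za-z0-9._]+--[A-Za-z0-9._]+"
  m <- gregexpr(pattern, hypothesis)
  terms <- regmatches(hypothesis, m)[[1]]
  fixed <- vapply(terms, function(term) {
    vars <- strsplit(term, "--", fixed = TRUE)[[1]]
    pos <- match(vars, col_names)
    if (anyNA(pos)) stop("Unknown variable in relation: ", term)
    paste(vars[order(pos)], collapse = "--")
  }, character(1), USE.NAMES = FALSE)
  regmatches(hypothesis, m) <- list(fixed)
  hypothesis
}

# Assumed column order; in practice use colnames() of the data passed to confirm.
cols <- c("knit", "volunteer", "work_outside_home")
reordered <- order_relations(
  "(work_outside_home--volunteer, work_outside_home--knit) > 0", cols
)
reordered
```

With the assumed column order, this turns the original term order into the one used in the corrected hypothesis above.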

Glad you like BGGM !

mikaelr88 commented 3 years ago

Thank you so much, I hadn't thought of the order of the names, sorry!

donaldRwilliams commented 3 years ago

> Thank you so much, I hadn't thought of the order of the names, sorry!

All good. The order mattering is somewhat confusing and perhaps I will change that in the future.