AlexiaJM / LEGIT

An R package for the Latent Environmental & Genetic InTeraction (LEGIT) model
GNU General Public License v3.0

Problem using GxE_interaction_test #4

Closed ljhburn closed 3 years ago

ljhburn commented 3 years ago

Hi, thank you for developing such a useful package. I was trying to follow the example (https://cran.r-project.org/web/packages/LEGIT/vignettes/GxE_testing.html). The example ran well:

ex_dia = example_with_crossover(N=250, c=0, coef_main = c(3,1,2), sigma=1, seed=7)

GxE_test_BIC = GxE_interaction_test(data=ex_dia$data, genes=ex_dia$G, env=ex_dia$E, formula_noGxE = y ~ 1, crossover = c("min","max"), criterion="BIC")

However, I couldn't use my own data for GxE testing. I tried to build my own list containing all the required variables, but it didn't work.

mylist1 <- list("data"=data,"G"=G,"E"=E,coef_G=1,coef_E=1,coef_main = c(3,1,2),c=0)
GxE_test_BIC = GxE_interaction_test(data=mylist1$data, genes = mylist1$G, env = mylist1$E, formula_noGxE = y ~ G*E+Age+Edu+Race, crossover = c("min", "max"),criterion = "BIC")
GxE_test_BIC$results

It fails with this error:

Error in str2lang(x) : <text>:2:0: unexpected end of input
1: y ~ G * E + Age + Edu + Race +
   ^

Could you suggest a solution? An example with real data would be helpful (instead of data generated by example_with_crossover and example_2way).

AlexiaJM commented 3 years ago

Hi @ljhburn,

This must be because you included the GxE term in the formula. For GxE_interaction_test only, you must not include the GxE term. I even renamed the "formula" argument to "formula_noGxE" to be more direct, since not everyone reads the documentation. The reason is that I manually add the G, E, or G*E terms to test the different models.

To fix the bug, change it to "formula_noGxE = y ~ Age+Edu+Race".

Alexia
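To illustrate why the covariate-only formula matters, here is a minimal base-R sketch of building candidate models by appending terms to a base formula. It is not the package's internal code; update() is only a stand-in for the "manually add the G, E, or G*E terms" step described above, and the variable names come from the formula in this thread.

```r
# Helper: list the terms R will actually fit for a formula.
labels_of <- function(f) attr(terms(f), "term.labels")

# Covariate-only base formula, as recommended above.
base_ok <- y ~ Age + Edu + Race

# Candidate models built by appending terms (illustrative stand-in only).
candidates <- list(
  none    = base_ok,
  G_only  = update(base_ok, . ~ . + G),
  E_only  = update(base_ok, . ~ . + E),
  G_and_E = update(base_ok, . ~ . + G + E),
  GxE     = update(base_ok, . ~ . + G * E)
)
print(lapply(candidates, labels_of))

# If the base formula already contains G*E, every candidate inherits it,
# so a "G only" model would still include E and the G:E interaction.
base_bad   <- y ~ G * E + Age + Edu + Race
bad_G_only <- update(base_bad, . ~ . + G)
print(labels_of(bad_G_only))
```

With the covariate-only base, only the last candidate contains the interaction, which is what makes the model comparison meaningful.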

ljhburn commented 3 years ago

Thanks for your help, @AlexiaJM! I read the documentation and rewrote the formula, but it still didn't work. I wonder whether it is due to the data structure. My data has an outcome y and several covariates (data), plus only one gene (G) and one environment variable (E). My data doesn't include y_true, unlike the example, and I don't know how to obtain it. Also, must the outcome be named "y", or can it keep its original name?
Thanks again!

AlexiaJM commented 3 years ago

If your G and E are each only one variable, you still need to pass them as data.frames rather than as single vectors. So you can do:

GxE_interaction_test(data=mylist1$data, genes = as.data.frame(mylist1$G), env = as.data.frame(mylist1$E), formula_noGxE = y ~ Age+Edu+Race, crossover = c("min", "max"),criterion = "BIC")

You could also convert your data to a data.frame if it isn't one already (but it probably is). Let me know if that works. Ignore the y_true.

AlexiaJM commented 3 years ago

Actually, that will not give the right column names. Change it to:

GxE_interaction_test(data=mylist1$data, genes = data.frame(G=mylist1$G), env = data.frame(E=mylist1$E), formula_noGxE = y ~ Age+Edu+Race, crossover = c("min", "max"),criterion = "BIC")
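The naming difference between the two calls can be checked in base R without the package. This is a small sketch with made-up values; mylist1 here is a hypothetical stand-in for the list in the thread.

```r
# Hypothetical stand-in for the user's list (values invented for illustration).
mylist1 <- list(G = c(0, 1, 2, 1), E = c(0.2, 0.5, 0.9, 0.1))

# as.data.frame() on a bare vector derives the column name from the
# expression that was passed in, so the column is not called "G".
genes_unnamed <- as.data.frame(mylist1$G)
print(names(genes_unnamed))

# data.frame(G = ...) names the column explicitly.
genes_named <- data.frame(G = mylist1$G)
print(names(genes_named))
```

Naming the column explicitly keeps the gene and environment labels readable wherever they are reported.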

ljhburn commented 3 years ago

Thanks a lot! Adding data.frame helped. Next, I would like to test 6 models by changing c and βE. Can I just follow the example:

c and βE depend on the model assumed. c=0 represents vantage sensitivity, c=10 represents diathesis-stress. We set c=5 to represent differential susceptibility. βE=0 assumes a STRONG model while βE≠0 assumes a WEAK model.

Update: I found that the results of testing different c and βE are almost the same, probably because the model is simple. Also, when I add GxCov or ExCov to the formula, the models without G*E (i.e., only G, only E, both G and E, neither G nor E) all have the same BIC (though they are not the best-fitting models). Is that adequate for guarding against false positives? Here is the result: LEGIT-GxE
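The roles of c and βE in the quoted passage can be sketched in base R. This assumes the reparameterized crossover form y = b0 + bE*(E - c) + bGE*G*(E - c), which is an assumption about the vignette's model, and an E range of 0 to 10 chosen purely for illustration. The function expected_y and the mapping of the values 3, 1, 2 onto b0, bE, bGE are hypothetical.

```r
# Expected outcome under an assumed crossover parameterization (illustrative).
expected_y <- function(E, G, b0 = 3, bE = 1, bGE = 2, c = 5) {
  b0 + bE * (E - c) + bGE * G * (E - c)
}

G_levels <- c(0, 0.5, 1)

# At E = c the genetic term vanishes, so every genotype has the same
# expected outcome: this is the crossover point.
print(expected_y(E = 0,  G = G_levels, c = 0))   # crossover at the minimum of E
print(expected_y(E = 5,  G = G_levels, c = 5))   # crossover inside the range
print(expected_y(E = 10, G = G_levels, c = 10))  # crossover at the maximum of E

# STRONG model (bE = 0): with G = 0 the environment has no effect at all.
print(expected_y(E = c(0, 10), G = 0, bE = 0))

# WEAK model (bE != 0): with G = 0 the environment still shifts the outcome.
print(expected_y(E = c(0, 10), G = 0, bE = 1))
```

Under this form, moving c only changes where the genotype lines meet, which may be one reason different c values can give similar fits when the interaction is small.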

AlexiaJM commented 3 years ago

If E is always between 0 and 1, then this is correct.

You have to be extra careful: you cannot add G*Race, because in the formula R expands it to G + Race + G:Race. So it adds a G effect everywhere! Your "E only" model, for example, would still actually contain G. One way around this is to write "G*Age + G*Edu + G*Race - G" and likewise "E*Age + E*Edu + E*Race - E" in your formula. This ensures that no main effect of G or E is included.
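The expansion described above can be inspected directly with base R's terms(). This sketch uses the variable names from this thread; labels_of is a hypothetical helper.

```r
# Helper: list the terms R generates from a formula.
labels_of <- function(f) attr(terms(f), "term.labels")

# G*Race silently brings in a main effect of G.
with_main <- labels_of(y ~ G * Race)
print(with_main)

# Subtracting G keeps the covariates and the GxCov interactions
# but drops the standalone G term.
without_main <- labels_of(y ~ G * Age + G * Edu + G * Race - G)
print(without_main)
```

The same check works for the E version of the formula by swapping G for E.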

And yes, it's good practice to add GxCov or ExCov; however, it reduces your power significantly. Some think that you should do it, some don't, but always keep your sample size in mind.

Yiyi-Wu commented 3 years ago

Hello @AlexiaJM, thank you for developing such a useful R package. After reading the description of the LEGIT model, I was confused by "The LEGIT model is an interaction model with two latent variables". If the environmental variable is an observed variable and the genetic variable is a latent variable, or if both are observed variables, does that mean I can't use this package to distinguish differential susceptibility from diathesis-stress and vantage sensitivity?

AlexiaJM commented 3 years ago

Hi @Yiyi-Wu,

It still works even if there is only one element in G or E. I just implemented everything in the LEGIT framework, but it works for non-LEGIT models (when G is a single gene and E is a single environment). You'll find that the functions provide more information than other SAS/SPSS packages. You can also plot the GxE and test rGEs.

As seen above, just do something like this: "genes = data.frame(G=data$my_gene), env = data.frame(E=data$my_env)" in the GxE testing function.

Yiyi-Wu commented 3 years ago

Hi @AlexiaJM, Thank you for your help! Up to now, I have only used SPSS and Mplus. I will learn how to use this R package in the following days, though it may be quite challenging for me because I have never used R before. Thanks again!