Closed mestaki closed 3 years ago
Oh the embarassment of not seeing this until now.... So sorry, Bod! @mestaki
Yes, this is absolutely possible and actually pretty easy. Here's an example of how you can do it
library(tidyverse)
library(breakaway)
set.seed(1)
example_richness <- rnorm(20, mean = 1000, sd = 100)
example_ses <- rnorm(20, mean = 100, sd = 15)
tb <- tibble("cs" = example_richness,
"ses" = example_ses,
"time" = rep(c(0,1), each = 10),
"disease" = rep(c(0,1), 10))
betta(chats = example_richness,
ses = example_ses,
X = model.matrix(~time*disease, data = tb))$table
Let me know if you have any questions! and sorry again for the delay eeeeek! (The summer was a total wash for software development)
Hi @adw96, No worries at all, appreciate you getting around to it eventually though! The dataset that made me ask this question is already published now, but I am still very curious about this, so allow me to expand, and I apologize in advance if this is more related to understanding linear models in general vs betta specifically.
So here's a case example building from yours:
library(tidyverse)
set.seed(10)
example_richness <- c(rnorm(8, mean = 100, sd = 40), #healthy, week1
rnorm(8, mean = 100, sd = 40), #disease, week1
rnorm(8, mean = 150, sd = 40), #healthy, week2 (increased)
rnorm(8, mean = 100, sd = 40)) #disease, week2 (no increase)
example_ses <- c(rnorm(8, mean = 100, sd = 15), #all the same
rnorm(8, mean = 100, sd = 15),
rnorm(8, mean = 100, sd = 15),
rnorm(8, mean = 100, sd = 15))
tb <- tibble("subject"=rep(letters[seq(1:16)], 2),
"cs" = example_richness,
"ses" = example_ses,
"time" = rep(c("week1","week2"), each = 16),
"disease" = rep(c("healthy", "disease"), each = 8, len = 32))
tb %>% head
# A tibble: 6 x 5
subject cs ses time disease
<chr> <dbl> <dbl> <chr> <chr>
1 a 143. 75.9 week1 healthy
2 b 149. 113. week1 healthy
3 c 129. 102. week1 healthy
4 d 80.8 118. week1 healthy
5 e 123. 88.6 week1 healthy
6 f 50.1 106. week1 healthy
So in a typical lm model what you suggested would be equal to:
(mod1 <- lm(cs~time*disease, data=tb) %>% anova)
Analysis of Variance Table
Response: cs
Df Sum Sq Mean Sq F value Pr(>F)
time 1 12853 12852.9 8.8727 0.0059208 **
disease 1 21160 21160.2 14.6075 0.0006759 ***
time:disease 1 8343 8342.7 5.7592 0.0232980 *
Residuals 28 40560 1448.6
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Which takes into account both the effect of treatment
, time
, and their interactions. But we could also frame the question in another way by looking only at the difference in cs
between time points:
tb2 <- tb %>%
pivot_wider(., id_cols=c(subject,disease), names_from=time, values_from=c("cs","ses")) %>%
mutate(delta_cs = cs_week2-cs_week1) #calculating difference between week 2 and week 1
tb2 %>% head
# A tibble: 6 x 7
subject disease cs_week1 cs_week2 ses_week1 ses_week2 delta_cs
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a healthy 143. 174. 75.9 93.0 31.1
2 b healthy 149. 141. 113. 95.3 -7.55
3 c healthy 129. 179. 102. 114. 49.1
4 d healthy 80.8 179. 118. 101. 97.9
5 e healthy 123. 168. 88.6 116. 45.1
6 f healthy 50.1 156. 106. 111. 106.
(mod2 <- lm(delta_cs ~ disease, data=tb2) %>% anova)
Analysis of Variance Table
Response: delta_cs
Df Sum Sq Mean Sq F value Pr(>F)
disease 1 16685 16685.4 4.0947 0.06254 .
Residuals 14 57049 4074.9
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Now, I get that paired tests do actually work very similarly to this in that the difference in time is calculated but do they answer the same question exactly? Because I see the interpretations of results between these two approaches could almost be opposite, right? In mod1
we would interpret that there was an effect of time, disease, and their interactions. But in mod2
we would conclude that there was no change in cs
between groups across time.
So, going back to betta, it is easy enough to implement mod1
-as you mentioned- with :
(betta1 <- breakaway::betta(chats = example_richness,
ses = example_ses,
X = model.matrix(~time*disease, data = tb))$table)
But, if I wanted to go the mod2
approach, is there a way to get a delta_ses
? I imagine just subtracting ses_time2
from ses_time1
wouldn't really make sense, right? This is where my question stems from, hope this makes sense!
Hi Bod! Good question! In general, for two random variables X1 and X2, the variance in (X1-X2) is Var(X1) + Var(X2) - 2 * Cov(X1, X2) In the case where you are taking the difference between two estimators, you can get reasonable estimates of VarX1 and VarX2, but the challenge is in estimating the covariance. Unfortunately this calculation is going to be complicated in most scenarios, because if your data is autocorrelated then it's likely to induce correlation in the estimators. Since here your estimators of species richness take a complicated form, this is going to be hard to do!
If you really want to go about this, you could make the assumption that the covariance in the estimators is positively correlated (which I think is very likely), and then you can construct an upper bound on the variance. Depending on precisely what claims you want to make, I would think you could have a hypothesis testing procedure that controls Type 1 error, even if it is conservative.
I'm sure you already know about this, but I usually recommend betta_random
for folks who have longitudinal data.
I hope that helps! I'm going to close this issue since it is not a breakaway
, but will do my best to answer more questions if you have them.
Cheers,
Amy
Thanks for taking the time to explain this, even though it was mainly outside of betta's domain. I think I understand your approach, and as you said that does sound a lot more complicated than its worth, given that a much simpler approach is right in front of us. Also, I tried re-running the code I posted with a few other random seeds and it seems like I may have just gotten unlucky with that particular random subset, because in almost all the other variations the results of the two models perfectly agree! So I feel better about that now too. No further questions for now :)
Hi breakway team,
I have a question regarding extending the use of
betta
example data: 2 groups of patients (healthy, disease), that have been sampled at 2 time points (1 and 2) I am wondering if its possible to:
a) test whether a change in richness has occurred in each group separately compared to 0. So, delta richness=richness at time point 2 - richness at time point 1; and compare each groups' average to 0 a.k.a no change.
b) Whether the magnitude of change is different between groups, so again using delta richness values but this time compare across groups instead of zero.
Currently I just use the estimates and run tests on them without accounting for their errors.
Is there a sensible way of doing this using betta so that the errors are utilized?
Thanks again for this awesome package!