Great! (Why `rcs()` instead of something in mgcv?)
> Great! (Why `rcs()` instead of something in mgcv?)
More familiar with it :) Placing the knots at quantiles seems appropriate for the PWS data. But I can try a GAMM as well.
Yeah here we can (I think) put knots at each hour, because there should be one well-being question every hour IIRC. I also thought it'd be important to study how many people there are who have sessions with durations <=1h, <=2h, etc. I imagine this falls to a very low % at even moderately long sessions, and so then your point about the shape of the curve being determined by relatively few people with long sessions is worrisome.
> Yeah here we can (I think) put knots at each hour

I think we can put them closer during the first 1 or 2 hours. Since the prompts were randomized, wasn't WB effectively measured every ~10 minutes overall?
> it'd be important to study how many people there are who have sessions with durations <=1h, <=2h, etc

Yes.
> and so then your point about the shape of the curve being determined by relatively few people with long sessions is worrisome.

We can plot the shape and add a confidence ribbon + data to show that confidence should be much lower in those regions.
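To make that concrete, here is a minimal sketch (my own, not from the draft) of denser early knots plus a ribbon-and-rug plot, assuming a data frame `dat` with columns `mood`, `hours` (in-session time in hours), `session`, and `pid`; the knot locations are illustrative only.

```r
library(lme4)
library(rms)
library(ggplot2)

# Knots every 30 min over the first two hours, sparser after (values assumed)
knots <- c(0.5, 1, 1.5, 2, 4)
fit <- lmer(mood ~ rcs(hours, knots) + (1 | session) + (1 | pid), data = dat)

# Fixed-effect curve with a 95% ribbon, plus a rug showing where the data are
grid <- data.frame(hours = seq(0, 5, by = 0.05))
X <- model.matrix(~ rcs(hours, knots), data = grid)
grid$fit <- drop(X %*% fixef(fit))
grid$se  <- sqrt(diag(X %*% as.matrix(vcov(fit)) %*% t(X)))

ggplot(grid, aes(hours, fit)) +
  geom_ribbon(aes(ymin = fit - 1.96 * se, ymax = fit + 1.96 * se), alpha = .2) +
  geom_line() +
  geom_rug(data = dat, aes(x = hours), inherit.aes = FALSE, alpha = .05)
```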
Another potential challenge is that the observation times are informative: although the prompts were randomized within sessions, the actual session length and the number of prompts are likely non-ignorable. So we have a mix of MCAR and MNAR data, which would be a problem in an experiment, but perhaps this isn't a problem here since we're doing a more descriptive analysis.
I started a draft of this model in https://github.com/digital-wellbeing/pws-prepost/blob/continuous-time/continuous-time-draft.qmd

I think the overall shape of change makes sense using `rcs(hours, 5)` (bottom panel).
Love it, and the bottom shape does not entirely surprise me.
Re write-up: do you agree that it still makes sense to use the simple model as the main analysis? We can anticipate this as RQ2 or something; criticizing the main model and then moving to this one might help explain what is going on in the results.
Which one is the simple model? :smile: The post model? I'm not yet sure which model I prefer.
I think the continuous time model should be expanded to also include something equivalent to `spline(week) * pre_mood`. I would like to know if change is different for those who reported lower WB at the start of a session. But we can't include pre_mood as a covariate as most values are missing by design. So I'm not sure how to model it. Maybe the easiest way is to interpret the correlated random effects. But I don't think lme4 can handle that many random effects with this kind of sample size.
The pre-post model is now just a three-level mixed model. Because so many participants have such a small N, the scale part of the location-scale model is hard to identify. So I vote for calling that one "simple"!
Irrespective of which one we prefer, I think it makes narrative sense (not in a hack-your-hypothesis way :) ) to start with the pre-post model and then iterate on the details of the RQ and the model.
Does your current model nest sessions within participants (are the session codes uniquely coded per participant)?
```r
# Continuous-time draft: rms::rcs spline, crossed session/participant intercepts
fit1 <- lmer(mood ~ rcs(hours, 5) + (1 | session) + (1 | pid), data = dat)
```
- GAMM?

Only if it improves over `rcs()`. I'd just never heard of the latter.
- Is censoring really needed?

Theoretically, yes (looking at how many observations are at 1). In practice, most likely not. (A hypothetical brms sketch follows this list.)
- Visualize the % that have positive vs negative change over a session? Or over 30 / 60 minutes?

The data for this should be in the histogram that is currently in the MS.
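On the censoring point: should it ever matter in practice, the pile-up at 1 could be treated as right-censoring in brms. A hedged sketch (variable names assumed; a generic `s(hours)` smooth is used in place of `rcs()` purely for convenience):

```r
library(brms)

# Flag responses at the ceiling of the mood scale as right-censored
dat$cens <- ifelse(dat$mood >= 1, "right", "none")

fit_cens <- brm(
  mood | cens(cens) ~ s(hours) + (1 | session) + (1 | pid),
  data = dat, family = gaussian()
)
```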
All the small points you make in the code about the inadequacy of the contrast etc. are really interesting. I'll look into them in more detail too.
Some other comments:
This is because we only include in the pre-post analysis sessions that have a "real" pre-measure, i.e. a mood measure at login, not just any mood measure that happens to be the first. Or is there some other issue? I'd be happy to either ignore this or do something like `y ~ post*which_hour_of_session ...`.
> Does your current model nest sessions within participants (are the session codes uniquely coded per participant)?

I'm using correct session IDs now :sweat_smile: Thanks!
> - GAMM?
>
> Only if it improves over `rcs()`. I'd just never heard of the latter.
`rcs` is the same as `splines::ns`, i.e. a natural (restricted) cubic spline: piecewise cubic, constrained to be linear beyond the boundary knots.
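For reference, a small sketch (hypothetical data and names) showing the correspondence, plus what an mgcv version with random intercepts might look like if a GAMM is wanted after all:

```r
library(rms)      # rcs()
library(splines)  # ns()

# Both are natural cubic spline bases with comparable flexibility; they differ
# mainly in default knot placement and parameterization. Multilevel structure
# is ignored here just to compare the bases.
fit_rcs <- lm(mood ~ rcs(hours, 5), data = dat)     # 5 knots -> 4 basis columns
fit_ns  <- lm(mood ~ ns(hours, df = 4), data = dat) # 4 basis columns

# An mgcv alternative with random intercepts, should a GAMM be preferred
# (pid and session must be factors for the "re" smooths)
library(mgcv)
fit_gam <- bam(mood ~ s(hours) + s(session, bs = "re") + s(pid, bs = "re"),
               data = dat)
```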
> - Visualize the % that have positive vs negative change over a session? Or over 30 / 60 minutes?
>
> The data for this should be in the histogram that is currently in the MS.
Yes, but I want something similar from the CT model. There's a YOLO attempt in the draft https://github.com/digital-wellbeing/pws-prepost/blob/38df8661c98016eb1af3562e9daed532e769015f/continuous-time-draft.qmd#L517-L523
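Purely as an illustration (not the draft's actual code), one way to pull such a percentage out of the CT model could look like the sketch below. It assumes a hypothetical refit `fit2` with person-level random spline terms, so that predicted within-session change actually varies between people; with random intercepts only, the predicted change would be identical for everyone.

```r
library(dplyr)
library(tidyr)

# fit2 is assumed to be something like:
# fit2 <- lmer(mood ~ rcs(hours, 5) + (rcs(hours, 5) | pid) + (1 | session),
#              data = dat)
newdat <- crossing(pid = unique(dat$pid), hours = c(0, 0.5, 1))
newdat$pred <- predict(fit2, newdata = newdat,
                       re.form = ~ (rcs(hours, 5) | pid))

# Per-person predicted change at 30 and 60 minutes, then the % improving
newdat %>%
  group_by(pid) %>%
  summarise(d30 = pred[hours == 0.5] - pred[hours == 0],
            d60 = pred[hours == 1]   - pred[hours == 0], .groups = "drop") %>%
  summarise(pct_pos_30 = mean(d30 > 0), pct_pos_60 = mean(d60 > 0))
```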
> This is because we only include in the pre-post analysis sessions that have a "real" pre-measure, i.e. a mood measure at login, not just any mood measure that happens to be the first. Or is there some other issue?
No, this is more a note that the CT model wouldn't need to discard such responses.
I tried adding random effects for each part of the spline (https://github.com/digital-wellbeing/pws-prepost/blob/38df8661c98016eb1af3562e9daed532e769015f/continuous-time-draft.qmd#L374-L377) and plotted the spline at different values of the random intercept (https://github.com/digital-wellbeing/pws-prepost/blob/38df8661c98016eb1af3562e9daed532e769015f/continuous-time-draft.qmd#L383-L434), but the correlation between the random intercept and slopes is pretty weak, so the shape doesn't change much.
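For what it's worth, a quick hypothetical way to inspect those correlations directly, assuming the same kind of `fit2` with person-level random spline terms:

```r
library(lme4)

# Random-effect variance-covariance for pid; the "correlation" attribute
# holds the intercept-spline-term correlations discussed above
vc <- VarCorr(fit2)$pid
round(attr(vc, "correlation"), 2)
```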
Looking at the model predictions for a random participant, things look pretty reasonable though.
I'll start thinking about some secondary analyses.
A simple idea for a secondary analysis using continuous time could be a 3-level model with observations nested within sessions and participants, maybe with random slopes, location-scale, and the other fancy stuff used in the primary analysis.
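A minimal sketch of that starting point, with names assumed as before and the fancy parts omitted:

```r
library(lme4)
library(rms)

# Observations nested in sessions nested in participants;
# (1 | pid/session) expands to (1 | pid) + (1 | pid:session)
fit3 <- lmer(mood ~ rcs(hours, 5) + (1 | pid/session), data = dat)
```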