Closed. conig closed this pull request 1 year ago.
Due to the protocol, this means we need to retain the variable in both states. The most straightforward way to achieve this was to create two datasets, one with the transformed variable and one without, and then run each of those objects through the entire pipeline.
I'm struggling a bit to see the rationale for this, but maybe I'm just overlooking something? In my head the simplest way to do this is just to append log transformed versions of the variables to the same dataset. The model builder then just returns two versions - a log transformed and a non-transformed. But, happy to be convinced that there's something that I'm missing.
I'll have a look through the changes, but I'm a bit worried about a couple of things:
`tar_target(models, make_models(model_params), pattern = map(model_params))`, and then you only have a single node on the graph. (The reason I didn't make this change before is that the names of the targets were being used in the multiverse doc, and dynamic branches are randomly named, but we could work around that.) These changes are all super impressive, I'm just really worried about how complicated this all is.
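For context, here's a minimal sketch of the dynamic-branching idea with {targets}. This is not the actual pipeline; `make_model_params()` and `make_models()` are placeholder names standing in for whatever builds the model specifications and fits them.

```r
# Sketch only, assuming hypothetical helpers make_model_params() and
# make_models(). One dynamically branched target replaces a long list
# of hand-written per-model targets.
library(targets)

list(
  # model_params: one element/row per model specification
  tar_target(model_params, make_model_params()),
  # A single node on the graph; targets runs make_models() once per branch
  tar_target(models, make_models(model_params), pattern = map(model_params))
)
```

The trade-off noted above still applies: branch names are generated automatically, so anything that referenced target names by hand (e.g. the multiverse doc) would need a workaround.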
I agree that we don't want it to be any more complicated than it has to be. I started by trying to append the transformed variable to the dataset and then adding a model to the model_list. But there are eight models where PA_volume features, meaning that we would need to greatly increase the length of model_list. You then have to hardcode what goes where throughout the pipeline to make two variants of all the figures and tables. Then you need extra code to pull both into supplementary materials. On top of that, if we are asked by reviewers to transform another variable, you're back to square one and the code grows in complexity again.
For those reasons I don't think we want to try to integrate the two variants with the previous system. However, I agree that my hack job of producing targets is not needed if there are more elegant ways to do it. I also agree there's no need for two versions of the manuscript/supps; there are better ways to handle that. In an ideal world we would set which models are going to appear in the manuscript and the rest would automatically appear in supps. Let's discuss how we want to go forward.
Been thinking about this a bit more. I can't think of a reason why all of the models need to be run with both pa_volume and log(pa_volume). We should be deciding on what we think the correct approach is (log(pa_volume) based on the kurtosis values) and then just include a single model list which tests what would have happened to the primary analysis if we hadn't log transformed.
That's what we did with the fixed-effect model - it only gets a single model list (moderated by age). We didn't test it on every moderator, so I don't think we need to change tactics for the transformation on one variable either.
Is there any reason to test every model with both versions of the variable?
I agree that it is perhaps overkill to run every variant of every model that we have put into supplementary materials. I was trying to strictly adhere to this from the protocol:
To increase transparency and check the robustness of our estimates, we will use a multiverse analytical approach. This involves using different reasonable data processing choices (e.g., different variable transformation approaches) to construct multiple datasets. Then, we will analyse each dataset in this data multiverse separately, leading to the multiverse of statistical results (Steegen et al., 2016).
But I can see that if we can justify an approach then there is no need to include models which violate assumptions, etc., in supplementary materials. If you're around, we can briefly discuss on Tuesday and then implement the changes we want.
This PR ended up being more complex than I had hoped. We decided to log-transform the PA-volume in an attempt to improve the model's residual distributions (issue #74). Due to the protocol, this means we need to retain the variable in both states. The most straightforward way to achieve this was to create two datasets, one with the transformed variable and one without, and then run each of those objects through the entire pipeline.
To accommodate this change, I had to carry out a fairly significant restructuring. On the bright side, the pipeline is now much more concise, and we can easily transform other variables without further complications.
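To illustrate the shape of the restructuring described above, here's a hedged sketch using `tarchetypes::tar_map()` to run the same pipeline over both dataset variants. All names (`variant`, `prepare_data()`, `fit_models()`, `make_figures()`) are hypothetical placeholders, not the actual functions in this repo.

```r
# Sketch only, with hypothetical helper names. The same pipeline
# definition is mapped over both dataset variants, so transforming
# another variable later only means adding an entry to `values`.
library(targets)
library(tarchetypes)

tar_map(
  values = list(variant = c("raw", "log_pa_volume")),  # hypothetical labels
  tar_target(dataset, prepare_data(variant)),   # one dataset per variant
  tar_target(models, fit_models(dataset)),      # full model list per variant
  tar_target(figures, make_figures(models))     # figures/tables per variant
)
```

Static branching like this keeps the graph readable while avoiding hardcoding which variant feeds which figure or table.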
The kurtosis values are much less extreme, although they are still all negative, which is worth looking into.
Overview