Enhanced Anatomy plots

andkov commented 7 years ago

The investigation into the effect size / significance of the association between two processes (quantified through covariance and correlation between terms of the bivariate growth curve model) can benefit from several modifications to the existing "anatomy" plots:

- Mapping levels of predictors onto the plot of factor scores. This will let us check why the estimated correlation is non-significant even though a trend in the observed scatter of factor scores is clearly evident.
- Inspecting the dynamics of the graphs (both spaghetti and scatter plots) across the a - ae - aeh - aehplus progression (incremental addition of predictors).

Spaghetti: [figure eas-1]

Scatters: [figure eas-2]

@ampiccinin

wibeasley commented 7 years ago

@andkov, I like those graphs and all the context/stat info you packed in, without trampling on the data patterns.

1. If you end up publishing these, consider cutting BAGE into ~5 categories (e.g., (a) 0-4, (b) 5-9, (c) 10-14, (d) 15-19, (e) 20-24) instead of using a continuous color range.
2. I'd fade the top-right text to gray.
3. I'd use a softer color than red, ideally one that complements the lavender. Maybe turquoise?
4. I'd experiment with plotting the text below/before the points, especially since you're plotting them with a nice transparency alpha.

andkov commented 7 years ago

@wibeasley, thanks for the input. In response to your suggestion I've split the values into 5 quantile-based categories (quintiles), which also conveys how this variable is distributed. I've implemented all of your points except #4, because that would have required restructuring the complex graph assembly. But good point.

@ampiccinin, many things came together today, so I'm happy to show you the plots that will help us diagnose the role of the predictors. Please let me know how these could be improved to show what YOU would like to see about the models, so that they help you understand what's going on. I will color the trajectories in a similar fashion. What else would we need to see to diagnose these models better? Note: these are six identical plots, each colored by one of the six covariates.

1-age 2-edu 3-height 4-smoke 5-cardio 6-diabetes
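The two binning options discussed above, fixed-width categories for BAGE versus quantile-based categories, can be sketched in base R as follows. The `bage` vector is simulated for illustration (an assumption, not the study data), and the fixed bin edges follow the (a) 0-4 through (e) 20-24 ranges suggested earlier in the thread.

```r
# Simulated baseline ages (hypothetical, NOT the study data), on the 0-24
# scale used in the binning suggestion above.
set.seed(42)
bage <- runif(300, min = 0, max = 24)

# Option 1: fixed-width bins (a) 0-4, (b) 5-9, ..., (e) 20-24.
# right = FALSE closes each interval on the left: [0,5), [5,10), ...
bage_fixed <- cut(
  bage,
  breaks = c(0, 5, 10, 15, 20, 25),
  labels = c("a", "b", "c", "d", "e"),
  right  = FALSE
)

# Option 2: quantile-based bins (quintiles). Each bin holds about 20% of
# the cases, so the bin edges themselves describe the distribution.
quintile_breaks <- quantile(bage, probs = seq(0, 1, by = 0.2))
bage_quintile <- cut(
  bage,
  breaks = quintile_breaks,
  include.lowest = TRUE  # keep the minimum value in the first bin
)

table(bage_fixed)
table(bage_quintile)
```

Quantile-based bins hold roughly equal numbers of people, so the legend doubles as a summary of the distribution; fixed-width bins are easier to read but can leave some colors nearly empty.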

ampiccinin commented 7 years ago

@andkov: these are really great!! Can you create these for one of the situations where the slope-slope correlation is high? In this example the correlation is -0.04, with a p-value of 0.925.

andkov commented 7 years ago

@ampiccinin, yes, here's one for b1-male-aehplus-fev-boston_im: 1-age 2-edu 3-height 4-smoke 5-cardio 6-diabetes

andkov commented 7 years ago

@ampiccinin, here's the same model with scaled fev.
1-age 2-edu 3-height 4-smoke 5-cardio 6-diabetes

We are beginning to run into the problem of too many graphs: the anatomy reports would become very large and might lose the reader. I'm thinking about a solution. For now, if you have a specific model you'd like to examine like this, just let me know. Over the weekend I'll think of a way to organize these images into an accessible form.

ampiccinin commented 7 years ago

@andkov - great example. Maybe perfect.

See how the estimated correlation is 0.8, yet the graph suggests a zero or negative correlation?

This suggests that the estimated correlations do in fact represent partial correlations (conditional on the covariates), which is what they should be.

If you plot the different age quintiles separately, we should see more clearly that the one that includes age 70 (or wherever baseline age is centred) looks more like r = +0.80. Can you do this (just for this one example is fine)?

In fact, the estimated correlation should be for the 70-year-old male non-smoker of average education and height with no diagnosed cardio or diabetes issues. I'm betting that the reason they are often not statistically "significant" is that the n for this subgroup is so small relative to its variability. ...OK, now I'm feeling a bit of déjà vu. At any rate, it is not unlikely to be a power issue.

I'm keen to see the separate plots for the age groups!!
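The point that a model-estimated correlation is a partial (conditional) correlation, and can therefore disagree with a raw scatter, can be illustrated with a toy simulation. This is not the bivariate growth curve model itself: `x` and `y` merely stand in for two slope factor scores, and `age` for a covariate that pushes them in opposite directions. All names and effect sizes are made up for illustration.

```r
set.seed(1)
n   <- 1000
age <- runif(n, min = 0, max = 10)   # hypothetical covariate

# Residual parts share a positive correlation of about 0.8.
z1  <- rnorm(n)
z2  <- rnorm(n)
e_x <- z1
e_y <- 0.8 * z1 + 0.6 * z2

# The covariate affects x and y in opposite directions.
x <- 1.0 * age + e_x
y <- -2.0 * age + e_y

# Marginal correlation (what a pooled scatter shows): dominated by the
# covariate, so it comes out negative.
marginal_r <- cor(x, y)

# Partial correlation given age: correlate the residuals after regressing
# each variable on the covariate.
partial_r <- cor(resid(lm(x ~ age)), resid(lm(y ~ age)))

round(c(marginal = marginal_r, partial = partial_r), 2)
```

If the plotted factor scores include the covariate effects, a pooled scatter behaves like `marginal_r`, while the estimated correlation behaves like `partial_r`; restricting the plot to people near the reference values is one way to bring the two closer.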

andkov commented 7 years ago

Here is the one that included age 70.

1-age 2-edu 3-height 4-smoke 5-cardio 6-diabetes

ampiccinin commented 7 years ago

Cool. What do you see (in particular for age and smoking = no)? Is it difficult to go one more step and make a single plot that contains only the age group you have plotted here, but restricting also on the other variables to where their reference value is? We don't need this for all studies. It would just be good to check a couple, in particular those whose correlation/SE/p-values seemed odd or inconsistent. Just to satisfy ourselves that we actually understand what is going on, so we can explain it in the paper.

Since the phys-cog papers will not have a lot of space, it might actually be better to make this point in the phys-phys paper (and possibly reference it in the phys-cog papers). But since you have the graphs running for phys-cog, we can examine the problem there, because the principle should be the same.

wibeasley commented 7 years ago

Since the phys-cog papers will not have a lot of space, it might actually be better to make this point in the phys-phys paper

Two other options are (a) online supplemental material and (b) hosting them on an IALSA site (similar to how this report is hosted on a project-specific website).

andkov commented 7 years ago

@ampiccinin, OK, let me work on these modifications.

ampiccinin commented 7 years ago

@andkov @wibeasley, at this point I am thinking of the graphs as serving our own understanding of what the models are telling us, particularly when we see odd combinations (high but non-significant correlations). I suppose we could include one as an example in a paper, but for me these are just for internal consumption. I don't think most people will be interested in sifting through these.

andkov commented 7 years ago

is it difficult to go one more step and make a single plot that contains only the age group you have plotted here, but restricting also on the other variables to where their reference value is?

Technically, this is not difficult. However, there are not enough data points to populate such a restrictive combination. To illustrate, here is what happens when I further restrict this group of 65 individuals:

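```r
library(dplyr)  # provides the %>% pipe used below

# Restrict the baseline-age group further to the reference values
# of the remaining covariates.
ds <- ds %>% 
  dplyr::filter(
    edu == 7                           # education at its centering value
    ,height > 169.5 & height < 170.5   # height within 0.5 of 170
    ,smoke == "no"
    ,diabetes == "no"
    ,cardio == "no"
    )
```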

I simply run out of data points: no one matches this combination. Or did I misunderstand your question?

andkov commented 7 years ago

I don't think most people will be interested in sifting through these.

I'm with you, @ampiccinin. Telling the story is not the same as finding the story. I am fully aware that most of the graphs and reports produced are not going to end up in a publication, but that's OK, because they serve a different goal: to empower the writers to say what would be interesting to read. As I'm finding out, it's quite tricky to switch from a scientist's mind to a journalist's mind, and I have more and more appreciation for this knack.

ampiccinin commented 7 years ago

:) Exactly. We can't just select a single value for these, just like with age.

How about selecting 165-180 (or 170-175) for height and 7-12 on education (remind me where it is centered? Is it really 7 years?).

With respect to scientist vs journalist: I was thinking of it more as exploratory vs confirmatory, or observation vs interpretation.

wibeasley commented 7 years ago

I was thinking of it more as exploratory vs confirmatory, or observation vs interpretation.

We frequently called them "internal reports" and "external reports" to make a similar distinction.

Another related characteristic is development time. An internal report takes ~20 minutes to develop. An external report takes hours, because all the defaults are usually tweaked in order to make a big impact quickly.

andkov commented 7 years ago

How about selecting 165-180 (or 170-175) for height and 7-12 on education (remind me where it is centered? Is it really 7 years?).

@ampiccinin, yes, the center for education is at 7; for height it's 160 for females and 172 for males.

Unfortunately, even as I expand the ranges, there aren't enough people. Let me try to find a similar case for females; there are more of them in the data set.
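The range-based restriction suggested above (height 165-180 and education 7-12 instead of single values) can be sketched in base R as a small counting helper. The helper `count_reference_matches()` and the toy data are hypothetical, made up for illustration; the column names mirror the earlier `dplyr::filter()` call.

```r
# Hypothetical helper: count distinct individuals whose education and
# height fall inside the given ranges (inclusive) and who are in the
# reference category of the three binary covariates.
count_reference_matches <- function(ds,
                                    edu_range    = c(7, 12),
                                    height_range = c(165, 180)) {
  keep <- ds$edu    >= edu_range[1]    & ds$edu    <= edu_range[2] &
          ds$height >= height_range[1] & ds$height <= height_range[2] &
          ds$smoke    == "no" &
          ds$diabetes == "no" &
          ds$cardio   == "no"
  length(unique(ds$id[keep]))
}

# Toy data (made up, NOT the study data): one row per person.
toy <- data.frame(
  id       = 1:7,
  edu      = c(7, 9, 12, 14, 7, 8, 10),
  height   = c(170, 172, 168, 175, 181, 166, 178),
  smoke    = c("no", "no", "yes", "no", "no", "no", "no"),
  diabetes = c("no", "yes", "no", "no", "no", "no", "no"),
  cardio   = c("no", "no", "no", "no", "no", "yes", "no")
)

count_reference_matches(toy)                              # wide ranges
count_reference_matches(toy, height_range = c(170, 175))  # narrower height
```

Counting distinct `id`s rather than rows keeps the count correct when the data are in long format with several waves per person.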

ampiccinin commented 7 years ago

If we can't find enough people, even taking a wide swath, then I'd say this is part of the problem with the models.

Another option is to just not select based on education and height. That will be close enough.

andkov commented 7 years ago

The contingency tables are pretty sparse. When I take the initial 321 individuals (even before stratifying on age group at baseline), there are just not enough people in the reference category (11 here):

```r
ds %>% 
  dplyr::group_by(smoke, cardio, diabetes) %>% 
  dplyr::distinct(id) %>% 
  dplyr::count()
```

```
Source: local data frame [8 x 4]
Groups: smoke, cardio [?]

   smoke cardio diabetes     n
  <fctr> <fctr>   <fctr> <int>
1    yes    yes      yes    99
2    yes    yes       no    35
3    yes     no      yes    22
4    yes     no       no     6
5     no    yes      yes    84
6     no    yes       no    30
7     no     no      yes    34
8     no     no       no    11
```

And when I add age_group_bl it's even sparser. Do you see anything that may work?

```
   age_group_bl  smoke cardio diabetes     n
         <fctr> <fctr> <fctr>   <fctr> <int>
1   [85.4,98.5]    yes    yes      yes    23
2   [85.4,98.5]    yes    yes       no     2
3   [85.4,98.5]    yes     no      yes     8
4   [85.4,98.5]    yes     no       no     2
5   [85.4,98.5]     no    yes      yes    14
6   [85.4,98.5]     no    yes       no     6
7   [85.4,98.5]     no     no      yes     8
8   [85.4,98.5]     no     no       no     1
9   [81.9,85.4)    yes    yes      yes    25
10  [81.9,85.4)    yes    yes       no     6
11  [81.9,85.4)    yes     no      yes     4
12  [81.9,85.4)     no    yes      yes    12
13  [81.9,85.4)     no    yes       no     6
14  [81.9,85.4)     no     no      yes     9
15  [81.9,85.4)     no     no       no     2
16  [78.5,81.9)    yes    yes      yes    21
17  [78.5,81.9)    yes    yes       no    12
18  [78.5,81.9)    yes     no      yes     3
19  [78.5,81.9)    yes     no       no     2
20  [78.5,81.9)     no    yes      yes    12
21  [78.5,81.9)     no    yes       no     8
22  [78.5,81.9)     no     no      yes     4
23  [78.5,81.9)     no     no       no     2
24  [73.4,78.5)    yes    yes      yes    14
25  [73.4,78.5)    yes    yes       no     6
26  [73.4,78.5)    yes     no      yes     4
27  [73.4,78.5)    yes     no       no     2
28  [73.4,78.5)     no    yes      yes    22
29  [73.4,78.5)     no    yes       no     5
30  [73.4,78.5)     no     no      yes     8
31  [73.4,78.5)     no     no       no     3
32  [57.9,73.4)    yes    yes      yes    16
33  [57.9,73.4)    yes    yes       no     9
34  [57.9,73.4)    yes     no      yes     3
35  [57.9,73.4)     no    yes      yes    24
36  [57.9,73.4)     no    yes       no     5
37  [57.9,73.4)     no     no      yes     5
38  [57.9,73.4)     no     no       no     3
```
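The sparse-cell check above could also be reproduced with a configurable number of baseline-age groups, so that, for example, quartiles and quintiles can be compared. This is a base-R sketch, not the code behind the tables above: the helper `reference_cell_counts()`, the column name `age_bl`, and the toy data are assumptions for illustration, and the exact binning used for `age_group_bl` is not shown in the thread.

```r
# Hypothetical helper: number of distinct individuals per combination of
# baseline-age group and the three binary covariates.
reference_cell_counts <- function(ds, n_age_groups = 5) {
  # Keep one row per person (the analysis data are in long format).
  people <- ds[!duplicated(ds$id),
               c("id", "age_bl", "smoke", "cardio", "diabetes")]
  # Quantile-based age groups; assumes the resulting breaks are unique.
  breaks <- quantile(people$age_bl,
                     probs = seq(0, 1, length.out = n_age_groups + 1))
  # right = FALSE gives [a,b) intervals; include.lowest = TRUE then closes
  # the top interval so the maximum age is kept.
  people$age_group_bl <- cut(people$age_bl, breaks = breaks,
                             right = FALSE, include.lowest = TRUE)
  as.data.frame(
    table(people[, c("age_group_bl", "smoke", "cardio", "diabetes")]),
    responseName = "n"
  )
}

# Toy long-format data (made up, NOT the study data): 200 people, 3 waves.
set.seed(7)
toy_long <- data.frame(
  id       = rep(1:200, each = 3),
  age_bl   = rep(runif(200, 58, 98), each = 3),
  smoke    = rep(sample(c("yes", "no"), 200, replace = TRUE), each = 3),
  cardio   = rep(sample(c("yes", "no"), 200, replace = TRUE), each = 3),
  diabetes = rep(sample(c("yes", "no"), 200, replace = TRUE), each = 3)
)

cells <- reference_cell_counts(toy_long, n_age_groups = 4)
# Reference category (no/no/no) in each age group:
cells[cells$smoke == "no" & cells$cardio == "no" & cells$diabetes == "no", ]
```

Unlike `dplyr::count()`, `table()` keeps empty combinations as rows with `n = 0`, which makes missing cells (such as yes/no/no in the `[81.9,85.4)` group above) visible.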

ampiccinin commented 7 years ago

...It's the last one, with n = 3. You could try age quartiles instead of quintiles. I am not at all surprised; this is what I've been asking about from the start.

IALSA / IALSA-2015-Portland

Enhanced Anatomy plots #159
