gavinsimpson / gratia

ggplot-based graphics and useful functions for GAMs fitted using the mgcv package
https://gavinsimpson.github.io/gratia/

`draw` forgets order of ordered factor #284

Closed mhpob closed 6 months ago

mhpob commented 6 months ago

Everything evaluates fine, but the levels of an ordered factor lose their order after `parametric_effects()` and are plotted in alphabetical order by `draw()`.

In the output of `parametric_effects()`, the factor and its ordering appear to be deliberately decoupled into three pieces: the row order, a `type` of "ordered", and an unordered character representation of the levels. The row ordering is not passed on to the base plotting step of `draw_parametric_effect`.

> library(mgcv)
Loading required package: nlme
This is mgcv 1.9-1. For overview type 'help("mgcv-package")'.
> library(gratia)
> 
> df <- data_sim("eg1", seed = 42)
> df$month <- factor(
+   rep(month.abb[1:10], times = 40),
+   levels = month.abb[1:10],
+   ordered = TRUE
+ )
> 
> class(df$month)
[1] "ordered" "factor" 
> levels(df$month)
 [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct"
> 
> m <- gam(y ~ month + s(x0) + s(x1) + s(x2) + s(x3), data = df, method = "REML")
> summary(m)

Family: gaussian 
Link function: identity 

Formula:
y ~ month + s(x0) + s(x1) + s(x2) + s(x3)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  7.49514    0.10571  70.901   <2e-16 ***
month.L     -0.10830    0.33948  -0.319    0.750    
month.Q      0.37636    0.34039   1.106    0.270    
month.C      0.42859    0.33954   1.262    0.208    
month^4     -0.17750    0.34422  -0.516    0.606    
month^5      0.33674    0.34115   0.987    0.324    
month^6      0.12809    0.34837   0.368    0.713    
month^7     -0.07840    0.33997  -0.231    0.818    
month^8     -0.09706    0.33962  -0.286    0.775    
month^9      0.03407    0.34168   0.100    0.921    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
        edf Ref.df      F  p-value    
s(x0) 3.358  4.165  8.438 1.73e-06 ***
s(x1) 3.112  3.871 66.289  < 2e-16 ***
s(x2) 7.889  8.676 66.775  < 2e-16 ***
s(x3) 1.902  2.382  2.810    0.054 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.682   Deviance explained = 70.2%
-REML = 886.14  Scale est. = 4.47      n = 400
> 
> parametric_effects(m, terms ='month')
# A tibble: 10 × 5
   term  type    .level .partial   .se
   <chr> <chr>   <chr>     <dbl> <dbl>
 1 month ordered Jan    -0.0591  0.322
 2 month ordered Feb     0.352   0.326
 3 month ordered Mar     0.251   0.325
 4 month ordered Apr    -0.00511 0.324
 5 month ordered May    -0.312   0.322
 6 month ordered Jun    -0.200   0.329
 7 month ordered Jul    -0.0499  0.324
 8 month ordered Aug    -0.176   0.326
 9 month ordered Sep    -0.159   0.321
10 month ordered Oct     0.359   0.322
> 
> parametric_effects(m, terms ='month') |> 
+   draw()

[Image: plot produced by `draw()`, with the month levels in alphabetical order]
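For anyone building their own plot from this output in the meantime, here is a minimal sketch of re-attaching the level order by hand. The data frame `eff` is a hypothetical stand-in for the tibble printed above (values copied from that output), so the example runs without fitting a model; nothing here is part of gratia's API.

```r
# Hypothetical stand-in for the tibble returned by parametric_effects(),
# using the values printed above. `.level` is a character column.
eff <- data.frame(
  term = "month",
  type = "ordered",
  .level = month.abb[1:10],
  .partial = c(-0.0591, 0.352, 0.251, -0.00511, -0.312,
               -0.200, -0.0499, -0.176, -0.159, 0.359),
  stringsAsFactors = FALSE
)

# Sorting the character values gives alphabetical order, not calendar order.
alphabetical <- sort(unique(eff$.level))

# Re-attach the intended order by converting back to an ordered factor,
# taking the levels from the original data (here, the first ten months).
eff$.level <- factor(eff$.level, levels = month.abb[1:10], ordered = TRUE)
levels(eff$.level)
```

With `.level` stored as an ordered factor again, a discrete axis in a custom ggplot2 plot should follow the level order rather than the alphabet.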

gavinsimpson commented 6 months ago

Yeah; this is a known infelicity of the current way I return everything as a single tibble; we can't merge factors (ordered or otherwise) with different levels and wouldn't want to anyway.

I think I really want to return a nested tibble with 1 row per effect (1 row for each smooth or parametric effect), which will allow for storage of the levels as we won't need to convert them to character to bind all the effects together. But that would be quite a change to the UI and would fail as soon as someone unnested the tibble...
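A rough base R sketch of what "one row per effect" could look like, assuming a list column holds each effect's own data frame. The object names (`month_eff`, `fac_eff`, `nested`) and the values are made up for illustration and are not what gratia returns.

```r
# Two hypothetical effects whose factor levels differ.
month_eff <- data.frame(
  .level = factor(c("Jan", "Feb", "Mar"),
                  levels = c("Jan", "Feb", "Mar"), ordered = TRUE),
  .partial = c(-0.06, 0.35, 0.25)
)
fac_eff <- data.frame(
  .level = factor(c("low", "high"), levels = c("low", "high")),
  .partial = c(0.1, -0.1)
)

# One row per effect; each effect's data sits in a list column, so the
# factors never have to be combined or converted to character.
nested <- data.frame(term = c("month", "fac"), stringsAsFactors = FALSE)
nested$data <- list(month_eff, fac_eff)

# The levels, and their order, survive inside each nested element.
levels(nested$data[[1]]$.level)
levels(nested$data[[2]]$.level)
```

Each nested element keeps its own factor, which is the point of the design; the cost is that anything flattening the list column has to reconcile the differing levels again.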

Right now, the best I can think of is to attach another attribute that carries the factor levels in a list, and make sure I preserve that attribute through any subsetting operations.
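The attribute idea could be sketched roughly as follows in base R. The helper names `bind_effects()` and `restore_levels()` and the attribute name `"factor_levels"` are hypothetical, chosen only for this illustration; they are not the names used in gratia.

```r
# Bind several effects into one data frame. Levels are converted to
# character so the rows can be stacked, and the original levels are kept
# in a named list stored as an attribute on the result.
bind_effects <- function(effects) {
  lvls <- lapply(effects, function(e) levels(e$.level))
  out <- do.call(rbind, lapply(names(effects), function(nm) {
    e <- effects[[nm]]
    data.frame(term = nm, .level = as.character(e$.level),
               .partial = e$.partial, stringsAsFactors = FALSE)
  }))
  attr(out, "factor_levels") <- lvls
  out
}

# Pull one term back out and rebuild its factor from the stored levels.
# The attribute is read before subsetting, so this helper does not rely
# on subsetting preserving it.
restore_levels <- function(x, term) {
  lv <- attr(x, "factor_levels")[[term]]
  sub <- x[x$term == term, , drop = FALSE]
  sub$.level <- factor(sub$.level, levels = lv)
  sub
}

effects <- list(
  month = data.frame(
    .level = factor(c("Jan", "Feb", "Mar"), levels = c("Jan", "Feb", "Mar")),
    .partial = c(0.1, 0.2, 0.3)
  ),
  fac = data.frame(
    .level = factor(c("b", "a"), levels = c("b", "a")),
    .partial = c(1, 2)
  )
)

out <- bind_effects(effects)
levels(restore_levels(out, "month")$.level)
```

The fragile part, as noted above, is keeping the attribute attached through whatever subsetting a user does to the combined object.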

gavinsimpson commented 6 months ago

This is now fixed in the devel version on GitHub. I'll make a release to CRAN shortly after June 10th.

[Image: issue-284-plot]