JuliaStats / GLM.jl

Generalized linear models in Julia
Other
584 stars 114 forks source link

Calculate F-statistics for Tests of Between-Subjects Effects (Type III, ANOVA) #508

Open PharmCat opened 1 year ago

PharmCat commented 1 year ago

This is draft PR.

This test can be used by users who want ANOVA and can't use F-test for some data (as in my case with crossover designs).

Tests of Between-Subjects Effects included in this PR as it described in SPSS:

F =β' L' (L V L')⁻¹ β L / rank(L)

image

where V - variance-covariance matrix of β

L-matrix description from SPSS docs (SPSS and SAS docs avialible in public access):

For each level combination of the between subjects factors in TABLES, identify the nonmissing
cases with positive caseweights and positive regression weights which are associated with the
current level combination. Suppose the cases are classified by three between-subjects factors:
A, B and C. Now A and B are specified in TABLES and the current level combination is A=1
and B=2. A case in the cell A=1, B=2, and C=3 is associated with the current level combination,
whereas a case in the cell A=1, B=3 and C=3 is not. Compute the average of the design matrix
rows corresponding to these cases.
If an effect contains a covariate, then its parameters which belong to the current level
combination are equal to the mean of the covariate, and are equal to 0 otherwise. Using the above
example, for effect A*X where X is a covariate, the parameter [A=1]*X belongs to the current
level combination where the parameter [A=2]*X does not. If the effect contains a product of
covariates, then the mean of the product is applied.
The result is the l vector for the current between-subjects factor level combination. When none
of the between-subjects effects contain covariates, the vector always forms an estimable function.
Otherwise, a non-estimable function may occur, depending on the data.

For the inverse of (L V L) used pinv because some of V matrices are ill-conditioned.

Because V is Symmetric - I try to calculate it in place and more efficiently.

After calculation F for each effect p values are calculated too.

ccdf(FDist(df[i], dof_residual(obj)), F[i])

where df - is rank(L)

For Intercept factor I think the general mean should be used, but because I can't get the number of efficient levels for InteractionTerm from StatsModel - it is not realized.

Also for InteractionTerm it may be some issues for zero-intercept models because I don't know how to get completed contrast matrix for this.

PrettyTables is used for printing because this is draft PR, it can be removed by custom output.

model used for check:

using  DataFrames, CSV, StatsBase, Plots, GLM, CategoricalArrays

dffinal = CSV.File("df1.csv") |> DataFrame
transform!(dffinal, :period => categorical, renamecols = false)
transform!(dffinal, :id => categorical, renamecols = false)

ols = lm(@formula(logAUC ~ sequence+period+formulation), 
dffinal, contrasts  = Dict(:period => DummyCoding(base = 4),
:sequence => DummyCoding(base = "TRRT"),
:formulation => DummyCoding(base = "T")))
GLM.typeiii(ols)

result:

 ------------- ----- ----------- -------------
         Name    DF           F          Pval 
 ------------- ----- ----------- -------------
  (Intercept)   1.0     14250.2   5.16248e-74
     sequence   1.0     3.37076     0.0712381
       period   3.0   0.0282538      0.993531
  formulation   1.0    0.527962      0.470244
 ------------- ----- ----------- -------------

SPSS result:

image

data used for check:

id,period,sequence,formulation,AUC,logAUC,Cmax,logCmax
1,1,RTTR,R,10671,9.275285060616564,817,6.705639094860003
1,4,RTTR,R,11206,9.32420462812523,1502,7.31455283232408
1,2,RTTR,T,12772,9.455010553834672,1439,7.271703706887368
1,3,RTTR,T,13151,9.484253080341965,1310,7.177782416195197
2,2,TRRT,R,6068,8.710784340468424,1372,7.22402480828583
2,3,TRRT,R,5996,8.698847859222488,1056,6.962243464266207
2,1,TRRT,T,6518,8.782322859397507,1393,7.239214973779806
2,4,TRRT,T,5844,8.67317077287059,1310,7.177782416195197
3,2,TRRT,R,5728,8.653121708640482,1377,7.227662498728654
3,3,TRRT,R,5760,8.658692753689937,1529,7.332369205929062
3,1,TRRT,T,4939,8.504918160540624,1481,7.300472814267799
3,4,TRRT,T,6313,8.750366278367625,781,6.660575149839686
4,1,RTTR,R,7588,8.934323331054905,823,6.71295620067707
4,4,RTTR,R,7654,8.942983665985647,1095,6.9985096422506015
4,2,RTTR,T,8980,9.102755161296246,1133,7.0326242610280065
4,3,RTTR,T,8408,9.036938912556787,1065,6.970730078143525
5,2,TRRT,R,8022,8.989943046329998,1035,6.942156705699469
5,3,TRRT,R,10721,9.27995971385624,1571,7.359467638255621
5,1,TRRT,T,7653,8.942853006809921,709,6.5638555265321274
5,4,TRRT,T,8043,8.99255742690407,1342,7.201916317531627
6,1,RTTR,R,8389,9.034676602846295,1347,7.205635176410364
6,4,RTTR,R,7616,8.938006576471201,1153,7.050122520269059
6,2,RTTR,T,7949,8.980801413573113,691,6.53813982376767
6,3,RTTR,T,7735,8.953510763007166,949,6.855408798609928
7,1,RTTR,R,5161,8.548885638148727,1278,7.15305163493748
7,4,RTTR,R,4764,8.46884293047519,1040,6.946975992135418
7,2,RTTR,T,6601,8.79497643168877,991,6.898714534329988
7,3,RTTR,T,5479,8.608677881538416,1124,7.024649030453636
8,2,TRRT,R,8026,8.990441550826862,1242,7.124478262493424
8,3,TRRT,R,6776,8.821142236331891,1090,6.993932975223189
8,1,TRRT,T,8864,9.089753408987065,1516,7.323830566202317
8,4,TRRT,T,6995,8.852950887099581,1048,6.954638864880987
9,1,RTTR,R,7399,8.90910013492555,1547,7.344072850573066
9,4,RTTR,R,7211,8.88336291691676,1485,7.3031700512368
9,2,RTTR,T,7873,8.971194463184467,1361,7.215975002651466
9,3,RTTR,T,8153,9.006141236662911,1380,7.22983877815125
10,1,RTTR,R,5660,8.641179171197228,1088,6.992096427415888
10,4,RTTR,R,5076,8.532278828834277,796,6.679599185844383
10,2,RTTR,T,4858,8.488382109562117,982,6.889591308354466
10,3,RTTR,T,5347,8.584290934948731,995,6.902742737158593
11,2,TRRT,R,7730,8.952864141581468,908,6.811244378601294
11,3,TRRT,R,8228,9.015298250772847,1183,7.075808863978387
11,1,TRRT,T,8503,9.048174321385792,999,6.906754778648554
11,4,TRRT,T,8032,8.99118884193151,1129,7.029087564149662
12,2,TRRT,R,6007,8.700680734850161,1220,7.106606137727303
12,3,TRRT,R,7737,8.953769294549454,776,6.654152520183219
12,1,TRRT,T,7043,8.85978949474541,679,6.520621127558696
12,4,TRRT,T,6262,8.742254901886351,1258,7.1372784372603855
13,2,TRRT,R,5767,8.659907293615413,869,6.767343125265392
13,3,TRRT,R,5942,8.689801056022553,921,6.825460036255307
13,1,TRRT,T,5701,8.648396877031582,822,6.71174039505618
13,4,TRRT,T,7757,8.956350940490871,947,6.853299093186078
14,2,TRRT,R,7858,8.969287400118406,1451,7.280008252884188
14,3,TRRT,R,7924,8.977651407818442,1389,7.236339342754344
14,1,TRRT,T,8684,9.069237530998182,615,6.421622267806518
14,4,TRRT,T,9219,9.129021850798594,1279,7.153833801578843
15,1,RTTR,R,6937,8.844624683385302,953,6.859614903654202
15,4,RTTR,R,7515,8.924656302187074,1247,7.1284959456800365
15,2,RTTR,T,7905,8.975250749643573,1065,6.970730078143525
15,3,RTTR,T,6550,8.787220328629298,830,6.721425700790643
16,1,RTTR,R,11473,9.347751727799142,1368,7.221105098182496
16,4,RTTR,R,10365,9.24619002486988,1418,7.257002707092073
16,2,RTTR,T,9698,9.179674957665297,1281,7.155396301896734
16,3,RTTR,T,10355,9.245224773829685,1083,6.9874902470009905
18,2,TRRT,R,5120,8.540909718033554,842,6.7357800142423265
18,3,TRRT,R,5420,8.597851094433691,1176,7.069874128458572
18,1,TRRT,T,5210,8.558335134747413,668,6.504288173536645

PS nested factors are not implemented because they are not implemented in StatsModels. I think SS and MS can be calculated too. The main problem - is a stable method to get real numbers of levels for InteractionTerm.

codecov-commenter commented 1 year ago

Codecov Report

Base: 88.75% // Head: 89.38% // Increases project coverage by +0.63% :tada:

Coverage data is based on head (ff47219) compared to base (b728917). Patch coverage: 97.53% of modified lines in pull request are covered.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #508 +/- ## ========================================== + Coverage 88.75% 89.38% +0.63% ========================================== Files 8 8 Lines 1040 1121 +81 ========================================== + Hits 923 1002 +79 - Misses 117 119 +2 ``` | [Impacted Files](https://codecov.io/gh/JuliaStats/GLM.jl/pull/508?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaStats) | Coverage Δ | | |---|---|---| | [src/ftest.jl](https://codecov.io/gh/JuliaStats/GLM.jl/pull/508?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaStats#diff-c3JjL2Z0ZXN0Lmps) | `98.80% <97.53%> (-1.20%)` | :arrow_down: | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaStats). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaStats)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

PharmCat commented 1 year ago

add test for coverage, reference dataset 1 from Schütz, Helmut & Labes, Detlew & Fuglsang, Anders. (2014). Reference Datasets for 2-Treatment, 2-Sequence, 2-Period Bioequivalence Studies. The AAPS journal. 16. 1292-1297. 10.1208/s12248-014-9661-0.

PharmCat commented 1 year ago

Maybe lcontrast function can be simplified with using StatsModels.hypothesis_matrix, but anyway it is not working with InteractionTerm.

PharmCat commented 1 year ago

Hi! Maybe if this PR not appropriate to GLM - make it as additional package?

bkamins commented 1 year ago

I have not followed the whole discussion, but I think we should not create a separate package. Rather it would be better to think of the best package to place it into.